2013年11月8日 星期五

[IOS] Ch4 Multithreaded Programming


  • A thread is a basic unit of CPU utilization
  • Comprises
    • a thread ID
    • a program counter
    • a register set
    • a stack. 
  • Shares with other threads belonging to the same process
    • code section
    • data section
    • other operating-system resources (such as open files and signals)
  • A traditional (or heavyweight) process has a single thread of control.
  • If a process has multiple threads of control, it can perform more than one task at a time.
  • The overhead of a multiprocess design is larger than that of a multithreaded one.





Motivation

Consider a busy Web server that receives many concurrent client requests. One solution is to have the server run as a single process that accepts requests. When the server receives a request, it creates a separate process to service that request. In fact, this process-creation method was in common use before threads became popular. Process creation is time consuming and resource intensive, however. If the new process will perform the same tasks as the existing process, why incur all that overhead? It is generally more efficient to use one process that contains multiple threads. If the Web-server process is multithreaded, the server will create a separate thread that listens for client requests. When a request is made, rather than creating another process, the server will create a new thread to service the request and resume listening for additional requests.



Benefits

  • Responsiveness
    • Multithreading an interactive application may allow a program to continue running even if part of it is blocked or is performing a lengthy operation
      • Increasing responsiveness to the user. 
      • A multithreaded Web browser could allow user interaction in one thread while an image was being loaded in another thread.
  • Resource sharing
    • Processes may only share resources through techniques such as shared memory or message passing. 
      • Such techniques must be explicitly arranged by the programmer. 
    • Threads share the memory and the resources of the process to which they belong by default.
      • sharing code and data allows an application to have several different threads of activity within the same address space.
  • Economy
    • Allocating memory and resources for process creation is costly.
    • Because threads share the resources of the process to which they belong, it is more economical to create and context-switch threads.
    • In general it is much more time consuming to create and manage processes than threads.
    • In Solaris, for example, creating a process is about thirty times slower than creating a thread, and context switching is about five times slower.
  • Scalability
    • The benefits of multithreading can be greatly increased in a multiprocessor architecture, where threads may be running in parallel on different processors. 
    • Multithreading on a multi-CPU machine increases parallelism

Multicore Programming

Multicore systems put pressure on programmers; challenges include:
  • Dividing activities
    • Involves examining applications to find areas that can be divided into separate, concurrent tasks and thus can run in parallel on individual cores.
  • Balance
    • While identifying tasks that can run in parallel, programmers must also ensure that the tasks perform equal work of equal value. 
    • In some instances, a certain task may not contribute as much value to the overall process as other tasks; using a separate execution core to run that task may not be worth the cost.
  • Data splitting
    • Just as applications are divided into separate tasks, the data accessed and manipulated by the tasks must be divided to run on separate cores.
  • Data dependency
    • The data accessed by the tasks must be examined for dependencies between two or more tasks. 
    • If one task depends on data from another, programmers must ensure that the execution of the tasks is synchronized to accommodate the data dependency.
  • Testing and debugging
    • When a program is running in parallel on multiple cores, there are many different execution paths.
    • more difficult than testing and debugging single-threaded applications.


Figure notes: each color represents one kernel thread; white slots are unused.

(A) Context-switch overhead is large: switching is done purely in software, so much time is still wasted.

(B) Still only one core, but thread switching is done in hardware, eliminating the software switching cost.

(C) Extra register sets are added so that state need not be reloaded from main memory on every switch (still only one core and one set of ALUs).

(D) Hyperthreading: multiple colors are interleaved within each pipeline stage.



Multithreading Models

Many-to-one model

Many user-level threads mapped to single kernel thread
  • In this model, the library maps all threads to a single lightweight process
  • Advantages:
    • totally portable
    • easy to do with few systems dependencies
  • Disadvantages:
    • cannot take advantage of parallelism (because only one thread can access the kernel at a time)
    • may have to block for synchronous I/O (because all user threads map to a single kernel thread)
      • although some libraries avoid this by wrapping blocking calls with non-blocking equivalents
  • Mainly used in language systems, portable libraries
  • Examples: Green threads, a thread library available for Solaris, and GNU Portable Threads



One-to-one model

Each user-level thread maps to a kernel thread

  • In this model, the library maps each thread to a different lightweight process
  • Advantages: 
    • can exploit parallelism; a blocking system call stalls only the calling thread
  • Disadvantages:
    • thread creation involves LWP creation (kernel thread creation)
    • each thread takes up kernel resources
    • limits the total number of threads
  • Used in LinuxThreads and other systems where LWP creation is not too expensive



Many-to-many model

  • Allows many user-level threads to be mapped to many kernel threads
  • Allows the operating system to create a sufficient number of kernel threads
  • Also, when a thread performs a blocking system call, the kernel can schedule another thread for execution.

Two level
  • In this model, the library has two kinds of threads: 
    • bound:
      • each mapped to a single lightweight process
    • unbound
      • mapped to the same LWP
  • Probably the best of both worlds
  • Used in the Solaris implementation of Pthreads (and several other Unix implementations)


Thread Libraries

  • Thread library provides programmer with API for creating and managing threads
  • Two primary ways of implementing
    • Library entirely in user space
    • Kernel-level library supported by the OS
Pthreads
  • May be provided either as user-level or kernel-level
  • A POSIX standard (IEEE 1003.1c) API for thread creation and synchronization 
  • API specifies behavior of the thread library; implementation is up to the developers of the library
  • Common in UNIX operating systems (Solaris, Linux, Mac OS X)

Java threads
  • Java threads are managed by the JVM
  • Typically implemented using the threads model provided by underlying OS
  • Java threads may be created by:
    • Extending Thread class
    • Implementing the Runnable interface

Threading issues

The fork() and exec() system calls
  • fork() system call is used to create a separate, duplicate process. 
  • The semantics of the fork() and exec() system calls change in a multithreaded program.
    • If one thread in a program calls fork(), does the new process duplicate all threads, or is the new process single-threaded? 
    • Some UNIX systems have chosen to have two versions of fork()
      • one that duplicates all threads
      • another that duplicates only the thread that invoked the fork() system call.
  • The exec() system call typically works in the same way
    • If a thread invokes the exec() system call, the program specified in the parameter to exec() will replace the entire process, including all threads.
  • Which of the two versions of fork() to use depends on the application.
    • If exec() is called immediately after forking, then duplicating all threads is unnecessary, as the program specified in the parameters to exec() will replace the process. 
      • In this instance, duplicating only the calling thread is appropriate.
    • If the separate process does not call exec() after forking, the separate process should duplicate all threads.

Cancellation
  • A thread that is to be canceled is often referred to as the target thread
  • Cancellation of a target thread may occur in two different scenarios:
    • Asynchronous cancellation
      • One thread immediately terminates the target thread.
    • Deferred cancellation
      • The target thread periodically checks whether it should terminate, allowing it an opportunity to terminate itself in an orderly fashion.
  • The difficulty with cancellation arises in situations where
    • resources have been allocated to a canceled thread, or
    • a thread is canceled while in the midst of updating data it is sharing with other threads.
    • This becomes especially troublesome with asynchronous cancellation. 
    • Often, the operating system will reclaim system resources from a canceled thread but will not reclaim all resources. 
      • Canceling a thread asynchronously may not free a necessary system-wide resource.
    • With deferred cancellation
      • one thread indicates that a target thread is to be canceled, but cancellation occurs only after the target thread has checked a flag to determine whether or not it should be canceled. 
      • The thread can perform this check at a point at which it can be canceled safely. 
        • Pthreads refers to such points as cancellation points

Signal handling
  • Signals are used in UNIX systems to notify a process that a particular event has occurred
  • A signal handler is used to process signals
    • Signal is generated by particular event
    • Signal is delivered to a process
    • Signal is handled
  • Options (in a multithreaded program):
    • Deliver the signal to the thread to which the signal applies
    • Deliver the signal to every thread in the process
    • Deliver the signal to certain threads in the process
    • Assign a specific thread to receive all signals for the process
  • Examples of synchronous signals
    • illegal memory access
    • division by 0
    • If a running program performs either of these actions, a signal is generated. 
    • Synchronous signals are delivered to the same process that performed the operation that caused the signal (which is why they are considered synchronous).
  • When a signal is generated by an event external to a running process, that process receives the signal asynchronously. 
    • Examples of such signals include terminating a process with specific keystrokes (such as <control> <C>) and having a timer expire. 
    • Typically, an asynchronous signal is sent to another process.

Thread Pools
  • Create a number of threads in a pool where they await work
  • Advantages:
    • Usually slightly faster to service a request with an existing thread than to create a new thread
    • Allows the number of threads in the application(s) to be bound to the size of the pool
  • Issue with creating a thread per request:
    • It takes time to create the thread prior to servicing the request, and the thread is discarded once it has completed its work.
    • If we allow all concurrent requests to be serviced in a new thread, we have not placed a bound on the number of threads concurrently active in the system. 
      • Unlimited threads could exhaust system resources, such as CPU time or memory. One solution to this problem is to use a thread pool
  • General idea behind a thread pool
    • Create a number of threads at process startup and place them into a pool, where they sit and wait for work.
    • When a server receives a request, it awakens a thread from this pool - if one is available - and passes it the request for service. 
    • Once the thread completes its service, it returns to the pool and awaits more work. 
    • If the pool contains no available thread, the server waits until one becomes free.

Thread Specific Data
  • Threads belonging to a process share the data of the process.
  • In some circumstances, each thread needs its own copy of certain data.
    • We will call such data thread-specific data
    • In a transaction-processing system, we might service each transaction in a separate thread.
    • each transaction might be assigned a unique identifier. 
    • To associate each thread with its unique identifier, we could use thread-specific data. 
    • Most thread libraries, including Win32 and Pthreads, provide some form of support for thread-specific data. Java provides support as well.

Scheduler Activations
  • Concerns communication between the kernel and the thread library
    • may be required by the many-to-many and two-level models 
    • allows the number of kernel threads to be dynamically adjusted to help ensure the best performance.
    • Many systems implementing either the many-to-many or the two-level model place an intermediate data structure between the user and kernel threads. 
      • This data structure is typically known as a lightweight process, or LWP
      • To the user-thread library, the LWP appears to be a virtual processor on which the application can schedule a user thread to run. 
      • Each LWP is attached to a kernel thread, and it is kernel threads that the operating system schedules to run on physical processors. 
      • If a kernel thread blocks (such as while waiting for an I/O operation to complete), the LWP blocks as well. 
      • Up the chain, the user-level thread attached to the LWP also blocks.
  • An application may require any number of LWPs to run efficiently. 
  • Consider a CPU-bound application running on a single processor. 
    • Only one thread can run at once, so one LWP is sufficient. 
    • An application that is I/O-intensive may require multiple LWPs to execute. 
      • Typically, an LWP is required for each concurrent blocking system call. 
      • For example, if five different file-read requests occur simultaneously, five LWPs are needed, because all could be waiting for I/O completion in the kernel.
        • If a process has only four LWPs, then the fifth request must wait for one of the LWPs to return from the kernel.
  • One scheme for communication between the user-thread library and the kernel is known as scheduler activations
    • The kernel provides an application with a set of virtual processors (LWPs), and the application can schedule user threads onto an available virtual processor.
    • Furthermore, the kernel must inform an application about certain events. 
    • This procedure is known as an upcall. Upcalls are handled by the thread library with an upcall handler, and upcall handlers must run on a virtual processor.
    • One event that triggers an upcall occurs when an application thread is about to block. 
    • In this scenario, the kernel makes an upcall to the application informing it that a thread is about to block and identifying the specific thread. 
    • The kernel then allocates a new virtual processor to the application. 
      • The application runs an upcall handler on this new virtual processor, which saves the state of the blocking thread and relinquishes the virtual processor on which the blocking thread is running.
      • The upcall handler then schedules another thread that is eligible to run on the new virtual processor. 
      • When the event that the blocking thread was waiting for occurs, the kernel makes another upcall to the thread library informing it that the previously blocked thread is now eligible to run.
    • The upcall handler for this event also requires a virtual processor, and the kernel may allocate a new virtual processor or preempt one of the user threads and run the upcall handler on its virtual processor.
    • After marking the unblocked thread as eligible to run, the application schedules an eligible thread to run on an available virtual processor.
Windows XP threads
  • Implements the one-to-one mapping, kernel-level
  • Each thread contains
    • A thread id
    • Register set
    • Separate user and kernel stacks
    • Private data storage area
  • The register set, stacks, and private storage area are known as the context of the threads
  • The primary data structures of a thread include:
    • ETHREAD (executive thread block)
    • KTHREAD (kernel thread block)
    • TEB (thread environment block)

Linux threads
  •  Linux refers to them as tasks rather than threads
  • Thread creation is done through clone() system call
  • clone() allows a child task to share the address space of the parent task (process)

Thread Design Patterns


Common ways of structuring programs using threads
  • Boss/workers model
    • boss gets assignments, dispatches tasks to workers
    • variants (thread pool, single thread per connection...)
    • Advantage: simplicity
    • Disadvantages:
      • bound on number of workers
      • overhead of thread creation
      • contention if requests have inter-dependencies
    • Variants:
      • fixed thread pool (aka workpile, workqueue), producer/consumer relationship, workers determine what needs to be performed...
  • Pipeline model
    • do some work, pass partial result to next thread
    • Each thread completes a portion of a task and passes results on, like an assembly line or a processor pipeline
    • Advantages: trivial synchronization, simplicity
    • Disadvantages:
      • limits degree of parallelism; throughput driven by slowest stage; hand-tuning needed
  • Up-calls
    • fast control flow transfer for layered systems
    • Layered applications, e.g. network protocol stacks, have top-down and bottom-up flows
    • Up-calling is a technique in which layers are structured so that they can expect calls from below
    • Thread pool of specialized threads in each layer 
      • essentially an up-call pipeline per connection
    • Advantages:
      • works best with fast, synchronous control-flow transfer mechanisms; also useful as a program-structuring tool
    • Disadvantages: 
      • programming becomes more complicated, synchronization required for top-down
  • Version stamps
    • technique for keeping information consistent
    • (Not a programming structure idea but useful technique for any kind of distributed environment)
    • Maintain “version number” for shared data
      • keep local cached copy of data
      • check versions to determine if changed





References:
Silberschatz, Galvin, and Gagne, Operating System Concepts, 8th Edition
Wikipedia
