- A thread is a basic unit of CPU utilization
- Comprises
- a thread ID
- a program counter
- a register set
- a stack.
- Shares with other threads belonging to the same process
- code section
- data section
- other operating-system resources(such as open files and signals)
- A traditional (or heavyweight) process has a single thread of control.
- If a process has multiple threads of control, it can perform more than one task at a time.
- multiprocess 的 overhead 比 multithreaded 大
Motivation
One solution is to have the server run as a single process that accepts
requests. When the server receives a request, it creates a separate process
to service that request. In fact, this process-creation method was in common
use before threads became popular. Process creation is time consuming and
resource intensive, however. If the new process will perform the same tasks as the existing process, why incur all that overhead? It is generally more efficient to use one process that contains multiple threads. If the Web-server process is multithreaded, the server will create a separate thread that listens for client requests. When a request is made, rather than creating another process, the server will create a new thread to service the request and resume listening for additional requests.
Benefits
- Responsiveness
- Multithreading an interactive application may allow a program to continue running even if part of it is blocked or is performinga lengthy operation
- Increasing responsiveness to the user.
- A multithreaded Web browser could allow user interaction in one thread while an image was being loaded in another thread.
- Resource sharing
- Processes may only share resources through techniques such as shared memory or message passing.
- Such techniques must be explicitly arranged by the programmer.
- Threads share the memory and the resources of the process to which they belong by default.
- sharing code and data allows an application to have several different threads of activity within the same address space.
- Economy
- Allocating memory and resources for process creation is costly.
- Threads share the resources of the process to which they belong is more economical to create and context-switch threads.
- In general it is much more time consuming to create and manage processes than threads.
- In Solarisf for example, creating a process is about thirty times slower than is creating a thread, and context switching is about five times slower.
- Scalability
- The benefits of multithreading can be greatly increased in a multiprocessor architecture, where threads may be running in parallel on different processors.
- Multithreading on a multi-CPU machine increases parallelism
Multicore Programming
Multicore systems putting pressure on programmers, challenges include:
- Dividing activities
- Involves examining applications to find areas that can be divided into separate, concurrent tasks and thus can run in parallel on individual cores.
- Balance
- While identifying tasks that can run in parallel, programmers must also ensure that the tasks perform equal work of equal value.
- In some instances, a certain task may not contribute as much value to the overall process as other tasks; using a separate execution core to run that task may not be worth the cost.
- Data splitting
- Just as applications are divided into separate tasks, the data accessed and manipulated by the tasks must be divided to run on separate cores.
- Data dependency
- The data accessed by the tasks must be examined for dependencies between two or more tasks.
- If one task depends on data from another, programmers must ensure that the execution of the tasks is synchronized to accommodate the data dependency.
- Testing and debugging
- When a program is running in parallel on multiple cores, there are many different execution paths.
- more difficult than testing and debugging single-threaded applications.
每個顏色一個 Kernel theard,白色代表沒有用到
|
(A) context switch's overhead 很大,切換單純用軟體來做,但是浪費的還是很多
(B) 還是只有一個 core ,但是 thread 切換是用硬體來做 => 少掉 switch 的成本
(C) 多加一些暫存器,讓他不要每次都重新從 main memory load (還是只有一個 core & 一套 ALU)
(D) Hyperthearding ,每個 stage 會有多個顏色混雜
Multithreading Model
Many-to-one modelMany user-level threads mapped to single kernel thread
- In this model, the library maps all threads to a single lightweight process
- Advantages:
- totally portable
- easy to do with few systems dependencies
- Disadvantages:
- cannot take advantage of parallelism (because one thread can access the kernel at one time)
- may have to block for synchronous I/O (因為對應到只有一個 kernel thread)
- there is a clever technique for avoiding it
- Mainly used in language systems, portable libraries
- Green thread - a thread library available for Solaris, so as does GNU portable threads
One-to-one model
Each user-level thread maps to kernel thread
- In this model, the library maps each thread to a different lightweight process
- Advantages:
- can exploit parallelism, blocking system calls
- Disadvantages:
- thread creation involves LWP creation (kernel thread creating)
- each thread takes up kernel resources
- limiting the number of total threads
- Used in LinuxThreads and other systems where LWP creation is not too expensive
Many-to-many model
- Allows many user level threads to be mapped to many kernel threads
- Allows the operating system to create a sufficient number of kernel threads
- Also, when a thread performs a blocking system call, the kernel can schedule another thread for execution.
Two level
- In this model, the library has two kinds of threads:
- bound:
- mapped each to a single lightweight process
- unbound
- mapped to the same LWP
- Probably the best of both worlds
- Used in the Solaris implementation of Pthreads (and several other Unix implementations)
Thread Libreries
- Thread library provides programmer with API for creating and managing threads
- Two primary ways of implementing
- Library entirely in user space
- Kernel-level library supported by the OS
- May be provided either as user-level or kernel-level
- A POSIX standard (IEEE 1003.1c) API for thread creation and synchronization
- API specifies behavior of the thread library, implementation is up to development of the library
- Common in UNIX operating systems (Solaris, Linux, Mac OS X)
Java threads
- Java threads are managed by the JVM
- Typically implemented using the threads model provided by underlying OS
- Java threads may be created by:
- Extending Thread class
- Implementing the Runnable interface
Threading issues
The fork() and exec() system calls- fork() system call is used to create a separate, duplicate process.
- The semantics of the fork() and exec() system calls change in a multithreaded program.
- If one thread in a program calls fork(), does the new process duplicate all threads, or is the new process single-threaded?
- Some UNIX systems have chosen to have two versions of fork()
- one that duplicates all threads
- another that duplicates only the thread that invoked the fork() system call.
- The exec() system call typically works in the same way
- If a thread invokes the exec() system call, the program specified in the parameter to exec () will replace the entire process-including all threads.
- Which of the two versions of fork() to use depends on the application.
- If exec() is called immediately after forking, then duplicating all threads is unnecessary, as the program specified in the parameters to exec() will replace the process.
- In this instance, duplicating only the calling thread is appropriate.
- If the separate process does not call exec () after forking, the separate process should duplicate all threads.
Cancellation
- A thread that is to be canceled is often referred to as the target thread.
- Cancellation of a target thread may occur in two different scenarios:
- Asynchronous cancellation
- One thread immediately terminates the
target thread. - Deferred cancellation
- The target thread periodically checks whether it should terminate, allowing it an opportunity to terminate itself in an
orderly fashion.
- The difficulty with cancellation
- situations where resources have been allocated to a canceled thread
- where a thread is canceled while in the midst of updating data it is sharing with other threads.
- This becomes especially troublesome with asynchronous cancellation.
- Often, the operating system will reclaim system resources from a canceled thread but will not reclaim all resources.
- Canceling a thread asynchronously may not free a necessary system-wide resource.
- With deferred cancellation
- one thread indicates that a target thread is to be canceled, but cancellation occurs only after the target thread has checked a flag to determine whether or not it should be canceled.
- The thread can perform this check at a point at which it can be canceled safely.
- Pthreads refers to such points as cancellation point
Signal handling
- Signals are used in UNIX systems to notify a process that a particular event has occurred
- A signal handler is used to process signals
- Signal is generated by particular event
- Signal is delivered to a process
- Signal is handled
- Options: (In multithreads)
- Deliver the signal to the thread to which the signal applies
- Deliver the signal to every thread in the process
- Deliver the signal to certain threads in the process
- Assign a specific thread to receive all signals for the process
- Examples of synchronous signals
- illegal memory access
- division by 0
- If a running program performs either of these actions, a signal is generated.
- Synchronous signals are delivered to the same process that performed the operation that caused the signal (that is the reason they are considered synchronous).
- When a signal is generated by an event external to a running process, that process receives the signal asynchronously.
- Examples of such signals include terminating a process with specific keystrokes (such as <control> <C>) and having a timer expire.
- Typically, an asynchronous signal is sent to another process.
Thread Pools
- Create a number of threads in a pool where they await work
- Advantages:
- Usually slightly faster to service a request with an existing thread than create a new thread
- Allows the number of threads in the application(s) to be bound to the size of the pool
- Issue:
- Concerns the amount of time required to create the thread prior to servicing the request, together with the fact that this thread will be discarded once it has completed its work.
- If we allow all concurrent requests to be serviced in a new thread, we have not placed a bound on the number of threads concurrently active in the system.
- Unlimited threads could exhaust system resources, such as CPU time or memory. One solution to this problem is to use a thread pool
- General idea behind a thread pool
- Create a number of threads at process startup and place them into a pool, where they sit and wait for work.
- When a server receives a request, it awakens a thread from this pool - if one is available - and passes it the request for service.
- Once the thread completes its service, it returns to the pool and awaits more work.
- If the pool contains no available thread, the server waits until one becomes free.
Thread Specific Data
- Threads belonging to a process share the data of the process.
- In some circumstances, each thread need its own copy of certain data.
- We will call such data thread specific data
- In a transaction-processing system, we might service each transaction in a separate thread.
- each transaction might be assigned a unique identifier.
- To associate each thread with its unique identifier, we could use thread-specific data.
- Most thread libraries-including Win32 and Pthreads-provide some form of support for thread-specific data. Java provides support as well.
Scheduler Activations
- Concerns communication between the kernel and the thread library
- may be required by the many-to-many and two-level models
- allows the number of kernel threads to be dynamically adjusted to help ensure the best performance.
- Many systems implementing either the many-to-many or the two-level model place an intermediate data structure between the user and kernel threads.
- This data structure-typically known as a lightweight process, or LWP
- To the user-thread library, the LWP appears to be a virtual processor on which the application can schedule a user thread to run.
- Each LWP is attached to a kernel thread, and it is kernel threads that the operating system schedules to run on physical processors.
- If a kernel thread blocks (such as while waiting for an I/O operation to complete), the LWP blocks as well.
- Up the chain, the user-level thread attached to the LWP also blocks.
- An application may require any number of LWPs to run efficiently.
- Consider a CPU-bound application running on a single processor.
- Only one thread can run at once, so one LWP is sufficient.
- An application that is I/O-intensive may require multiple LWPs to execute.
- Typically, an LWP is required for each concurrent blocking system call.
- For example, if five different file-read requests occur simultaneously. Five LWPs are needed, because all could be waiting for I/O completion in the kernel.
- If a process has only four LWPs, then the fifth request must wait for one of the LWPs to return from the kernel.
- One scheme for communication between the user-thread library and the kernel is known as scheduler activations
- The kernel provides an application with a set of virtual processors (LWPs), and the application can schedule user threads onto an available virtual processor.
- Furthermore, the kernel must inform an application about certain events.
- This procedure is known as an upcalls are handled by the thread library with an upcall handler and upcall handlers must run on a virtual processor.
- One event that triggers an upcall occurs when an application thread is about to block.
- In this scenario, the kernel makes an upcall to the application informing it that a thread is about to block and identifying the specific thread.
- The kernel then allocates a new virtual processor to the application.
- The application runs an upcall handler on this new virtual processor, which saves the state of the blocking thread and relinquishes(放棄) the virtual processor on which the blocking thread is running.
- The upcall handler then schedules another thread that is eligible to run on the new virtual processor.
- When the event that the blocking thread was waiting for occurs, the kernel makes another upcall to the thread library informing it that the previously blocked thread is now eligible to run.
- The up call handler for this event also requires a virtual processor, and the kernel may allocate a new virtual processor or preempt(強佔) one of the user threads and run the upcall handler on its virtual processor.
- After marking the unblocked thread as eligible to run, the application schedules an eligible thread to run on an available virtual processor.
- Implements the one-to-one mapping, kernel-level
- Each thread contains
- A thread id
- Register set
- Separate user and kernel stacks
- Private data storage area
- The register set, stacks, and private storage area are known as the context of the threads
- The primary data structures of a thread include:
- ETHREAD (executive thread block)
- KTHREAD (kernel thread block)
- TEB (thread environment block)
Linux threads
- Linux refers to them as tasks rather than threads
- Thread creation is done through clone() system call
- clone() allows a child task to share the address space of the parent task (process)
Thread Design Patterns
Common ways of structuring programs using threads
- Boss/workers model
- boss gets assignments, dispatches tasks to workers
- variants (thread pool, single thread per connection...)
- Advantage: simplicity
- Disadvantage:
- bound on number of workers
- overheard of threads creation,
- contention if requests have inter-dependencies
- Variants:
- fixed thread pool (aka workpile, workqueue), producer/consumer relationship, workers determine what needs to be performed...
- Pipeline model
- do some work, pass partial result to next thread
- Each thread completes portion of a task, and passes results
like an assembly line or a processor pipeline - Advantages: trivial synchronization, simplicity
- Disadvantages:
- limits degree of parallelism, throughput driven by slowest
stage, hand-tuning needed
- Up-calls
- fast control flow transfer for layered systems
- Layered applications, e.g. network protocol stacks have top-
down and bottom-up flows - Up-calls is a technique in which you structure layers so that
they can expect calls from below - Thread pool of specialized threads in each layer
- essentially an up-call pipeline per connection
- Advantages:
- best when used with fast, synchronous control
flow transfer mechanisms or program structuring tool - Disadvantages:
- programming becomes more complicated, synchronization required for top-down
- Version stamps
- technique for keeping information consistent
- (Not a programming structure idea but useful technique for any kind of distributed environment)
- Maintain “version number” for shared data
- keep local cached copy of data
- check versions to determine if changed
Reference:
Operating System Concepts 8th, by Silberschatz, Galvin, Gagne
Wikipedia
沒有留言:
張貼留言