Friday, 6 October 2017

Multiprocessing Control and Algorithms

Inter Process Communication

A process can be of two types:
  • Independent process.
  • Co-operating process.
An independent process is not affected by the execution of other processes, while a co-operating process can be affected by other executing processes. One might think that processes running independently will execute very efficiently, but in practice there are many situations where this co-operative nature can be utilised to increase computational speed, convenience, and modularity. Inter-process communication (IPC) is a mechanism that allows processes to communicate with each other and synchronize their actions. The communication between these processes can be seen as a method of co-operation between them. Processes can communicate with each other in two ways:
  1. Shared Memory
  2. Message passing
Figure 1 below shows the basic structure of communication between processes via the shared memory method and via message passing.
An operating system can implement both methods of communication. First we will discuss the shared memory method of communication, and then message passing. Communication between processes using shared memory requires the processes to share some variable, and it depends entirely on how the programmer implements it. One way of communication using shared memory can be imagined like this: suppose process1 and process2 are executing simultaneously and share some resources, or use information from each other. Process1 generates information about certain computations or resources being used and keeps it as a record in shared memory. When process2 needs to use the shared information, it checks the record stored in shared memory, takes note of the information generated by process1, and acts accordingly. Processes can use shared memory both for extracting information recorded by another process and for delivering specific information to another process.
Let’s discuss an example of communication between processes using shared memory method.
[Figure 1: communication via shared memory vs. via message passing]
i) Shared Memory Method
Ex: Producer-Consumer problem 
There are two processes: Producer and Consumer. The Producer produces some item and the Consumer consumes that item. The two processes share a common space or memory location known as a buffer, where the item produced by the Producer is stored and from which the Consumer consumes the item if needed. There are two versions of this problem: the first is known as the unbounded buffer problem, in which the Producer can keep on producing items and there is no limit on the size of the buffer; the second is known as the bounded buffer problem, in which the Producer can produce up to a certain number of items and after that starts waiting for the Consumer to consume them. We will discuss the bounded buffer problem. First, the Producer and the Consumer will share some common memory; then the Producer will start producing items. If the total number of produced items is equal to the size of the buffer, the Producer will wait for them to be consumed by the Consumer. Similarly, the Consumer first checks for the availability of an item, and if no item is available, the Consumer will wait for the Producer to produce one. If items are available, the Consumer will consume them. A sketch of this follows.
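Below is a minimal sketch of the bounded-buffer version in Python, using the multiprocessing module's shared memory (Array and Value). The buffer size, item count, and polling delay are illustrative choices, and one slot is kept empty so a full buffer can be told apart from an empty one.

# Bounded-buffer producer/consumer over shared memory (illustrative sketch).
from multiprocessing import Process, Array, Value
import time

BUF_SIZE = 5  # capacity of the shared circular buffer (one slot kept empty)

def producer(buf, in_idx, out_idx):
    for item in range(10):
        # Wait while the buffer is full, as described above.
        while (in_idx.value + 1) % BUF_SIZE == out_idx.value:
            time.sleep(0.01)
        buf[in_idx.value] = item
        in_idx.value = (in_idx.value + 1) % BUF_SIZE
        print(f"produced {item}")

def consumer(buf, in_idx, out_idx):
    for _ in range(10):
        # Wait while the buffer is empty.
        while in_idx.value == out_idx.value:
            time.sleep(0.01)
        item = buf[out_idx.value]
        out_idx.value = (out_idx.value + 1) % BUF_SIZE
        print(f"consumed {item}")

if __name__ == "__main__":
    buf = Array('i', BUF_SIZE)   # shared buffer
    in_idx = Value('i', 0)       # next free slot, written only by the producer
    out_idx = Value('i', 0)      # next full slot, written only by the consumer
    p = Process(target=producer, args=(buf, in_idx, out_idx))
    c = Process(target=consumer, args=(buf, in_idx, out_idx))
    p.start(); c.start()
    p.join(); c.join()

Because each index is written by only one of the two processes, this single-producer, single-consumer scheme needs no additional locking.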
ii) Message Passing Method
Now we will start our discussion of communication between processes via message passing. In this method, processes communicate with each other without using any kind of shared memory. If two processes p1 and p2 want to communicate with each other, they proceed as follows:
  • Establish a communication link (if a link already exists, no need to establish it again.)
  • Start exchanging messages using basic primitives.
    We need at least two primitives:
    – send(message, destination) or send(message)
    – receive(message, host) or receive(message)

A message can be of fixed or variable size. Fixed-size messages are easy for the OS designer but complicated for the programmer; variable-size messages are easy for the programmer but complicated for the OS designer. A standard message has two parts: a header and a body.
The header is used for storing the message type, destination id, source id, message length, and control information. The control information covers things like what to do if the receiver runs out of buffer space, sequence numbers, and priority. Generally, messages are delivered in FIFO order.
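As a rough sketch, the two primitives map naturally onto a queue shared by the processes. Here a Python multiprocessing.Queue plays the role of the link, and the header fields (message type, source and destination ids) are illustrative assumptions.

# send/receive over a FIFO link (illustrative sketch).
from multiprocessing import Process, Queue

def sender(link):
    msg = {"type": "DATA", "src": "p1", "dst": "p2",  # header
           "body": "hello"}                           # body
    link.put(msg)                                     # send(message)

def receiver(link):
    msg = link.get()                                  # receive(message)
    print(f"{msg['dst']} got {msg['body']!r} from {msg['src']}")

if __name__ == "__main__":
    link = Queue()                                    # FIFO communication link
    p2 = Process(target=receiver, args=(link,))
    p1 = Process(target=sender, args=(link,))
    p2.start(); p1.start()
    p1.join(); p2.join()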

Message Passing through Communication Link.
Direct and Indirect Communication link
Now we will discuss the methods of implementing the communication link. While implementing the link, some questions need to be kept in mind:
  1. How are links established?
  2. Can a link be associated with more than two processes?
  3. How many links can there be between every pair of communicating processes?
  4. What is the capacity of a link? Is the size of a message that the link can accommodate fixed or variable?
  5. Is a link unidirectional or bi-directional?
A link has a capacity that determines the number of messages that can reside in it temporarily. For this, every link has a queue associated with it, which can be of zero capacity, bounded capacity, or unbounded capacity. With zero capacity, the sender waits until the receiver informs it that the message has been received. In the non-zero-capacity cases, a process does not know whether a message has been received after the send operation; for that, the receiver must send an explicit acknowledgement to the sender. Implementation of the link depends on the situation: it can be either a direct communication link or an indirect communication link.
Direct communication links are implemented when the processes use a specific process identifier for the communication, but it is hard to identify the sender ahead of time.
For example: a print server cannot know in advance which process will send it the next request.

Indirect communication is done via a shared mailbox (port), which consists of a queue of messages. The sender places messages in the mailbox and the receiver picks them up.

Message Passing through Exchanging the Messages.
Synchronous and Asynchronous Message Passing:
A blocked process is one that is waiting for some event, such as a resource becoming available or the completion of an I/O operation. IPC is possible between processes on the same computer as well as between processes running on different computers, i.e. in a networked/distributed system. In both cases, the process may or may not be blocked while sending a message or attempting to receive one, so message passing may be blocking or non-blocking. Blocking is considered synchronous: a blocking send means the sender is blocked until the message is received by the receiver, and a blocking receive means the receiver blocks until a message is available. Non-blocking is considered asynchronous: a non-blocking send lets the sender send the message and continue, and a non-blocking receive returns either a valid message or null. On analysis, it is more natural for a sender to be non-blocking after sending, as it may need to send messages to several processes; however, the sender then expects an acknowledgement from the receiver in case a send fails. Similarly, it is more natural for a receiver to block after issuing a receive, as the information in the received message may be needed for further execution; but if sends keep failing, the receiver would wait indefinitely. That is why the other possibilities of message passing are also considered. There are basically three most preferred combinations (see the sketch after this list):
  • Blocking send and blocking receive
  • Non-blocking send and Non-blocking receive
  • Non-blocking send and Blocking receive (Mostly used)
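These combinations can be sketched with Python's standard queues, which already behave this way: put() on a queue with free space returns immediately (non-blocking send), put() on a full bounded queue blocks (blocking send), get() blocks until a message arrives (blocking receive), and get_nowait() returns a message or raises, the "valid message or null" case.

# Blocking vs. non-blocking primitives (illustrative sketch).
import queue

link = queue.Queue(maxsize=1)      # bounded link with capacity 1

link.put("m1")                     # non-blocking here: buffer has space
# link.put("m2")                   # would block: buffer full (blocking send)

print(link.get())                  # blocking receive: waits for a message
try:
    print(link.get_nowait())       # non-blocking receive
except queue.Empty:
    print(None)                    # no message available -> null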
In direct message passing, the process that wants to communicate must explicitly name the recipient or the sender of the communication.
e.g. send(p1, message) means send the message to p1.
Similarly, receive(p2, message) means receive the message from p2.
In this method of communication, the communication link is established automatically; it can be either unidirectional or bidirectional, but one link is used between one pair of sender and receiver, and a pair of sender and receiver should not possess more than one link between them. Symmetry and asymmetry between sending and receiving can also be implemented: either both processes name each other for sending and receiving messages, or only the sender names the receiver for sending, and the receiver need not name the sender in order to receive. The problem with this method of communication is that if the name of one process changes, the method will not work.

In indirect message passing, processes use mailboxes (also referred to as ports) for sending and receiving messages. Each mailbox has a unique id, and processes can communicate only if they share a mailbox. A link is established only if the processes share a common mailbox, and a single link can be associated with many processes. Each pair of processes can share several communication links, and these links may be unidirectional or bidirectional. Suppose two processes want to communicate through indirect message passing; the required operations are: create a mailbox, use this mailbox for sending and receiving messages, and destroy the mailbox. The standard primitives used are send(A, message), which means send the message to mailbox A, and receive(A, message), which works the same way for receiving. There is a problem with this mailbox implementation: suppose more than two processes share the same mailbox, and process p1 sends a message to the mailbox; which process will be the receiver? This can be solved by forcing only two processes to share a single mailbox, by enforcing that only one process is allowed to execute a receive at a given time, or by selecting any process randomly and notifying the sender about the receiver. A mailbox can be made private to a single sender/receiver pair, or it can be shared between multiple sender/receiver pairs. A port is an implementation of such a mailbox that can have multiple senders and a single receiver; it is used in client/server applications (here the server is the receiver). The port is owned by the receiving process, is created by the OS at the request of the receiver process, and can be destroyed either at the request of the same receiver process or when the receiver terminates. Enforcing that only one process is allowed to execute a receive can be done using the concept of mutual exclusion: a mutex mailbox is created and shared by n processes; the sender is non-blocking and sends the message; the first process that executes a receive enters the critical section, while all other processes block and wait.
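A minimal sketch of such a port in Python: several client processes send into one shared Queue and a single server process receives, mirroring the multiple-sender/single-receiver port described above. The client count and message format are illustrative.

# Port: many senders, one receiver (illustrative sketch).
from multiprocessing import Process, Queue

def client(port, cid):
    port.put((cid, f"request from client {cid}"))  # send(A, message)

def server(port, n_clients):
    for _ in range(n_clients):
        cid, body = port.get()                     # receive(A, message)
        print(f"server handled {body!r}")

if __name__ == "__main__":
    port = Queue()                                 # mailbox owned by the server
    srv = Process(target=server, args=(port, 3))
    srv.start()
    clients = [Process(target=client, args=(port, i)) for i in range(3)]
    for c in clients:
        c.start()
    for c in clients:
        c.join()
    srv.join()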
Deadlock 
A deadlock is a situation in which a set of blocked processes each hold a resource while waiting to acquire a resource held by another process in the set.
Deadlocks in Operating Systems

How to avoid Deadlocks

Deadlocks can be avoided by preventing at least one of the following four conditions, because all four conditions must hold simultaneously for a deadlock to occur.
  1. Mutual Exclusion
    Shared resources such as read-only files do not lead to deadlocks, but resources such as printers and tape drives require exclusive access by a single process.
  2. Hold and Wait
    To break this condition, processes must be prevented from holding one or more resources while simultaneously waiting for one or more others.
  3. No Preemption
    Preemption of process resource allocations, wherever possible, can break this condition.
  4. Circular Wait
    Circular wait can be avoided if we number all resources and require that processes request resources only in strictly increasing (or decreasing) order, as in the sketch below.
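A minimal sketch of the resource-ordering rule in Python: both workers acquire the two locks in the same global order, so a cycle of waiting can never form. The two-lock scenario and thread names are illustrative.

# Circular-wait prevention by resource numbering (illustrative sketch).
import threading

lock1 = threading.Lock()   # resource #1
lock2 = threading.Lock()   # resource #2

def worker(name):
    # Every thread requests resource #1 before resource #2; with a fixed
    # global ordering there can be no cycle, hence no deadlock.
    with lock1:
        with lock2:
            print(f"{name} used both resources")

t1 = threading.Thread(target=worker, args=("t1",))
t2 = threading.Thread(target=worker, args=("t2",))
t1.start(); t2.start()
t1.join(); t2.join()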

Handling Deadlock

The points above focus on preventing deadlocks. But what should be done once a deadlock has occurred? The following three strategies can be used to remove a deadlock after its occurrence.
  1. Preemption
    We can take a resource from one process and give it to another. This resolves the deadlock, but it can cause problems of its own.
  2. Rollback
    In situations where deadlock is a real possibility, the system can periodically make a record of the state of each process and when deadlock occurs, roll everything back to the last checkpoint, and restart, but allocating resources differently so that deadlock does not occur.
  3. Kill one or more processes
    This is the simplest way, but it works.

What is a Livelock?

There is a variant of deadlock called livelock. This is a situation in which two or more processes continuously change their state in response to changes in the other process(es) without doing any useful work. This is similar to deadlock in that no progress is made but differs in that neither process is blocked or waiting for anything.
A human example of livelock would be two people who meet face-to-face in a corridor and each moves aside to let the other pass, but they end up swaying from side to side without making any progress because they always move the same way at the same time.
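The corridor scenario can be sketched deterministically: both "people" apply the same rule at the same moment, so their states keep changing while neither ever gets past the other. The step count is illustrative and only bounds the demo.

# Corridor livelock (illustrative sketch): state changes, no progress.
p1, p2 = "centre", "centre"
for step in range(6):
    # Both decide simultaneously and apply the same rule, so they
    # mirror each other forever instead of passing.
    p1, p2 = ("left", "left") if p1 == "centre" else ("centre", "centre")
    print(f"step {step}: p1={p1}, p2={p2} -> still face to face")
print("states changed every step, yet no one passed: livelock")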

Stochastic Scheduling

Stochastic scheduling is concerned with scheduling problems in which the processing times of tasks are modelled as random variables. Thus a job's processing time is not known until it is complete. Scheduling may be preemptive or non-preemptive, occur on one or on many processors, and be concerned with various optimization criteria.
A typical result in this area is that if n jobs have processing times that are exponentially distributed with different means and are to be processed by m identical machines operating in parallel, then LEPT (longest expected processing time first) minimizes the expected makespan (the time at which all jobs are complete).
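This result can be checked empirically. Below is a rough Monte-Carlo sketch in Python: jobs with exponentially distributed processing times are list-scheduled on m identical machines, ordered either by LEPT or by its opposite (shortest expected first), and the average makespan is compared. The job means, machine count, and trial count are illustrative.

# LEPT vs. SEPT expected makespan, exponential jobs (illustrative sketch).
import heapq, random

def makespan(means, m, longest_first):
    machines = [0.0] * m                       # time at which each machine frees up
    heapq.heapify(machines)
    for mu in sorted(means, reverse=longest_first):
        t = heapq.heappop(machines)            # earliest-free machine takes the job
        heapq.heappush(machines, t + random.expovariate(1.0 / mu))
    return max(machines)

random.seed(0)
means, m, trials = [1.0, 2.0, 4.0, 8.0], 2, 20000
for name, flag in (("LEPT", True), ("SEPT", False)):
    avg = sum(makespan(means, m, flag) for _ in range(trials)) / trials
    print(name, round(avg, 3))                 # LEPT should average lower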

Thursday, 7 September 2017

PIPELINING AND VECTOR PROCESSING

 • Parallel Processing 
 • Pipelining
 • Arithmetic Pipeline
 • Instruction Pipeline
 • RISC Pipeline
 • Vector Processing
 • Array Processors 

PARALLEL PROCESSING

Levels of parallel processing:
  • Job or program level
  • Task or procedure level
  • Inter-instruction level
  • Intra-instruction level

Architectural classification: Flynn's classification is based on the multiplicity of instruction streams and data streams.
  • Instruction stream: the sequence of instructions read from memory.
  • Data stream: the operations performed on the data in the processor.

                          Single data    Multiple data
  Single instruction      SISD           SIMD
  Multiple instruction    MISD           MIMD

SISD COMPUTER SYSTEMS

Organization: control unit → processor unit → memory, with a single instruction stream and a single data stream.

Characteristics:
  • Standard von Neumann machine
  • Instructions and data are stored in memory
  • One operation at a time

Limitation: the von Neumann bottleneck. The maximum speed of the system is limited by the memory bandwidth (bits/sec or bytes/sec); memory bandwidth is limited, and memory is shared by the CPU and I/O.

PERFORMANCE IMPROVEMENTS
  • Multiprogramming
  • Spooling
  • Multifunction processors
  • Pipelining
  • Exploiting instruction-level parallelism: superscalar, superpipelining, VLIW (Very Long Instruction Word)






In computing, SISD (single instruction stream, single data stream) is a computer architecture in which a single uni-core processor executes a single instruction stream to operate on data stored in a single memory. This corresponds to the von Neumann architecture.
SISD is one of the four main classifications as defined in Flynn's taxonomy. In this system, classifications are based upon the number of concurrent instructions and data streams present in the computer architecture. According to Michael J. Flynn, SISD can have concurrent processing characteristics. Pipelined processors and superscalar processors are common examples found in most modern SISD computers.

Single instruction stream, multiple data streams (SIMD)
A computer that exploits multiple data streams against a single instruction stream to perform operations which may be naturally parallelized; for example, an array processor or a graphics processing unit (GPU).

Multiple instruction streams, single data stream (MISD)

Multiple instructions operate on one data stream. This is an uncommon architecture which is generally used for fault tolerance. Heterogeneous systems operate on the same data stream and must agree on the result. Examples include the Space Shuttle flight control computer.

Multiple instruction streams, multiple data streams (MIMD)

Multiple autonomous processors simultaneously executing different instructions on different data. MIMD architectures include multi-core superscalar processors, and distributed systems, using either one shared memory space or a distributed memory space.

Single instruction, multiple threads (SIMT)

Single instruction, multiple threads (SIMT) is an execution model used in parallel computing where single instruction, multiple data (SIMD) is combined with multithreading. This is not originally part of Flynn's taxonomy but a proposed addition.

Diagram comparing classifications

These four architectures are shown below visually. Each processing unit (PU) is shown for a uni-core or multi-core computer:

Friday, 25 August 2017

Pipelining Concepts with Overlapped mechanisms

Pipelining – An overlapped parallelism

• To achieve pipelining, one must subdivide the input task into a sequence of subtasks, each of which is executed concurrently with the other stages of the pipeline.
Principles of Linear Pipelining

• Assembly lines are used in automated industries to increase productivity.
• All stages of a pipelined processor must have the same processing speed, otherwise a congestion problem may occur.
• The precedence relation of a set of subtasks says that a subtask Tj cannot start until Ti finishes.
• Timing of the linear pipelining process:
Clock period:
     The time delay of the slowest stage (plus the latch delay) determines the clock period of the pipeline.
     A k-stage pipeline completing n tasks needs k clock periods to deliver the first result and one more period for each remaining task, so the total time is
     Tk = k + (n − 1) clock periods.
Speed up:
     A non-pipelined processor needs T1 = n·k periods for the same work, so
     Sk = T1 / Tk = n·k / (k + (n − 1)),
     which approaches k, the number of stages, as n grows.
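A quick numeric check of these formulas, with illustrative stage and task counts:

# k-stage linear pipeline timing (illustrative sketch).
def pipeline_cycles(k, n):
    return k + (n - 1)             # fill the pipe, then one result per period

def speedup(k, n):
    return (n * k) / pipeline_cycles(k, n)

k, n = 4, 100
print(pipeline_cycles(k, n))       # 103 clock periods
print(round(speedup(k, n), 2))     # 3.88 -- approaching k = 4 as n grows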

Classification of pipelined processors
Handler proposed the following six classification schemes:
• Arithmetic pipelining
• Instruction pipelining
• Processor pipelining
• Unifunction vs. multifunction pipelines
• Static vs. dynamic pipelines
• Scalar vs. vector pipelines
GENERAL PIPELINES AND RESERVATION TABLES

• Linear pipeline – data flows from one stage to the next in order, each stage depending on the output of the previous one.
• Pipelines with feedback may have a non-linear data flow.
• General pipelines have feedforward and feedback connections.
• A reservation table is needed to describe how the stages are used over time.
• Feedforward connection: stage i feeds a stage j with j ≥ i + 2.
• Feedback connection: stage i feeds a stage j with j ≤ i.
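From a reservation table one can read off the forbidden latencies: if the same stage is marked at two time steps of one task, a second task initiated that many cycles later would collide with it. A small sketch, with an illustrative table (rows are stages, columns are clock cycles):

# Forbidden latencies from a reservation table (illustrative sketch).
table = [
    [1, 0, 0, 0, 0, 1],   # stage 1 busy at t = 0 and t = 5
    [0, 1, 0, 1, 0, 0],   # stage 2 busy at t = 1 and t = 3
    [0, 0, 1, 0, 1, 0],   # stage 3 busy at t = 2 and t = 4
]

def forbidden_latencies(table):
    lat = set()
    for row in table:
        marks = [t for t, used in enumerate(row) if used]
        # Two initiations spaced by any of these distances would
        # collide in this stage.
        lat |= {b - a for a in marks for b in marks if b > a}
    return sorted(lat)

print(forbidden_latencies(table))   # [2, 5] for the table above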
Interleaved memory organisation

• Memory bandwidth – the average number of words accessed per second.
• Factors affecting the bandwidth are:
   Processor architecture
   Memory configuration
   Memory module characteristics (clock cycle, memory organisation)
E.g., four-pipeline vector processors use three such organisations:
• S-access memory organisation
• C-access memory organisation
• C/S-access memory organisation
S-access memory organisation
• It uses simple low-order interleaving and applies the high-order (n − m) bits of the address to all M = 2^m memory modules in one access.
• All the modules are accessed simultaneously.
• Words from all the modules are latched at the same time.
• The low-order m bits then select the desired word from the latches.
• This selection is done through a high-speed multiplexer (MUX), as sketched below.
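A small sketch of how an address is split under low-order interleaving (the module count is illustrative):

# Address split for S-access low-order interleaving (illustrative sketch).
def split_address(addr, m_bits):
    word = addr >> m_bits                   # high-order (n - m) bits: word address
    module = addr & ((1 << m_bits) - 1)     # low-order m bits: module / latch select
    return word, module

m_bits = 2                                  # M = 2**2 = 4 modules
for addr in range(6):
    print(addr, split_address(addr, m_bits))
# Consecutive addresses land in different modules, so one access
# latches M consecutive words in parallel.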
C-access memory organisation
• It allows the m memory words to be accessed concurrently, with accesses to the modules overlapped in time.
• More than one bank can be accessed at once, which increases bank utilisation and reduces bank cost.
• This is the C-access memory organisation.
C/S-access memory organisation

• n access buses with m interleaved memory modules attached to each bus.
• The n buses operate in parallel to allow S-access on each bus.
• It is a combination of S-access and C-access.

Thursday, 13 July 2017

Architectural Configurations of Parallel Computer Structures

Classification of parallel computers


1. Parallel Pipelining
2. Array Processor 
3. Multiprocessor Systems

1. Parallel Pipelining
In computing, a pipeline is a set of data processing elements connected in series, where the output of one element is the input of the next one.

The elements of a pipeline are often executed in parallel or in time-sliced fashion; in that case, some amount of buffer storage is often inserted between elements.
2. Array Processor 
   • In computing, a vector processor or array processor is a central processing unit (CPU) that implements an instruction set containing instructions that operate on one-dimensional arrays of data called vectors, in contrast to scalar processors, whose instructions operate on single data items. Array processors are classified as SIMD (Single Instruction Stream, Multiple Data Stream) machines.
  • Also called a vector processor: a processor that executes one instruction at a time, but on an array or table of data at the same time rather than on single data elements.

3. Multiprocessor Systems


Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system. The term also refers to the ability of a system to support more than one processor or the ability to allocate tasks between them.

They are classified as MIMD systems (Multiple Instruction Stream, Multiple Data Stream).
Multiprocessor systems evolved to increase flexibility, availability, reliability, and throughput.
They make use of a direct interconnection between the processors and the memories, such as:
  • Time-shared common bus
  • Crossbar switch network
  • Multiport memories

Why choose multiprocessor systems?

  • More than one CPU can increase performance
  • Multiple users
  • Multiple applications
  • Multitasking within an application
  • Responsiveness and throughput
  • Hardware shared between CPUs


Tuesday, 27 June 2017

Parallel Processing Intro & Uniprocessor Systems





Introduction to parallel processing

                    In computers, parallel processing is the processing of program instructions by dividing them among multiple processors with the objective of running a program in less time. In the earliest computers, only one program ran at a time. A computation-intensive program that took one hour to run and a tape copying program that took one hour to run would take a total of two hours to run. An early form of parallel processing allowed the interleaved execution of both programs together. The computer would start an I/O operation, and while it was waiting for the operation to complete, it would execute the processor-intensive program. The total execution time for the two jobs would be a little over one hour.

                    The next improvement was multiprogramming. In a multiprogramming system, multiple programs submitted by users were each allowed to use the processor for a short time. To users it appeared that all of the programs were executing at the same time. Problems of resource contention first arose in these systems: explicit requests for resources led to the problem of deadlock, and competition for resources on machines with no tie-breaking instructions led to the critical-section problem. Vector processing was another attempt to increase performance by doing more than one thing at a time. In this case, capabilities were added to machines to allow a single instruction to add (or subtract, or multiply, or otherwise manipulate) two arrays of numbers. This was valuable in certain engineering applications where data naturally occurred in the form of vectors or matrices; in applications with less well-formed data, vector processing was not so valuable.

                    The next step in parallel processing was the introduction of multiprocessing. In these systems, two or more processors shared the work to be done. The earliest versions had a master/slave configuration. One processor (the master) was programmed to be responsible for all of the work in the system; the other (the slave) performed only those tasks it was assigned by the master. This arrangement was necessary because it was not then understood how to program the machines so they could cooperate in managing the resources of the system.
UNI-PROCESSOR SYSTEMS
• Of the sixteen 32-bit registers, one serves as the program counter (PC).
• The CPU consists of an ALU with a floating-point accelerator and diagnostic memory.
• Both main memory and local memory are directly connected to a common bus, the SBI.
Parallelism and pipelining within the CPU

• The ALU contains parallel adders using carry-lookahead and carry-save techniques.
• High-speed multiplier recoding and convergence division techniques are used to achieve parallelism.
• The phases of instruction execution are pipelined: instruction fetch, decode, operand prefetch, arithmetic/logic execution, and storing of results.
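As a rough sketch of the carry-lookahead idea: from per-bit generate and propagate terms, every carry has a closed form that hardware can evaluate in parallel instead of waiting for a ripple. The loop below walks the same recurrence sequentially for clarity, and the 4-bit width is an illustrative choice.

# 4-bit carry-lookahead addition (illustrative sketch).
def cla_add4(a, b, cin=0):
    g = [(a >> i & 1) & (b >> i & 1) for i in range(4)]  # generate terms
    p = [(a >> i & 1) | (b >> i & 1) for i in range(4)]  # propagate terms
    c = [cin]
    for i in range(4):
        c.append(g[i] | (p[i] & c[i]))   # c_{i+1} = g_i + p_i * c_i
    s = [(a >> i & 1) ^ (b >> i & 1) ^ c[i] for i in range(4)]
    return sum(bit << i for i, bit in enumerate(s)), c[4]

print(cla_add4(0b0101, 0b0110))   # (11, 0): 5 + 6 = 11, no carry out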

