- Author: Yiqing Ma (Hong Kong University of Science and Technology), Parallel Programming Course
The Von Neumann Architecture
Main memory
- It is a collection of locations, each of which is capable of storing both instructions and data.
- Key terms
- Register
- Program counter
- Bus
- Von Neumann bottleneck
- Limited number of registers
- Limited bandwidth between CPU and main memory
- Memory access latency
- An operating system “process”
- Multitasking
- The operating system schedules the programs
- Time slice (each process takes turns running)
- When a process's time slice is up, it waits until its turn comes again
- Thread
- Threads are contained within processes
A process and two threads
- Starting a thread is called forking
- Terminating a thread is called joining
- With only one CPU, the two threads take turns running in time slices
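The fork and join steps above can be sketched with Python's standard `threading` module. This is only a minimal illustration: the `worker` function, the `results` dictionary, and the sample data are hypothetical names chosen for the example and are not part of the course material.

```python
import threading

results = {}  # shared by both threads, since threads live inside one process

def worker(name, numbers):
    # Each thread sums its own half of the data.
    results[name] = sum(numbers)

data = list(range(10))

# "Fork": create and start two threads inside the current process.
t1 = threading.Thread(target=worker, args=("first", data[:5]))
t2 = threading.Thread(target=worker, args=("second", data[5:]))
t1.start()
t2.start()

# "Join": wait until both threads have terminated.
t1.join()
t2.join()

total = results["first"] + results["second"]
print(total)  # 45
```

Both threads write into the same `results` dictionary, which works because threads share the memory of the process that contains them.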
Modifications to the Von Neumann Model
- Cache: a collection of memory locations that can be accessed in less time than some other memory locations.
- Locality
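- Spatial locality: a nearby location is likely to be accessed next
- Temporal locality: the same location is likely to be accessed again in the near future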
- Example of locality
- Levels of cache
- At the very beginning there was only an L1 cache
- L2, L3 (larger but slower than L1)
- Cache hit
- fetch
- Cache miss
- Option 1: a message must be sent to main memory, which is slow
- Option 2: write back, write in the memory and ...
- A hash function maps a memory address to its location in the cache
- this cache is too small for direct mapping
- Example
- Memory content is copied into the cache
- Direct mapping
- Each cache entry holds the content together with its memory address
- The hash function is implemented in hardware
- Memory is the
- Replacement is decided by the hardware hash, so it is outside the programmer's control
- Cache Eviction
- Caches are much smaller than main memory
- Caches and programs
- Array
Cache Line
- A[0][0] A[0][1] A[0][2] A[0][3]
- The first pair of loops is more efficient than the second pair: walking a row in order uses every element of each cache line that is loaded
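The effect of loop order can be sketched with a tiny simulated direct-mapped cache. This assumes the first pair of loops walks the array row by row and the second pair column by column. The cache geometry (4 lines of 4 elements, matching the cache line `A[0][0] ... A[0][3]` above), the 8 x 8 array size, and the helper `count_misses` are assumptions made for illustration, not figures from the course.

```python
def count_misses(addresses, num_lines=4, line_size=4):
    """Count misses in a direct-mapped cache for a sequence of element addresses."""
    lines = [None] * num_lines      # each slot remembers which memory block it holds
    misses = 0
    for addr in addresses:
        block = addr // line_size   # memory block (cache line) that contains addr
        slot = block % num_lines    # direct mapping: the "hash" is a simple modulo
        if lines[slot] != block:    # miss: fetch the block, evicting the old one
            lines[slot] = block
            misses += 1
    return misses

N = 8  # an N x N array stored row by row, so A[i][j] sits at address i * N + j

# Row by row: i in the outer loop, j in the inner loop.
row_order = [i * N + j for i in range(N) for j in range(N)]
# Column by column: j in the outer loop, i in the inner loop.
col_order = [i * N + j for j in range(N) for i in range(N)]

row_misses = count_misses(row_order)
col_misses = count_misses(col_order)
print(row_misses, col_misses)  # 16 64
```

With these assumed sizes, the row-order walk misses once per cache line, while the column-order walk evicts each line before its other elements are used, so every access misses.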
Virtual Memory
- If we run a very large program, or a program that accesses very large data sets, not everything fits in main memory, so virtual memory is mapped to physical memory
- It exploits temporal locality
- Swap space and pages
Virtual page numbers
- A page table maps virtual addresses to physical addresses
- Virtual Page Number
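The mapping from a virtual address to a physical address can be sketched as follows. The 4 KiB page size, the contents of `page_table`, and the function name `translate` are hypothetical choices for this example. Real page tables are managed by the operating system and hardware.

```python
OFFSET_BITS = 12                 # assumed page size of 4 KiB = 2**12 bytes
PAGE_SIZE = 1 << OFFSET_BITS

# Hypothetical page table: virtual page number -> physical page number
page_table = {0: 5, 1: 2, 2: 7}

def translate(virtual_address):
    """Split the address into (virtual page number, offset) and map the page."""
    vpn = virtual_address >> OFFSET_BITS          # upper bits: virtual page number
    offset = virtual_address & (PAGE_SIZE - 1)    # lower bits: offset within the page
    ppn = page_table[vpn]                         # page table lookup
    return (ppn << OFFSET_BITS) | offset          # the offset is carried over unchanged

physical = translate(0x1ABC)
print(hex(physical))  # 0x2abc
```

Only the page number is translated. The offset inside the page is the same in the virtual and the physical address.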
Translation-lookaside buffer (TLB)
- Page table entries that are looked up very frequently are kept in the TLB (a smaller, faster table)
Page fault
- The page table has no valid physical address for the page (the page is only on disk)
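How the TLB, the page table, and a page fault fit together can be sketched with a small simulation. Everything here is an assumption made for illustration: the TLB holds 2 entries and evicts the least recently used one, `None` in `page_table` marks a page that is only on disk, and `free_frames` stands in for the operating system picking a physical page.

```python
from collections import OrderedDict

TLB_SIZE = 2
tlb = OrderedDict()                          # small table of recently used translations
page_table = {0: 5, 1: 2, 2: None, 3: 7}     # None: page is on disk, not in memory
free_frames = [9, 10]                        # hypothetical free physical pages
stats = {"tlb_hit": 0, "tlb_miss": 0, "page_fault": 0}

def lookup(vpn):
    """Return the physical page number for a virtual page number."""
    if vpn in tlb:                           # fast path: translation is in the TLB
        stats["tlb_hit"] += 1
        tlb.move_to_end(vpn)                 # mark as most recently used
        return tlb[vpn]
    stats["tlb_miss"] += 1                   # slow path: consult the page table
    if page_table[vpn] is None:              # page fault: bring the page into memory
        stats["page_fault"] += 1
        page_table[vpn] = free_frames.pop(0)
    ppn = page_table[vpn]
    tlb[vpn] = ppn                           # cache the translation in the TLB
    if len(tlb) > TLB_SIZE:
        tlb.popitem(last=False)              # evict the least recently used entry
    return ppn

for vpn in [0, 0, 1, 2, 0, 2]:
    lookup(vpn)
print(stats)  # {'tlb_hit': 2, 'tlb_miss': 4, 'page_fault': 1}
```

The repeated accesses to pages 0 and 2 hit the TLB only while their entries have not yet been evicted, which is why a small TLB pays off mainly for frequently used pages.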
Instruction-level parallelism (ILP)
- 1. Multiple components within the processor (e.g. with two ALUs, two additions can be done at the same time)
- 2. Pipelining: functional units are arranged in stages
- 3. Multiple issue: multiple instructions can be initiated simultaneously
2. Pipelining example
- Divide a single instruction (here a floating-point addition) into multiple operations
- Fetch operands
- Compare exponents
- shift one operand
- add
- normalize result
- round result
- store result
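The benefit of arranging the seven operations above as pipeline stages can be estimated with a short calculation. The sketch assumes every stage takes the same time (1 ns here) and that the additions are independent. The function names are hypothetical.

```python
def unpipelined_time(n_additions, n_stages, stage_time_ns=1):
    # Each addition runs all stages before the next addition starts.
    return n_additions * n_stages * stage_time_ns

def pipelined_time(n_additions, n_stages, stage_time_ns=1):
    # The first result needs n_stages steps; after that one result finishes per step.
    return (n_stages + n_additions - 1) * stage_time_ns

STAGES = 7  # fetch, compare exponents, shift, add, normalize, round, store

print(unpipelined_time(1000, STAGES))  # 7000
print(pipelined_time(1000, STAGES))    # 1006
```

A single addition still takes 7 ns in the pipeline. The gain comes from overlapping different stages of consecutive additions.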
3. Multiple issue
- The processor must decide which addition goes to which adder
- Static (scheduled at compile time) and dynamic (scheduled at run time)
- Practice Additions
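Assigning additions to adders can be sketched as below. This is a simplified model with two adders and independent additions `z[i] = x[i] + y[i]`. The assignment is fixed in advance, which resembles static multiple issue. With dynamic multiple issue the hardware would make this decision at run time. All names and data are hypothetical.

```python
x = [1, 2, 3, 4, 5]
y = [10, 20, 30, 40, 50]
NUM_ADDERS = 2

z = [None] * len(x)
schedule = []  # schedule[c] lists the (adder, index) pairs issued in cycle c

for start in range(0, len(x), NUM_ADDERS):
    issued = []
    # In one cycle, each adder takes one of the next independent additions.
    for adder, i in enumerate(range(start, min(start + NUM_ADDERS, len(x)))):
        z[i] = x[i] + y[i]
        issued.append((adder, i))
    schedule.append(issued)

cycles = len(schedule)
print(z)       # [11, 22, 33, 44, 55]
print(cycles)  # 3
```

Five additions finish in three cycles instead of five, because two of them are initiated in the same cycle whenever two are available.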
Speculation
- Register to Store
- Hardware to Multithreading
Hardware Multithreading
- Fine-grained: the processor switches between threads after each instruction, skipping threads that are stalled.
- Simultaneous Multithreading (SMT)
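Switching between threads after each instruction while skipping stalled threads can be sketched with a small simulation. The model is an assumption made for illustration: one instruction is issued per cycle, each instruction carries a stall count (the number of extra cycles its thread must wait afterwards, e.g. for memory), and `run_fine_grained` is a hypothetical helper.

```python
def run_fine_grained(threads):
    """threads maps a thread name to a list of stall counts, one per instruction.
    Returns which thread issued in each cycle (None for an idle cycle)."""
    names = list(threads)
    pc = {n: 0 for n in names}         # index of each thread's next instruction
    ready_at = {n: 0 for n in names}   # first cycle in which the thread may issue
    trace = []
    cycle = 0
    last = -1                          # position of the thread that issued last
    while any(pc[n] < len(threads[n]) for n in names):
        issued = None
        # Round robin, starting with the thread after the one that issued last.
        for k in range(1, len(names) + 1):
            idx = (last + k) % len(names)
            n = names[idx]
            if pc[n] < len(threads[n]) and ready_at[n] <= cycle:
                stall = threads[n][pc[n]]
                pc[n] += 1
                ready_at[n] = cycle + 1 + stall   # thread is stalled until then
                issued = n
                last = idx
                break
        trace.append(issued)
        cycle += 1
    return trace

# Thread A stalls for 2 cycles after its second instruction.
trace = run_fine_grained({"A": [0, 2, 0], "B": [0, 0, 0]})
print(trace)  # ['A', 'B', 'A', 'B', 'B', 'A']
```

While thread A is stalled, thread B issues twice in a row, so the processor stays busy instead of waiting.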