ParallelProgramming-2


  • Author: Yiqing Ma (Hong Kong University of Science and Technology)
  • Parallel Programming, Course


Course

  • The Von Neumann Architecture

  • Main memory

    • It is a collection of locations, each of which is capable of storing both instructions and data.
  • Key terms
    • Register
    • Program counter
    • Bus
  • Von Neumann bottleneck
    • limited number of registers
    • limited bandwidth between CPU and main memory
    • memory access latency
  • An operating system “process”
  • Multitask
    • the operating system schedules the programs
    • time slice (each process takes turns running)
    • when its time slice is up, a process waits until its next turn
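
To make the time-slice idea concrete, here is a minimal sketch of processes taking turns. It assumes a simple round-robin policy, which the notes do not specify; the function name `round_robin` and the process names are hypothetical.

```python
from collections import deque

def round_robin(work, time_slice):
    """Return the order in which processes run.

    `work` maps a process name to the time units it still needs.
    Round-robin is an assumed policy, chosen only for illustration.
    """
    queue = deque(work.items())
    schedule = []
    while queue:
        name, remaining = queue.popleft()
        ran = min(time_slice, remaining)
        schedule.append((name, ran))
        if remaining > ran:
            # Time is up: the process goes to the back and waits for its next turn.
            queue.append((name, remaining - ran))
    return schedule

print(round_robin({"A": 3, "B": 5}, time_slice=2))
# [('A', 2), ('B', 2), ('A', 1), ('B', 2), ('B', 1)]
```
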
  • Thread
    • Threads are contained within processes
  • A process and two threads

    • when a thread is started, it forks off the process
    • when a thread terminates, it joins the process
    • with only one CPU, the two threads take turns running in time slices
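
The fork/join pattern above can be sketched with Python's standard `threading` module, where `start()` plays the role of fork and `join()` waits for a thread to terminate. The `worker` function and the `results` dictionary are hypothetical names used only for this illustration.

```python
import threading

results = {}

def worker(name, n):
    # Each thread computes a partial sum and stores it under its own key.
    results[name] = sum(range(n))

# "Fork": start two threads inside the current process.
threads = [
    threading.Thread(target=worker, args=("t1", 1000)),
    threading.Thread(target=worker, args=("t2", 2000)),
]
for t in threads:
    t.start()

# "Join": wait for both threads to terminate before continuing.
for t in threads:
    t.join()

print(results["t1"], results["t2"])  # 499500 1999000
```
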
  • Modifications to the Von Neumann Model

    • Caching: a cache is a collection of memory locations that can be accessed in less time than some other memory locations.
  • Locality
    • Spatial locality
    • Temporal locality
    • Example of locality
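
A small sketch of what locality looks like in a loop, assuming a simple array sum as the example. It shows the access pattern only: a Python list stores references, so the actual memory layout differs from a C array. The name `sum_array` is hypothetical.

```python
def sum_array(z):
    total = 0.0
    for i in range(len(z)):
        # `total` and `i` are touched again on every iteration: temporal locality.
        # z[i] is followed by z[i + 1], its neighbour: spatial locality.
        total += z[i]
    return total

print(sum_array([1.0, 2.0, 3.0]))  # 6.0
```
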
  • Levels of cache
    • At the very beginning, there was only an L1 cache
    • L2 and L3 came later (L3 is the largest but the slowest)
  • Cache hit
    • fetch
  • cache miss
    • option 1: must send a message to main memory, which is slow
    • option 2: write back, write in the memory and ...
    • hash function to find the location
    • this cache is too small for direct mapping
  • Example
    • memory content into the cache
    • Direct map
    • each cache entry stores the memory address together with the content
    • hash design is in hardware
    • Memory is the
    • replacement is decided by the hardware and is out of the programmer's control
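
One possible form of the direct-mapping "hash" is a plain modulo on the memory line number. The sketch below assumes a tiny cache of two lines with four words each; `LINE_SIZE`, `NUM_LINES`, and `cache_slot` are hypothetical names, and real hardware uses address bits rather than arithmetic.

```python
LINE_SIZE = 4   # assumed: words per cache line
NUM_LINES = 2   # assumed: number of lines in the cache

def cache_slot(address):
    """Return (slot, tag) for a word address in a direct-mapped cache."""
    line_number = address // LINE_SIZE   # which memory line holds this address
    slot = line_number % NUM_LINES       # the "hash": the only slot this line may use
    tag = line_number // NUM_LINES       # kept with the data to identify the line
    return slot, tag

print(cache_slot(0))  # (0, 0)
print(cache_slot(5))  # (1, 0)
print(cache_slot(8))  # (0, 1)
```

Addresses 0 and 8 land in the same slot with different tags, so loading one evicts the other.
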
  • Cache Eviction
    • Caches are much smaller than main memory
  • Caches and programs
    • Array
  • Cache Line

    • A[0][0] A[0][1] A[0][2] A[0][3]
    • The first pair of loops is more efficient than the second pair
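
The loops themselves are not shown in these notes. The sketch below assumes the usual comparison: the first pair walks a 4 x 4 array row by row, the second walks it column by column. It counts misses in a simulated direct-mapped cache with two lines of four elements each. All sizes and names (`MAX`, `LINE_SIZE`, `NUM_LINES`, `count_misses`) are assumptions for illustration.

```python
MAX = 4         # assumed: A is a MAX x MAX array stored row by row
LINE_SIZE = 4   # assumed: one cache line holds four elements of A
NUM_LINES = 2   # assumed: a direct-mapped cache with two lines

def count_misses(accesses):
    """Count cache misses for a sequence of (i, j) accesses to A."""
    cache = [None] * NUM_LINES                # memory line currently held by each slot
    misses = 0
    for i, j in accesses:
        line = (i * MAX + j) // LINE_SIZE     # row-major address -> memory line
        slot = line % NUM_LINES
        if cache[slot] != line:               # miss: load the line, evicting the old one
            cache[slot] = line
            misses += 1
    return misses

# First pair of loops: i outer, j inner (walks along each row).
row_order = [(i, j) for i in range(MAX) for j in range(MAX)]
# Second pair of loops: j outer, i inner (walks down each column).
col_order = [(i, j) for j in range(MAX) for i in range(MAX)]

print(count_misses(row_order))  # 4
print(count_misses(col_order))  # 16
```

Under these assumptions the row-by-row order misses once per cache line, while the column-by-column order misses on every access.
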
  • Virtual Memory

    • If we run a very large program, or a program that accesses very large data sets, not everything may fit in main memory, so virtual addresses are mapped to physical memory
    • It exploits temporal locality
    • Swap space and pages
  • Virtual page numbers

    • A page table is used to map virtual addresses to physical addresses
    • Virtual Page Number
  • Translation-lookaside buffer(TLB)

    • Page-table entries that are looked up very frequently are kept in the TLB (a smaller, faster table)
  • Page fault

    • The page table has no valid physical address for the page number (the page is not in main memory)
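
The page table, TLB, and page fault can be tied together in one lookup routine. This is a simplified sketch: the 4096-byte page size, the dictionary-based tables, and the names `translate` and `PageFault` are assumptions, and a real TLB has a fixed size with its own eviction.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

class PageFault(Exception):
    """Raised when the page table has no physical frame for a page."""

def translate(vaddr, page_table, tlb):
    vpn, offset = divmod(vaddr, PAGE_SIZE)  # virtual page number and byte offset
    if vpn in tlb:                          # fast path: TLB hit
        frame = tlb[vpn]
    elif vpn in page_table:                 # TLB miss: consult the full page table
        frame = page_table[vpn]
        tlb[vpn] = frame                    # remember the translation for next time
    else:
        raise PageFault(vpn)                # the page is not in main memory
    return frame * PAGE_SIZE + offset

page_table = {0: 5, 1: 9}  # hypothetical mapping: virtual page -> physical frame
tlb = {}
print(translate(4100, page_table, tlb))  # 36868
print(tlb)                               # {1: 9}
```
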
  • Instruction level parallelism(ILP)

    • 1. Multiple functional units (e.g. with two ALUs, two additions can be done at the same time)
    • 2. Pipelining - functional units are arranged in stages
    • 3. Multiple issue - multiple instructions can be initiated simultaneously
  • 2. Pipelining Example

    • divide a single instruction into multiple operations
      1. fetch operands from memory
      2. compare exponents
      3. shift one operand
      4. add
      5. normalize result
      6. round result
      7. store result
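
A rough timing model for the seven operations listed above, assuming each stage takes one nanosecond and the pipeline never stalls. The names `unpipelined_time` and `pipelined_time` are hypothetical.

```python
STAGES = 7          # the seven operations listed above
STAGE_TIME_NS = 1   # assumed: every stage takes one nanosecond

def unpipelined_time(n):
    # Each instruction runs all stages before the next one starts.
    return n * STAGES * STAGE_TIME_NS

def pipelined_time(n):
    # The first result appears after STAGES steps, then one more result per step.
    if n == 0:
        return 0
    return (STAGES + n - 1) * STAGE_TIME_NS

print(unpipelined_time(1000))  # 7000
print(pipelined_time(1000))    # 1006
```

With these assumptions, 1000 additions drop from 7000 ns to 1006 ns, close to a sevenfold speedup.
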
  • 3. Multiple issue

    • the processor must decide which addition goes to which adder
    • Static (scheduled at compile time) & Dynamic (scheduled at run time)
    • Practice Additions
  • Speculation

    • Register to Store
    • Hardware to Multithreading
  • Hardware Multithreading

    • In fine-grained multithreading, the processor switches between threads after each instruction, skipping threads that are stalled.
    • Simultaneous Multithreading (SMT)