
9 Complex Pipelining

Basics

Advanced Data Hazard

Out-of-order write hazards arise from the variable latencies of different functional units (FDIV and FMUL take more than one cycle).

20220513105012

In-order and Out-of-order

RAW (red), WAR (purple), WAW (green)

20220513105012

Regardless of WAR hazards, look for an instruction whose dependencies are satisfied and that can be fetched next; if there is none, finish executing the waiting instructions.

20220513105012

Table for Correct Issue

20220519214047

In-order

Assumption: the Issue stage does not issue an instruction if a RAW hazard exists or the required FU is busy, and operands are registered by the functional unit on issue.

  • No WAR hazard: earlier instructions read their operands at issue
    • no need to keep src1 and src2
  • Still a WAW hazard, e.g. from out-of-order completion
    • WP[reg#]: a bit-vector recording the registers for which writes are pending (cleared after the WB stage)
    • Busy[FU#]: a bit-vector indicating each FU’s availability (FU = Int, Add, Mult, Div)

Lookup:

  • FU available? Busy[FU#]
  • RAW? WP[src1] or WP[src2]
  • WAW? WP[dest]
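
The three lookups can be sketched as a small scoreboard. This is a minimal illustration, not a definitive implementation: the class name `Scoreboard`, its method names, and the choice to clear `Busy` at write-back together with `WP` are assumptions made for the example.

```python
class Scoreboard:
    """In-order issue check built on the WP and Busy bit-vectors."""

    def __init__(self, num_regs, fus):
        self.wp = [False] * num_regs            # WP[reg#]: write pending
        self.busy = {fu: False for fu in fus}   # Busy[FU#]: unit in use

    def can_issue(self, fu, dest, src1, src2):
        if self.busy[fu]:                       # FU available?
            return False
        if self.wp[src1] or self.wp[src2]:      # RAW?
            return False
        if self.wp[dest]:                       # WAW?
            return False
        return True

    def issue(self, fu, dest):
        self.busy[fu] = True
        self.wp[dest] = True

    def write_back(self, fu, dest):
        # Assumption: the FU is released at the same time WP is cleared.
        self.busy[fu] = False
        self.wp[dest] = False


sb = Scoreboard(num_regs=8, fus=["Int", "Add", "Mult", "Div"])
sb.issue("Div", dest=4)                               # long-latency write to f4
raw = sb.can_issue("Add", dest=5, src1=4, src2=1)     # reads f4: RAW
waw = sb.can_issue("Mult", dest=4, src1=1, src2=2)    # writes f4: WAW
ok = sb.can_issue("Add", dest=6, src1=1, src2=2)      # independent
```

Here `raw` and `waw` are both `False` while the divide is in flight, and `ok` is `True`.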

Example

Diagonal means pipeline

20220519214902

In-order issue limitations

20220519215254

Out-of-order

Decode adds next instruction to buffer if there is space and the instruction does not cause a WAR or WAW hazard.

Note: WAR hazards are possible again because issue is out-of-order (WAR is not possible with in-order issue and registering of input operands at the functional unit).
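
The decode-time check can be sketched as follows. This is an illustrative sketch only: the `Instr` tuple, the function name `can_enter_buffer`, and the simplification that every instruction still in the buffer has neither read its operands nor written its result are assumptions, not part of the notes.

```python
from collections import namedtuple

# Hypothetical instruction record: one destination, two sources.
Instr = namedtuple("Instr", "op dest src1 src2")


def can_enter_buffer(new, pending, capacity):
    """Return True if decode may add `new` to the issue buffer."""
    if len(pending) >= capacity:                # no space in the buffer
        return False
    for old in pending:
        if new.dest in (old.src1, old.src2):    # WAR: would overwrite a source
            return False
        if new.dest == old.dest:                # WAW: same destination
            return False
    return True


pending = [Instr("MULTD", "f6", "f4", "f2")]
war = can_enter_buffer(Instr("DIVD", "f4", "f2", "f8"), pending, capacity=4)
waw = can_enter_buffer(Instr("ADDD", "f6", "f8", "f2"), pending, capacity=4)
ok = can_enter_buffer(Instr("SUBD", "f8", "f2", "f2"), pending, capacity=4)
full = can_enter_buffer(Instr("SUBD", "f8", "f2", "f2"), pending, capacity=1)
```

Only `ok` is `True`: the first instruction is blocked by a WAR hazard on f4, the second by a WAW hazard on f6, and the last by a full buffer.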

Example

The number of registers limits the number of instructions in the pipeline.

20220520101336

Solution: renaming. In instruction 5, f4 -> f4' (no WAW hazard)

20220520103228

Renaming Table

Decode does register renaming and adds instructions to the issue-stage instruction reorder buffer (ROB).

When a tag is deallocated, it is broadcast, and any source operand waiting on that tag has its present bit (p1 or p2) set to 1.

20220520111838

Reorder Buffer

ROB managed circularly

  • exec bit is set when instruction begins execution

  • When an instruction completes, its use bit is marked free (ptr2 is incremented and the tag is deallocated)

    Deallocation is also known as commit.

Instruction is candidate for execution when

  • use bit is set
  • p1 and p2 are set
  • has not started execution (exec = 0)
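
The candidate conditions, together with the tag broadcast that sets the present bits, can be sketched like this. The `RobEntry` layout, the field name `exec_started` (standing in for the exec bit), and the string tags are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RobEntry:
    use: bool = False            # slot holds a valid instruction
    exec_started: bool = False   # the "exec" bit
    p1: bool = False             # source 1 present
    p2: bool = False             # source 2 present
    t1: Optional[str] = None     # tag awaited by source 1, if any
    t2: Optional[str] = None     # tag awaited by source 2, if any


def is_candidate(e: RobEntry) -> bool:
    """use bit set, both operands present, execution not yet started."""
    return e.use and e.p1 and e.p2 and not e.exec_started


def broadcast(rob: List[RobEntry], tag: str) -> None:
    """Set the present bit of every source operand waiting on `tag`."""
    for e in rob:
        if not e.use:
            continue
        if not e.p1 and e.t1 == tag:
            e.p1 = True
        if not e.p2 and e.t2 == tag:
            e.p2 = True


rob = [
    RobEntry(use=True, p1=True, p2=False, t2="t1"),            # waiting on t1
    RobEntry(use=True, exec_started=True, p1=True, p2=True),   # already running
    RobEntry(),                                                # free slot
]
before = [i for i, e in enumerate(rob) if is_candidate(e)]
broadcast(rob, "t1")
after = [i for i, e in enumerate(rob) if is_candidate(e)]
```

Before the broadcast no entry qualifies; afterwards entry 0 becomes a candidate because its second operand is now present.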

20220520112029


In-Order Commit for Precise Exceptions

20220522152822

Physical Register Management

Decode and dispatch all the instructions into the ROB first, then go back and execute them.

  • Free List: physical registers currently available
  • Rd: architectural (destination) register
  • LPRd: last physical register (the one previously mapped to Rd)
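
How the free list, Rd, and LPRd interact can be sketched as below. This is a simplified model under stated assumptions: the class `Renamer`, the identity initial mapping, and returning LPRd to the free list at commit are illustrative choices, not details given in the notes.

```python
from collections import deque


class Renamer:
    def __init__(self, num_arch, num_phys):
        # Assumption: architectural register i starts mapped to physical i.
        self.rename_table = list(range(num_arch))
        self.free_list = deque(range(num_arch, num_phys))
        self.rob = deque()               # (Rd, PRd, LPRd) in program order

    def dispatch(self, rd, srcs):
        """Rename one instruction; return (PRd, renamed sources)."""
        phys_srcs = [self.rename_table[s] for s in srcs]   # read old mapping
        if not self.free_list:
            raise RuntimeError("no free physical register: stall")
        prd = self.free_list.popleft()
        lprd = self.rename_table[rd]     # remember the previous mapping
        self.rename_table[rd] = prd
        self.rob.append((rd, prd, lprd))
        return prd, phys_srcs

    def commit(self):
        """Commit the oldest instruction and free its LPRd."""
        rd, prd, lprd = self.rob.popleft()
        self.free_list.append(lprd)
        return lprd


r = Renamer(num_arch=4, num_phys=6)
first = r.dispatch(rd=1, srcs=[2, 3])    # r1 gets a fresh physical register
second = r.dispatch(rd=1, srcs=[1, 2])   # reads the new r1, writes another
freed = r.commit()                       # the first instruction commits
```

The second write to r1 receives a different physical register from the first, which is what removes the WAW hazard; the original mapping of r1 is only returned to the free list once the first instruction commits.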

ROB

Execution

Superscalar Register Renaming

20220522161111

Load-Store Issue

  • A store commits when it is the oldest instruction and both its address and data are available:
    - clear the speculative bit and eventually move the data to the cache
  • On store abort: clear the valid bit
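
The commit and abort rules can be sketched with a toy store buffer. The `StoreEntry` fields, the helper names, and modelling the cache as a plain dictionary are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreEntry:
    addr: Optional[int] = None
    data: Optional[int] = None
    valid: bool = True
    speculative: bool = True


def try_commit(entry, is_oldest):
    """Commit only the oldest store, once address and data are both known."""
    if is_oldest and entry.addr is not None and entry.data is not None:
        entry.speculative = False
        return True
    return False


def abort(entry):
    entry.valid = False


def drain(buffer, cache):
    """Move committed stores into the cache; drop aborted ones."""
    remaining = []
    for e in buffer:
        if e.valid and not e.speculative:
            cache[e.addr] = e.data       # committed: write to the cache
        elif e.valid:
            remaining.append(e)          # still speculative: keep buffered
    return remaining


s1 = StoreEntry(addr=0x100, data=7)
s2 = StoreEntry(addr=0x104)              # data not yet available
committed1 = try_commit(s1, is_oldest=True)
committed2 = try_commit(s2, is_oldest=True)
abort(s2)
cache = {}
buffer = drain([s1, s2], cache)
```

After the drain, the cache holds the committed store at `0x100` and the aborted store has disappeared without ever touching memory.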

20220522161623

Conservative Out-of-order Load Execution

20220522161722

Branch Prediction

BHT

  • Temporal correlation: The way a branch resolves may be a good predictor of the way it will resolve at the next execution
  • Spatial correlation: Several branches may resolve in a highly correlated manner (a preferred path of execution)

Temporal Correlation

One-bit branch history predictor

If the last prediction failed, invert the bit.
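
A minimal sketch of this scheme, assuming a single branch and a predictor that starts at not-taken (the class and helper names are hypothetical):

```python
class OneBitPredictor:
    def __init__(self, taken=False):
        self.bit = taken                 # last observed direction

    def predict(self):
        return self.bit

    def update(self, taken):
        if taken != self.bit:            # mispredicted: invert the bit
            self.bit = not self.bit


def count_mispredictions(predictor, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop branch taken three times then not taken, executed twice.
loop = [True, True, True, False] * 2
misses = count_mispredictions(OneBitPredictor(), loop)
```

For this pattern `misses` is 4: the predictor is wrong on the first and on the last iteration of each pass through the loop.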

20220529165614

Two-bit branch predictor

Change the prediction only after two consecutive mispredictions
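
One common way to get this behaviour is a 2-bit saturating counter; the sketch below assumes that variant (other state machines exist) and starts in the strongly-taken state. The names are hypothetical.

```python
class TwoBitPredictor:
    """Saturating counter: 0-1 predict not-taken, 2-3 predict taken."""

    def __init__(self, state=3):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(predictor, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# Same loop branch as a one-bit predictor would see: T, T, T, N twice.
loop = [True, True, True, False] * 2
misses = count_mispredictions(TwoBitPredictor(), loop)
```

Here `misses` is 2: a single not-taken outcome at the loop exit only weakens the counter, so the prediction stays at taken when the loop is re-entered.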

20220529165814

Branch History Table

20220529165939

Spatial Correlation

Notion: a history register records the direction of the last N branches executed by the processor.
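
A history register can be modelled as an N-bit shift register. The sketch below additionally assumes that the history value indexes a table of 2-bit saturating counters, which is one common arrangement and not something the notes specify; the class name is hypothetical.

```python
class GlobalHistoryPredictor:
    def __init__(self, n_bits):
        self.n_bits = n_bits
        self.history = 0                        # last N directions, 1 = taken
        self.counters = [1] * (1 << n_bits)     # assumed 2-bit counters

    def predict(self):
        return self.counters[self.history] >= 2

    def update(self, taken):
        c = self.counters[self.history]
        self.counters[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        mask = (1 << self.n_bits) - 1
        # Shift the newest direction in and drop the oldest one.
        self.history = ((self.history << 1) | int(taken)) & mask


pred = GlobalHistoryPredictor(n_bits=2)
outcomes = [True, False] * 10                   # strictly alternating branch
late_misses = 0
for i, taken in enumerate(outcomes):
    if i >= 4 and pred.predict() != taken:      # ignore the warm-up phase
        late_misses += 1
    pred.update(taken)
```

After the first four outcomes the alternating pattern is predicted without error (`late_misses` stays 0), something a per-branch one-bit or two-bit counter alone cannot achieve for this pattern.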

20220529170306

BTB

Limitations of BHT

20220529170514

Branch Target Buffer

20220529170628