9 Complex Pipelining
Basics
Advanced Data Hazard
Out-of-order write hazards due to variable latencies of different functional units (FDIV and FMUL take more than one cycle)
In-order and Out-of-order
RAW (red), WAR (purple), WAW (green)
Regardless of WAR, look for an instruction whose dependencies are satisfied and that can go next; if there is none, let the waiting instructions finish executing
Table for Correct Issue
In-order
Assuming: the Issue stage does not issue an instruction if a RAW hazard exists or the required FU is busy, and operands are registered by the functional unit on issue
- No WAR: earlier instructions read their operands at issue
- no need to keep src1 and src2
- still a WAW hazard, e.g. from out-of-order completion
- WP[reg#], a bit-vector to record the registers for which writes are pending (cleared after the WB stage)
- Busy[FU#], a bit-vector to indicate each FU's availability (FU = Int, Add, Mult, Div)
Lookup:
- FU available? Busy[FU#]
- RAW? WP[src1] or WP[src2]
- WAW? WP[dest]
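The three lookups can be sketched as a small scoreboard. This is a minimal illustration, not a model of any specific machine: the names `Instr`, `Scoreboard`, `can_issue`, `issue` and `writeback` are hypothetical, and the FU is assumed to be freed at write-back (a non-pipelined FU).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

FUS = ("Int", "Add", "Mult", "Div")


@dataclass
class Instr:
    fu: str                      # functional unit the instruction needs
    dest: Optional[int]          # destination register number, or None
    srcs: Tuple[int, ...] = ()   # source register numbers


class Scoreboard:
    def __init__(self, num_regs: int = 32):
        self.wp = [False] * num_regs            # WP[reg#]: write pending
        self.busy = {fu: False for fu in FUS}   # Busy[FU#]

    def can_issue(self, ins: Instr) -> bool:
        if self.busy[ins.fu]:                   # FU available?
            return False
        if any(self.wp[s] for s in ins.srcs):   # RAW: WP[src1] or WP[src2]
            return False
        if ins.dest is not None and self.wp[ins.dest]:  # WAW: WP[dest]
            return False
        return True

    def issue(self, ins: Instr) -> None:
        self.busy[ins.fu] = True
        if ins.dest is not None:
            self.wp[ins.dest] = True

    def writeback(self, ins: Instr) -> None:
        # Assumption: the FU is released and the WP bit cleared together at WB.
        self.busy[ins.fu] = False
        if ins.dest is not None:
            self.wp[ins.dest] = False
```

For example, after issuing a divide that writes register 4, an instruction that reads register 4 (RAW), writes register 4 (WAW), or needs the Div unit is held back until `writeback` is called.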
Example
Diagonal means pipeline
In-order issue limitations
Out-of-order
Decode adds next instruction to buffer if there is space and the instruction does not cause a WAR or WAW hazard.
Note: WAR is possible again because issue is out-of-order (WAR is not possible with in-order issue and registering of input operands at the functional unit)
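The decode-time check can be sketched as follows. This is a simplified illustration under stated assumptions: `Pending`, `can_enter`, and the buffer capacity of 4 are hypothetical, the buffer is taken to hold every instruction that has not yet completed, and an instruction is assumed to read its operands when it issues.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Pending:
    dest: str
    srcs: Tuple[str, ...]
    issued: bool = False   # once issued, the operands have been read


def can_enter(buffer, new: Pending, capacity: int = 4) -> bool:
    """Return True if Decode may add `new` to the issue buffer."""
    if len(buffer) >= capacity:
        return False                      # no space
    for old in buffer:
        if old.dest == new.dest:
            return False                  # WAW: older write still pending
        if not old.issued and new.dest in old.srcs:
            return False                  # WAR: older instr has not read yet
    return True
```

A RAW dependence does not block entry in this sketch; such an instruction enters the buffer and waits there until its operands are available.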
Example
The number of registers limits the number of instructions in the pipeline.
Solution: Renaming. Instruction 5: f4 -> f4' (no WAW hazard)
Renaming Table
Decode does register renaming and adds instructions to the issue-stage instruction reorder buffer(ROB)
When a tag is deallocated it is broadcast, and any source waiting on that tag has its present bit (e.g. p1) set to 1
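A minimal sketch of renaming at decode, assuming each instruction simply takes its position in the program as its tag and that no instruction has completed yet (so tags are never deallocated here). The function `rename` and the tuple layout are hypothetical, chosen only for illustration.

```python
def rename(program):
    """Rename a list of (op, dest, src1, src2) tuples.

    Each source becomes ("tag", t) if an earlier instruction with tag t
    will produce it, or ("reg", name) if the value is already present in
    the register file. The destination is replaced by the instruction's
    own tag.
    """
    rename_table = {}   # architectural register -> tag of latest writer
    renamed = []
    for tag, (op, dest, src1, src2) in enumerate(program):
        srcs = []
        for s in (src1, src2):
            if s in rename_table:
                srcs.append(("tag", rename_table[s]))
            else:
                srcs.append(("reg", s))
        rename_table[dest] = tag      # later readers of dest wait on this tag
        renamed.append((op, tag, srcs[0], srcs[1]))
    return renamed


program = [
    ("FDIV", "f4", "f0", "f2"),
    ("FADD", "f6", "f4", "f8"),
    ("FMUL", "f4", "f10", "f12"),   # second writer of f4
    ("FSUB", "f14", "f4", "f6"),
]
renamed = rename(program)
```

The two writers of f4 end up with different tags (0 and 2), so they no longer conflict, and FSUB waits on tag 2, the most recent writer.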
Reorder Buffer
ROB managed circularly
exec bit is set when instruction begins execution
When an instruction completes, its use bit is marked free (ptr2 is incremented and the tag is deallocated)
A deallocated entry is also said to be committed
Instruction is candidate for execution when
- use bit is set
- p1 and p2 are set
- has not started execution(exec 0)
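The three conditions can be written as a predicate over a ROB entry. This is a sketch only: `RobEntry`, `is_candidate`, `broadcast` and `select` are hypothetical names, and the field is called `exec_` because `exec` is a Python builtin.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    use: bool = False            # slot holds a live instruction
    exec_: bool = False          # instruction has begun execution
    p1: bool = True              # source 1 present
    src1: Optional[int] = None   # tag awaited while p1 is False
    p2: bool = True              # source 2 present
    src2: Optional[int] = None   # tag awaited while p2 is False


def is_candidate(e: RobEntry) -> bool:
    return e.use and e.p1 and e.p2 and not e.exec_


def broadcast(rob, tag: int) -> None:
    """A result tagged `tag` is available: set matching present bits."""
    for e in rob:
        if not e.use:
            continue
        if not e.p1 and e.src1 == tag:
            e.p1 = True
        if not e.p2 and e.src2 == tag:
            e.p2 = True


def select(rob):
    """Indices of all entries that may start execution."""
    return [i for i, e in enumerate(rob) if is_candidate(e)]
```

An entry waiting on tag 0 is skipped by `select` until `broadcast(rob, 0)` sets its present bit.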
In-Order Commit for Precise Exceptions
Physical Register Management
Decode and dispatch all the instructions into the ROB first, then go back and execute
- Free List: physical registers currently available
- Rd: architectural destination register
- LPRd: last physical register, i.e. the one previously mapped to Rd
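A minimal sketch of how the free list, Rd and LPRd interact, assuming the common scheme in which LPRd is returned to the free list when the instruction commits. The class `Renamer`, the register names `r0..`/`P0..`, and the sizes are hypothetical.

```python
from collections import deque


class Renamer:
    def __init__(self, num_arch: int = 4, num_phys: int = 8):
        # Initially architectural register r_i lives in physical register P_i.
        self.map = {f"r{i}": f"P{i}" for i in range(num_arch)}
        self.free = deque(f"P{i}" for i in range(num_arch, num_phys))
        self.rob = deque()   # (Rd, PRd, LPRd) in program order

    def dispatch(self, rd: str, rs1: str, rs2: str):
        if not self.free:
            raise RuntimeError("stall: free list is empty")
        ps1, ps2 = self.map[rs1], self.map[rs2]   # read mappings first
        lprd = self.map[rd]                       # previous mapping of Rd
        prd = self.free.popleft()                 # allocate a new register
        self.map[rd] = prd
        self.rob.append((rd, prd, lprd))
        return prd, ps1, ps2

    def commit(self) -> str:
        # Oldest instruction commits; its LPRd can no longer be read.
        _rd, _prd, lprd = self.rob.popleft()
        self.free.append(lprd)
        return lprd
```

Sources are looked up before the destination is remapped, so an instruction such as `r1 <- r1 op r2` reads the old physical register for r1 and writes a fresh one.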
Superscalar Register Renaming
Load-Store Issue
- Store commits when it is the oldest instruction and both address and data are available:
- clear speculative bit and eventually move data to cache
- On store abort: clear valid bit
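The commit and abort rules can be sketched with a small speculative store buffer. This is an illustration under assumptions: `StoreEntry`, `StoreBuffer` and the `is_oldest` flag (supplied by the caller, standing in for the ROB's knowledge of program order) are hypothetical, and the cache is modelled as a plain dictionary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreEntry:
    addr: Optional[int] = None
    data: Optional[int] = None
    valid: bool = True
    speculative: bool = True


class StoreBuffer:
    def __init__(self):
        self.entries = []   # program order, oldest first
        self.cache = {}     # stand-in for the data cache

    def allocate(self) -> StoreEntry:
        entry = StoreEntry()
        self.entries.append(entry)
        return entry

    def try_commit(self, entry: StoreEntry, is_oldest: bool) -> bool:
        ready = entry.addr is not None and entry.data is not None
        if is_oldest and entry.valid and ready:
            entry.speculative = False      # commit: clear speculative bit
            return True
        return False

    def abort(self, entry: StoreEntry) -> None:
        entry.valid = False                # abort: clear valid bit

    def drain(self) -> None:
        """Eventually move committed data to the cache, oldest first."""
        while self.entries:
            head = self.entries[0]
            if head.valid and head.speculative:
                break                      # not committed yet; keep order
            if head.valid:
                self.cache[head.addr] = head.data
            self.entries.pop(0)            # aborted entries are discarded
```

An aborted store never reaches the cache; a committed store reaches it only when `drain` runs, which models the "eventually" in the rule above.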
Conservative Out-of-order Load Execution
Branch Prediction
BHT
- Temporal correlation: the way a branch resolves may be a good predictor of the way it will resolve at the next execution
- Spatial correlation: several branches may resolve in a highly correlated manner (a preferred path of execution)
Temporal Correlation
One bit Branch history predictor
If the last prediction failed, invert the bit
Two bits Branch predictor
Change the prediction after two consecutive mis-predictions
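A small comparison of the one-bit and two-bit schemes on a loop branch. The two-bit predictor is written here as a saturating counter, which is one common realization and an assumption on my part: from a saturated state it takes two consecutive mispredictions to flip the prediction. The class and function names are hypothetical.

```python
class OneBit:
    def __init__(self):
        self.bit = True                 # start by predicting taken

    def predict(self) -> bool:
        return self.bit

    def update(self, taken: bool) -> None:
        # Inverting the bit on a misprediction is the same as
        # remembering the last outcome.
        self.bit = taken


class TwoBit:
    def __init__(self):
        self.counter = 3                # 0..3, start strongly taken

    def predict(self) -> bool:
        return self.counter >= 2

    def update(self, taken: bool) -> None:
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def mispredictions(predictor, outcomes) -> int:
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# Loop branch: taken three times, then falls through; run three times.
loop = [True, True, True, False] * 3
one_bit_misses = mispredictions(OneBit(), loop)
two_bit_misses = mispredictions(TwoBit(), loop)
```

On this pattern the one-bit predictor mispredicts 5 times (both the loop exit and the re-entry after the first run), while the two-bit counter mispredicts 3 times (only the exits).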
Branch History Table
Spatial Correlation
Notion: a history register records the direction of the last N branches executed by the processor
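A sketch of a history register in use. The notes only define the register itself; indexing a table of two-bit counters with it is an assumption here (a common two-level arrangement), and `GlobalHistoryPredictor` is a hypothetical name.

```python
class GlobalHistoryPredictor:
    def __init__(self, n: int = 2):
        self.n = n
        self.history = 0                    # last n directions, newest in bit 0
        self.counters = [1] * (1 << n)      # 2-bit counters, weakly not-taken

    def predict(self) -> bool:
        return self.counters[self.history] >= 2

    def update(self, taken: bool) -> None:
        c = self.counters[self.history]
        self.counters[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the new direction into the history register, keep n bits.
        mask = (1 << self.n) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


predictor = GlobalHistoryPredictor(n=2)
misses = []
for taken in [True, False] * 10:            # strictly alternating branch
    misses.append(predictor.predict() != taken)
    predictor.update(taken)
```

A strictly alternating branch defeats a per-branch counter, but with two bits of history each history pattern maps to its own counter, so after a short warm-up (2 mispredictions in this run) every later prediction is correct.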
BTB
Limitations of BHT
Branch Target Buffer