9 Complex Pipelining
Basics
Advanced Data Hazard
Out-of-order write hazards arise from the variable latencies of different functional units (e.g. FDIV and FMUL take more than one cycle).

In-order and Out-of-order
RAW(red) WAR(purple) WAW(green)

Regardless of WAR, look for which instruction can be fetched next (check dependencies); if there is none, finish executing the waiting instruction.

Table for Correct Issue

In-order
Assumption: the Issue stage does not issue an instruction if a RAW hazard exists or the required FU is busy, and operands are registered by the functional unit on issue.
- No WAR hazard: earlier instructions read their operands at issue
- No need to keep src1 and src2
- WAW hazard still possible, e.g. due to out-of-order completion
- WP[reg#]: a bit-vector recording the registers for which writes are pending (cleared after the WB stage)
- Busy[FU#]: a bit-vector indicating each FU's availability (FU = Int, Add, Mult, Div)
Lookup:
- FU available? Busy[FU#]
- RAW? WP[src1] or WP[src2]
- WAW? WP[dest]
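
The three lookups above can be sketched as a small scoreboard. This is only an illustration under stated assumptions: the class name `Scoreboard` and its methods are hypothetical, Python sets stand in for the WP and Busy bit-vectors, and the FU is assumed to be released at write-back (i.e. an unpipelined unit).

```python
from dataclasses import dataclass, field


@dataclass
class Scoreboard:
    # WP: registers for which a write is pending (set stands in for the bit-vector).
    wp: set = field(default_factory=set)
    # Busy: functional units currently in use (set stands in for the bit-vector).
    busy: set = field(default_factory=set)

    def can_issue(self, fu, dest, src1, src2):
        """Return True only if all three issue checks pass."""
        if fu in self.busy:                      # FU available? Busy[FU#]
            return False
        if src1 in self.wp or src2 in self.wp:   # RAW? WP[src1] or WP[src2]
            return False
        if dest in self.wp:                      # WAW? WP[dest]
            return False
        return True

    def issue(self, fu, dest):
        """Mark the FU busy and record the pending write."""
        self.busy.add(fu)
        self.wp.add(dest)

    def write_back(self, fu, dest):
        """Assumption: the FU is freed and WP cleared together at write-back."""
        self.busy.discard(fu)
        self.wp.discard(dest)
```

For example, after issuing a divide that writes f4, an instruction reading f4 fails the RAW check and one writing f4 fails the WAW check until `write_back` is called.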
Example
Diagonal means pipelined

In-order issue limitations

Out-of-order
Decode adds the next instruction to the buffer if there is space and the instruction does not cause a WAR or WAW hazard.
Note: WAR is possible again because issue is out of order (WAR is not possible with in-order issue and registering of input operands at the functional unit).
Example
The number of registers limits the number of instructions in the pipeline.

Solution: Renaming 5 f4 -> f4' (no WAW hazard)

Renaming Table
Decode does register renaming and adds instructions to the issue-stage instruction reorder buffer (ROB).
When a tag is deallocated, it is broadcast, and any source waiting on that tag has its present bit (p1 or p2) set to 1.

Reorder Buffer
The ROB is managed circularly.
The exec bit is set when an instruction begins execution.
When an instruction completes, its use bit is marked free (ptr2 is incremented and the tag is deallocated).
Deallocated is also known as committed.
Instruction is candidate for execution when
- use bit is set
- p1 and p2 are set
- has not started execution (exec = 0)
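
The three conditions above amount to a simple predicate on each ROB slot. A minimal sketch, assuming a hypothetical `RobEntry` record that holds only the four status bits named in these notes (the field is called `exec_` here because `exec` is a Python built-in):

```python
from dataclasses import dataclass


@dataclass
class RobEntry:
    use: bool = False    # slot holds a valid instruction
    exec_: bool = False  # instruction has begun execution
    p1: bool = False     # first source operand present
    p2: bool = False     # second source operand present


def is_candidate(entry):
    """An entry may start executing when it is in use, both operands
    are present, and it has not already started execution."""
    return entry.use and entry.p1 and entry.p2 and not entry.exec_


def pick_ready(rob):
    """Return the indices of all ROB slots that are candidates for execution."""
    return [i for i, entry in enumerate(rob) if is_candidate(entry)]
```

Which of the ready entries actually issues (oldest first, by FU availability, and so on) is a separate policy decision that this sketch leaves open.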


In-Order Commit for Precise Exceptions

Physical Register Management
Decode and dispatch all the instructions into the ROB first, then go back and execute.
- Free List: physical registers currently available
- Rd: architectural register
- LPRd: last physical register mapped to Rd
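
The interplay of the free list, Rd and LPRd can be sketched as follows. This is an illustration only: the class `Renamer`, the field names `PRd`, `PS1` and `PS2`, and the initial identity mapping are assumptions made for the example, not details taken from these notes.

```python
class Renamer:
    def __init__(self, num_arch, num_phys):
        # Assumption: architectural register i starts mapped to physical register i.
        self.table = list(range(num_arch))
        # Free List: physical registers currently available.
        self.free_list = list(range(num_arch, num_phys))

    def rename(self, rd, rs1, rs2):
        """Rename one instruction; return None if no physical register is free."""
        # Sources read the current mapping before the destination is remapped.
        ps1, ps2 = self.table[rs1], self.table[rs2]
        if not self.free_list:
            return None  # decode would have to stall here
        prd = self.free_list.pop(0)
        lprd = self.table[rd]  # last physical register mapped to Rd
        self.table[rd] = prd
        return {"Rd": rd, "PRd": prd, "LPRd": lprd, "PS1": ps1, "PS2": ps2}

    def commit(self, entry):
        """At commit the old mapping is dead, so LPRd returns to the free list."""
        self.free_list.append(entry["LPRd"])
```

Two back-to-back writes to the same architectural register receive different physical registers, which is what removes the WAW hazard.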


Superscalar Register Renaming

Load-Store Issue
- Store commits when it is the oldest instruction and both address and data are available:
  - clear speculative bit and eventually move data to cache
- On store abort: clear valid bit

Conservative Out-of-order Load Execution

Branch Prediction
BHT
- Temporal correlation: the way a branch resolves may be a good predictor of the way it will resolve at the next execution
- Spatial correlation: several branches may resolve in a highly correlated manner (a preferred path of execution)
Temporal Correlation
One-bit branch history predictor
If the last prediction failed, invert the bit.

Two-bit branch predictor
Change the prediction only after two consecutive mispredictions.
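
One common way to get this behaviour is a two-bit saturating counter. The sketch below assumes that realization (the notes do not say which state machine is used), and the class name `TwoBitPredictor` is hypothetical. From a strong state the prediction flips only after two consecutive mispredictions; from a weak state a single one is enough.

```python
class TwoBitPredictor:
    def __init__(self, state=3):
        # States 0 and 1 predict not taken; states 2 and 3 predict taken.
        # Assumption: start in the strongly-taken state.
        self.state = state

    def predict(self):
        """Return True if the branch is predicted taken."""
        return self.state >= 2

    def update(self, taken):
        """Move one step toward the actual outcome, saturating at 0 and 3."""
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)
```

Starting from the strongly-taken state, one not-taken outcome leaves the prediction at taken; a second consecutive one switches it to not taken.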

Branch History Table

Spatial Correlation
Notion: a history register records the direction of the last N branches executed by the processor.
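
Such a history register behaves like an N-bit shift register. A minimal sketch, with the hypothetical class name `HistoryRegister` and the assumed encoding 1 = taken, 0 = not taken, newest outcome in the least significant bit:

```python
class HistoryRegister:
    def __init__(self, n):
        self.n = n      # number of branch outcomes remembered
        self.bits = 0   # packed history, newest outcome in the lowest bit

    def record(self, taken):
        """Shift in the newest outcome and drop the oldest one."""
        mask = (1 << self.n) - 1
        self.bits = ((self.bits << 1) | int(taken)) & mask

    def as_string(self):
        """History as an N-character string of 0s and 1s, oldest outcome first."""
        return format(self.bits, "0{}b".format(self.n))
```

The packed value could then serve as an index into a table of predictors, although how the history is combined with the branch address is a design choice not fixed by these notes.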

BTB
Limitations of BHT

Branch Target Buffer
