# **Basic Pipelined Processor**

Hung-Wei Tseng



### Outline

- Pipelining
- Pipeline Hazards
- Structural Hazards
- Control Hazards
- Dynamic Branch Predictions

## **Tasks in RISC-V ISA**

- Instruction Fetch (IF): fetch the instruction from memory
- Instruction Decode (ID)
  - Decode the instruction for the desired operation and operands
  - Read source register values
- Execution (**EX**)
  - ALU instructions: perform ALU operations
  - Conditional branches: determine the branch outcome (taken/not taken)
  - Memory instructions: determine the effective address for data memory access
- Data Memory Access (MEM): read/write memory
- Write Back (WB): write the ALU result or the value read from memory into the target register
- Update PC
  - If the branch is taken: set the PC to the branch target address
  - Otherwise: advance to the next instruction (current PC + 4)
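
The PC update rule above can be sketched in a few lines of Python. This is only an illustration: the function name `next_pc` and its parameters are hypothetical, and it assumes fixed 4-byte instructions.

```python
def next_pc(pc, is_branch=False, taken=False, target=None):
    """Return the PC of the next instruction to fetch."""
    if is_branch and taken:
        return target      # taken branch: jump to the branch target address
    return pc + 4          # otherwise: fall through to the next instruction

print(hex(next_pc(0x1000)))                                             # 0x1004
print(hex(next_pc(0x1000, is_branch=True, taken=True, target=0x2000)))  # 0x2000
```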

## Simple implementation w/o branch

- add x1, x2, x3
- ld x4, 0(x5)
- sub x6, x7, x8
- sub x9, x10, x11
- sd x1, 0(x12)


















- Different parts of the processor work on different instructions simultaneously
- A clock signal controls and synchronizes the beginning and the end of each part of the work
- Pipeline registers sit between different parts of the processor to keep the intermediate results necessary for the upcoming work



| Instruction      | 1  | 2  | 3  | 4   | 5   | 6   | 7   | 8   |
|------------------|----|----|----|-----|-----|-----|-----|-----|
| add x1, x2, x3   | IF | ID | EX | MEM | WB  |     |     |     |
| ld x4, 0(x5)     |    | IF | ID | EX  | MEM | WB  |     |     |
| sub x6, x7, x8   |    |    | IF | ID  | EX  | MEM | WB  |     |
| sub x9, x10, x11 |    |    |    | IF  | ID  | EX  | MEM | WB  |
| sd x1, 0(x12)    |    |    |    |     | IF  | ID  | EX  | MEM |
| xor x13, x14, x15 |   |    |    |     |     | IF  | ID  | EX  |
| and x16, x17, x18 |   |    |    |     |     |     | IF  | ID  |
| add x19, x20, x21 |   |    |    |     |     |     |     | IF  |
| sub x22, x23, x24 |   |    |    |     |     |     |     |     |
| ld x25, 4(x26)   |    |    |    |     |     |     |     |     |
| sd x27, 0(x28)   |    |    |    |     |     |     |     |     |

After this point (the first instruction's WB in cycle 5), we are completing an instruction each cycle.
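
The timing in the diagram above can be reproduced with a short Python sketch. It assumes an ideal 5-stage pipeline with no stalls; the helper names `pipeline_schedule` and `total_cycles` are made up for this illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_schedule(program):
    """Map each instruction to {cycle: stage} for an ideal pipeline with no stalls."""
    # Instruction i (0-based) enters IF in cycle i + 1 and advances one stage per cycle.
    return [{i + 1 + s: stage for s, stage in enumerate(STAGES)}
            for i in range(len(program))]

def total_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles needed to drain n instructions through the pipeline."""
    return n_instructions + n_stages - 1 if n_instructions else 0

program = [
    "add x1, x2, x3", "ld x4, 0(x5)", "sub x6, x7, x8", "sub x9, x10, x11",
    "sd x1, 0(x12)", "xor x13, x14, x15", "and x16, x17, x18",
    "add x19, x20, x21", "sub x22, x23, x24", "ld x25, 4(x26)", "sd x27, 0(x28)",
]

schedule = pipeline_schedule(program)
n_cycles = total_cycles(len(program))
for inst, row in zip(program, schedule):
    cells = " ".join(f"{row.get(c, ''):<3}" for c in range(1, n_cycles + 1))
    print(f"{inst:<18}{cells}")
print("total cycles:", n_cycles)   # 11 + 5 - 1 = 15
```

Once the pipeline is full, each additional instruction adds only one cycle, which is why the total is n + 4 rather than 5n.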



## **Draw the pipeline diagrams**





# Both instructions

# **Can we get them right?**

Given the simple pipelined RISC-V processor that we have discussed so far, how many of the following code snippets can be executed with the expected outcome?

IV:

- add x1, x2, x3
- ld x4, 0(x5)
- sub x6, x7, x8
- sub x9, x10, x11
- sd x1, 0(x12)

# **Pipeline hazards**



# **Three pipeline hazards**

- Structural hazards: resource conflicts mean the hardware cannot support the simultaneous execution of the instructions in the pipeline
- Control hazards: the PC can be changed by an instruction in the pipeline
- Data hazards: an instruction depends on a result that has not yet been generated or propagated when the instruction needs it
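
To make the data-hazard definition concrete, here is a minimal Python sketch that lists read-after-write dependences between nearby instructions. Everything in it is an assumption for illustration: the helper names, the small set of opcodes treated as having no destination register, and the window of 3 instructions (in a 5-stage pipeline without forwarding, a reader up to three instructions behind its producer reaches ID no later than the producer's WB).

```python
import re

# Opcodes assumed (for this sketch) to write no destination register.
NO_DEST_OPS = {"sd", "sw", "beq", "bne", "blt", "bge"}

def parse(inst):
    """Split 'op rd, rs1, rs2' style text into (op, dest, sources)."""
    op, rest = inst.split(None, 1)
    regs = re.findall(r"x\d+", rest)
    if op in NO_DEST_OPS:
        return op, None, regs
    return op, regs[0], regs[1:]

def raw_dependences(program, window=3):
    """Return (producer index, consumer index, register) for nearby RAW pairs."""
    parsed = [parse(inst) for inst in program]
    deps = []
    for j, (_, _, srcs) in enumerate(parsed):
        for src in dict.fromkeys(srcs):          # each source register once
            if src == "x0":                      # x0 is always zero
                continue
            # Scan backwards for the nearest writer of src within the window.
            for i in range(j - 1, max(-1, j - window - 1), -1):
                if parsed[i][1] == src:
                    deps.append((i, j, src))
                    break
    return deps

example = ["add x1, x2, x3", "sub x4, x1, x5", "ld x6, 0(x4)", "sd x1, 0(x7)"]
print(raw_dependences(example))
```

Whether a reported pair actually needs a stall depends on the register file and pipeline design, so this only flags candidates.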




# **Structural Hazards**



## **Dealing with the conflicts between ID/WB**

- The same register cannot be read and written in the same cycle
- Solution: insert no-ops (e.g. add x0, x0, x0) between them
- Drawback
  - If the number of pipeline stages changes, the code won't work
  - Slow

Code with a no-op inserted:

- add x1, x2, x3
- ld x4, 0(x5)
- sub x6, x7, x8
- add x0, x0, x0
- sub x9, **x1**, x10
- sd x11, 0(x12)
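
The no-op insertion above can be sketched as a small Python pass. This is a simplified illustration, not a real assembler pass: instructions are hypothetical `(op, dest, sources)` tuples, and it only handles the same-cycle ID/WB conflict (a reader exactly three slots behind its producer in the 5-stage pipeline). Closer dependences are data hazards and are outside this sketch.

```python
NOP = ("add", "x0", ("x0", "x0"))   # add x0, x0, x0
ID_WB_DISTANCE = 3                  # reader's ID coincides with writer's WB

def insert_nops(program):
    """Insert no-ops so no instruction reads a register in the cycle it is written."""
    out = []
    for inst in program:
        _, _, srcs = inst
        while True:
            writer_slot = len(out) - ID_WB_DISTANCE
            if writer_slot < 0:
                break
            writer_dest = out[writer_slot][1]
            # Writes to x0 never matter, and None means "no destination".
            if writer_dest in (None, "x0") or writer_dest not in srcs:
                break
            out.append(NOP)          # push the reader one slot later and re-check
        out.append(inst)
    return out

program = [
    ("add", "x1", ("x2", "x3")),
    ("ld",  "x4", ("x5",)),
    ("sub", "x6", ("x7", "x8")),
    ("sub", "x9", ("x1", "x10")),    # would read x1 while add writes it back
    ("sd",  None, ("x11", "x12")),
]
for op, dest, srcs in insert_nops(program):
    print(op, dest, srcs)
```

The output places one no-op in front of `sub x9, x1, x10`, matching the sequence on the slide. It also shows the drawback: the constant 3 is tied to this particular pipeline depth.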





## **Dealing with the conflicts between ID/WB**

- The same register cannot be read and written in the same cycle
- Solution: stall the later instruction until the write has updated the register, so that the later instruction reads the desired value
- Drawback: slow



## **Dealing with the conflicts between ID/WB**

- The same register cannot be read and written in the same cycle
- Better solution: write early, read late
  - Writes occur at the clock edge and complete well before the end of the clock cycle
  - This leaves enough time for the outputs to settle for reads
  - The revised register file is the default one from now on!
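
A behavioral sketch of "write early, read late" in Python is shown below. It is an assumption-laden model, not a hardware description: the class name and the `cycle` method are hypothetical, and one call stands for one clock cycle in which the write is applied before the reads are sampled.

```python
class RegisterFile:
    """Toy register file where a write lands before reads in the same cycle."""

    def __init__(self, n_regs=32):
        self.regs = [0] * n_regs

    def cycle(self, write=None, reads=()):
        """Simulate one clock cycle: apply the write first, then sample the reads."""
        if write is not None:
            reg, value = write
            if reg != 0:              # x0 is hard-wired to zero in RISC-V
                self.regs[reg] = value
        return [self.regs[r] for r in reads]

rf = RegisterFile()
# WB writes x1 while ID reads x1 and x2 in the same cycle:
print(rf.cycle(write=(1, 42), reads=(1, 2)))   # [42, 0]
```

Because the read observes the value written in the same cycle, the ID/WB conflict no longer needs a no-op or a stall.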







### **Structural Hazards**

 What pair of instructions will be problematic if we allow ALU instructions to skip the "MEM" stage?



### **Structural Hazards**

- Stalling can address the issue, but it is slow
- Improve the pipeline unit design to allow parallel execution
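
The conflict created by letting ALU instructions skip MEM can be illustrated with a short Python sketch. The model is an assumption for illustration only: `ld` uses all five stages, other register-writing instructions use four, `sd` is treated as writing no register, and the function name `wb_conflicts` is made up.

```python
FULL_STAGES = ["IF", "ID", "EX", "MEM", "WB"]   # loads
SHORT_STAGES = ["IF", "ID", "EX", "WB"]         # ALU instructions skipping MEM
NO_WRITE_OPS = {"sd"}                           # assumed not to use the write port

def wb_conflicts(opcodes):
    """Return (earlier index, later index, cycle) when two instructions share a WB cycle."""
    wb_owner = {}
    conflicts = []
    for i, op in enumerate(opcodes):
        if op in NO_WRITE_OPS:
            continue
        stages = FULL_STAGES if op == "ld" else SHORT_STAGES
        cycle = (i + 1) + stages.index("WB")    # instruction i enters IF in cycle i + 1
        if cycle in wb_owner:
            conflicts.append((wb_owner[cycle], i, cycle))
        else:
            wb_owner[cycle] = i
    return conflicts

print(wb_conflicts(["ld", "add"]))   # [(0, 1, 5)]
print(wb_conflicts(["add", "ld"]))   # []
```

In this model a load immediately followed by an ALU instruction makes both reach WB in cycle 5, so they compete for the register file's write port.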

# **Control Hazards**

## The impact of control hazards

Assume that we have an application in which 20% of the instructions are branches and the instruction stream incurs no data hazards. When there is a branch, we disable instruction fetch and insert no-ops until we can determine the PC. What is the average CPI if we execute this program on the 5-stage RISC-V pipeline?

- A. 1
- B. 1.2
- C. 1.4
- D. 1.6
- E. 1.8

| Instruction      | 1  | 2  | 3  | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11 |
|------------------|----|----|----|-----|-----|-----|-----|-----|-----|-----|----|
| add x1, x2, x3   | IF | ID | EX | MEM | WB  |     |     |     |     |     |    |
| ld x4, 0(x5)     |    | IF | ID | EX  | MEM | WB  |     |     |     |     |    |
| bne x0, x7, L    |    |    | IF | ID  | EX  | MEM | WB  |     |     |     |    |
| add x0, x0, x0   |    |    |    | IF  | ID  | EX  | MEM | WB  |     |     |    |
| add x0, x0, x0   |    |    |    |     | IF  | ID  | EX  | MEM | WB  |     |    |
| sub x9, x10, x11 |    |    |    |     |     | IF  | ID  | EX  | MEM | WB  |    |
| sd x1, 0(x12)    |    |    |    |     |     |     | IF  | ID  | EX  | MEM | WB |

Average CPI = 1 + 20% × 2 = 1.4
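
The CPI arithmetic above generalizes to other branch frequencies and penalties. A minimal Python sketch, assuming a base CPI of 1 and a fixed number of no-op cycles per branch (the function name `average_cpi` is hypothetical):

```python
def average_cpi(branch_fraction, stall_cycles_per_branch, base_cpi=1.0):
    """Average CPI when every branch costs a fixed number of extra cycles."""
    return base_cpi + branch_fraction * stall_cycles_per_branch

# 20% branches, 2 no-op cycles per branch (the branch resolves in EX).
print(round(average_cpi(0.20, 2), 2))   # 1.4
```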

### Why can't we proceed without stalls/no-ops?

- How many of the following statements are true regarding why we have to stall for each branch in the current pipelined processor?
  - ① The target address when the branch is taken is not available for the instruction fetch stage of the next cycle
  - ② The target address when the branch is not taken is not available for the instruction fetch stage of the next cycle
  - ③ The branch outcome cannot be decided until the comparison result of the ALU is out
  - ④ The next instruction needs the branch instruction to write back its result
  - A. 0
  - B. 1
  - C. 2
  - D. 3
  - E. 4

# **Dynamic Branch Prediction**

### Why can't we proceed without stalls/no-ops?

- How many of the following statements are true regarding why we have to stall for each branch in the current pipelined processor?
  - ① The target address when the branch is taken is not available for the instruction fetch stage of the next cycle. You need a cheatsheet for that: the branch target buffer
  - ② The target address when the branch is not taken is not available for the instruction fetch stage of the next cycle
  - ③ The branch outcome cannot be decided until the comparison result of the ALU is out. You need to predict that: history/states
  - ④ The next instruction needs the branch instruction to write back its result
  - A. 0
  - B. 1
  - C. 2
  - D. 3
  - E. 4
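
The two ingredients mentioned above, a branch target buffer and a prediction based on history/states, can be sketched together in Python. This is not taken from the slides: the 2-bit saturating counter is a common textbook scheme used here as an assumption, and the class and method names are hypothetical.

```python
class BranchPredictor:
    """Toy predictor: a branch target buffer plus a 2-bit counter per branch PC."""

    def __init__(self):
        self.btb = {}        # branch PC -> last observed taken target
        self.counters = {}   # branch PC -> state 0..3 (2 and 3 predict taken)

    def predict(self, pc):
        """Return the predicted next fetch address for the instruction at pc."""
        predict_taken = self.counters.get(pc, 1) >= 2 and pc in self.btb
        return self.btb[pc] if predict_taken else pc + 4

    def update(self, pc, taken, target):
        """Record the real outcome once the branch has been resolved."""
        state = self.counters.get(pc, 1)
        self.counters[pc] = min(state + 1, 3) if taken else max(state - 1, 0)
        if taken:
            self.btb[pc] = target

bp = BranchPredictor()
print(hex(bp.predict(0x100)))            # 0x104 (nothing known yet: fall through)
bp.update(0x100, taken=True, target=0x80)
print(hex(bp.predict(0x100)))            # 0x80 (now predicted taken, target from the BTB)
```

With a correct prediction the fetch stage can continue without no-ops; a wrong prediction still requires discarding the wrongly fetched instructions.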

## Announcement

- Pick up your midterm
  - Outside Hung-Wei's office from 1 pm today
  - If you would like a regrade
    - Write a report explaining why we should regrade it
    - We will regrade the whole test, which might lower your grade
- Reading quiz due next Wednesday
- No class next Monday: Veterans' Day holiday!
- Project
  - Watch Piazza/iLearn/the website carefully; it should be announced by this Friday