| Field | Value |
|---|---|
| Issue | 175248 |
| Summary | s_wait_alu in wmma shadow causes unnecessary stalls |
| Labels | backend:AMDGPU, missed-optimization |
| Assignees | jayfoad, vporpo |
| Reporter | kerbowa |

The SIInsertWaitcnts pass could avoid extra stalls by inserting waits outside of the co-execution windows of long-running instructions.
In the example below, the S_WAITCNT_DEPCTR should be hoisted above the V_WMMA:
```
$vgpr1 = V_EXP_F32_e32 $vgpr0
V_WMMA
S_WAITCNT_DEPCTR 3999
$vgpr0 = DS_READ
```
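For comparison, the ordering requested here would look roughly like the sketch below. It simply moves the wait above the WMMA, keeps the operands elided as in the snippet above, and assumes the wait is still required at all:

```
$vgpr1 = V_EXP_F32_e32 $vgpr0
S_WAITCNT_DEPCTR 3999
V_WMMA
$vgpr0 = DS_READ
```

With this placement the wait would be issued before the WMMA starts rather than inside its co-execution window.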
Additionally, as follow-up or related work, the pass could be made aware of how many independent instructions are required to avoid inserting these waits at all. Example MIR: https://gist.github.com/kerbowa/fac0c48fb9f8dd85f417f38779ca4584