Issue 175248
Summary s_wait_alu in wmma shadow causes unnecessary stalls
Labels backend:AMDGPU, missed-optimization
Assignees jayfoad, vporpo
Reporter kerbowa
The SIInsertWaitcnts pass could avoid extra stalls by inserting waits outside the co-execution windows of long-running instructions.

In the example below, the S_WAITCNT_DEPCTR should be hoisted above the V_WMMA:

```
$vgpr1 = V_EXP_F32_e32 $vgpr0
V_WMMA
S_WAITCNT_DEPCTR 3999
$vgpr0 = DS_READ
```
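
For illustration only, the requested placement would look roughly like the sketch below. It keeps the abbreviated operands of the example above and assumes that moving the wait is legal here, i.e. that it is only needed for the V_EXP_F32 and that nothing else in the real MIR depends on its original position:

```
$vgpr1 = V_EXP_F32_e32 $vgpr0
S_WAITCNT_DEPCTR 3999
V_WMMA
$vgpr0 = DS_READ
```

With this ordering the wait no longer sits in the shadow of the V_WMMA, so it would not have to stall until the V_WMMA completes.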

Additionally, as follow-up or related work, the pass could be made aware of how many independent instructions are required to make these waits unnecessary. Example MIR: https://gist.github.com/kerbowa/fac0c48fb9f8dd85f417f38779ca4584
_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
