Issue: 123212
Summary: [AMDGPU][GISel] Missing (or not running) combine for `sra workitem.id.xx, 31`
Labels:
Assignees:
Reporter: qcolombet

In the AMDGPU backend, GISel ends up with additional instructions because we are missing some simplification that could take advantage of the range of the `workitem.id.xx` values.
I am somewhat surprised because I see that the AMDGPU backend implements the `TargetLowering::computeKnownBitsForTargetInstr` method and has some logic to propagate the known bits for these intrinsics.
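Bottom line: I haven't dug into why the simplification doesn't happen; it may be an easy fix.
Anyhow, the issue at hand is that `sra workitem.id.xx, 31` could be simplified to `srl workitem.id.xx, 31`, because the sign bit is known to be zero, and then further simplified to a plain `0`, because all the remaining high bits are known to be zero as well. The `xor` with that `0` then folds away too.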
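
To make the reasoning concrete, here is a minimal sketch of the known-bits argument. The `Known` struct and the three helpers are hypothetical, simplified stand-ins written for this illustration; they are not LLVM's `KnownBits` API. The `0x3ff` mask is taken from the `v_and_b32_e32` in the assembly shown in this issue, on the assumption that it reflects the range of the workitem ID.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for a known-bits record:
// a bit set in `zero` is known to be 0, a bit set in `one` is known to be 1.
struct Known {
  uint32_t zero;
  uint32_t one;
};

// Known bits of `x & mask` when nothing is known about x:
// every bit cleared in the mask is known zero in the result.
constexpr Known knownAfterAnd(uint32_t mask) { return Known{~mask, 0}; }

// True when the sign bit is known zero, so an arithmetic shift right
// behaves exactly like a logical shift right.
constexpr bool signBitKnownZero(Known k) {
  return (k.zero & 0x80000000u) != 0;
}

// True when a logical shift right by `amt` (assumed < 32) must produce 0:
// every bit at position >= amt is known zero.
constexpr bool lshrIsKnownZero(Known k, unsigned amt) {
  return (k.zero & (~uint32_t{0} << amt)) == (~uint32_t{0} << amt);
}

// Compile-time check of the two steps for a value masked with 0x3ff.
static_assert(signBitKnownZero(knownAfterAnd(0x3ffu)), "sra can become srl");
static_assert(lshrIsKnownZero(knownAfterAnd(0x3ffu), 31), "srl by 31 is 0");
```

Under these assumptions both steps of the fold are justified by the known-zero mask alone, which is the information a known-bits analysis would have to supply to the combiner.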
# To Reproduce #
Download the attached reproducer, or copy/paste the LLVM IR at the end of this issue, and save it as `reduced.ll`.
[repro.ll.txt](https://github.com/user-attachments/files/18441382/repro.ll.txt)
Then run:
```bash
# -global-isel=0 selects SelectionDAG (SDISel); -global-isel=1 selects GlobalISel (GISel)
llc -O3 -march=amdgcn -mcpu=gfx942 -mtriple amdgcn-amd-amdhsa -global-isel=<0|1> reduced.ll -o -
```
# Result #
With GISel, an `sra` (`v_ashrrev_i32_e32`) and a `xor` (`v_xor_b32_e32`) remain in the final assembly, whereas they could be eliminated.
With GISel:
```asm
s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
v_and_b32_e32 v2, 0x3ff, v31
v_ashrrev_i32_e32 v3, 31, v2
v_xor_b32_e32 v2, v3, v2
flat_store_dword v[0:1], v2
s_waitcnt vmcnt(0) lgkmcnt(0)
s_setpc_b64 s[30:31]
```
With SDISel:
```asm
s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
v_and_b32_e32 v2, 0x3ff, v31
flat_store_dword v[0:1], v2
s_waitcnt vmcnt(0) lgkmcnt(0)
s_setpc_b64 s[30:31]
```
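
As a sanity check that the two sequences above compute the same value, the following sketch models each one on a plain 32-bit integer. The function names are hypothetical and exist only for this illustration; the model covers just the arithmetic of the three VALU instructions, not the store or the wait counters.

```cpp
#include <cassert>
#include <cstdint>

// Models the GISel sequence: v_and_b32, v_ashrrev_i32, v_xor_b32.
inline uint32_t giselResult(uint32_t v31) {
  const uint32_t id = v31 & 0x3ffu;  // v_and_b32_e32 v2, 0x3ff, v31
  // v_ashrrev_i32_e32 v3, 31, v2: an arithmetic shift by 31 copies the
  // sign bit into every bit of the result.
  const uint32_t sign = (id & 0x80000000u) ? 0xffffffffu : 0u;
  return sign ^ id;                  // v_xor_b32_e32 v2, v3, v2
}

// Models the SDISel sequence: only the v_and_b32 remains.
inline uint32_t sdiselResult(uint32_t v31) { return v31 & 0x3ffu; }

// Returns true when both sequences agree for every input in [0, limit).
inline bool sequencesAgree(uint32_t limit) {
  for (uint32_t v = 0; v < limit; ++v)
    if (giselResult(v) != sdiselResult(v)) return false;
  return true;
}
```

Because the mask clears bit 31 before the shift, `sign` is always `0` in this model, so the shift and the `xor` contribute nothing regardless of the incoming value of `v31`.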
# Note #
Input LLVM IR:
```llvm
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"
target triple = "amdgcn-amd-amdhsa"
declare noundef i32 @llvm.amdgcn.workitem.id.x()
define dso_local void @foo.bb.split(ptr %out) {
newFuncRoot:
%i = tail call i32 @llvm.amdgcn.workitem.id.x()
%.lobit = ashr i32 %i, 31
%i32 = xor i32 %.lobit, %i
store i32 %i32, ptr %out
ret void
}
```