Issue 123212
Summary [AMDGPU][GISel] Missing (or not running) combine for `sra workitem.id.xx, 31`
Labels
Assignees
Reporter qcolombet
In the AMDGPU backend, GISel ends up with additional instructions because we are missing some simplification that could take advantage of the range of the `workitem.id.xx` values.

I am somewhat surprised because I see that the AMDGPU backend implements the `TargetLowering::computeKnownBitsForTargetInstr` method and has some logic to propagate the known bits for these intrinsics.
Bottom line: I haven't dug into why the simplification doesn't happen; it may be an easy fix.
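To make the known-bits argument concrete, here is a toy model in Python. It is only a sketch and not LLVM's `KnownBits` API: the helper names are hypothetical, and the 10-bit range is an assumption read off the `v_and_b32_e32 v2, 0x3ff, v31` instruction in the assembly shown in this issue.

```python
# Toy known-bits model (hypothetical helpers; not LLVM's KnownBits class).
MASK32 = 0xFFFFFFFF


def known_zero_after_and(mask):
    """Bits guaranteed to be zero after `x & mask` for an unknown 32-bit x."""
    return ~mask & MASK32


def ashr_known_zero(known_zero, shift):
    """Bits guaranteed to be zero after an arithmetic shift right by `shift`.

    If the sign bit is known zero, the shift fills with zeros, so the top
    `shift` bits are zero and the other known-zero bits move down.
    If the sign bit is unknown, this toy model conservatively reports nothing.
    """
    if not known_zero & 0x80000000:
        return 0
    filled = (MASK32 << (32 - shift)) & MASK32
    return ((known_zero >> shift) | filled) & MASK32


# Assumption: the work-item id fits in 10 bits, so bits 10..31 are zero.
id_known_zero = known_zero_after_and(0x3FF)
shifted_known_zero = ashr_known_zero(id_known_zero, 31)

# Every bit of the shifted value is known zero, i.e. the result is the constant 0.
all_bits_known_zero = shifted_known_zero == MASK32
```

Under that assumption, every bit of the shift result is known to be zero, which is exactly the information a combine would need to fold the shift away.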

Anyhow, the issue at hand is that `sra workitem.id.xx, 31` could be simplified to `srl workitem.id.xx, 31` (the sign bit is known to be zero) and then further simplified to a plain `0`.
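As a quick sanity check of why the shift and the subsequent `xor` are redundant, the snippet below brute-forces every value in the assumed id range. The 10-bit range is an assumption taken from the `0x3ff` mask in the generated assembly, and the helper functions are illustrative only.

```python
MASK32 = 0xFFFFFFFF


def ashr32(x, n):
    """32-bit arithmetic shift right (replicates the sign bit)."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32  # reinterpret as a negative two's-complement value
    return (x >> n) & MASK32


def lshr32(x, n):
    """32-bit logical shift right (fills with zeros)."""
    return (x & MASK32) >> n


# Assumption: work-item ids are in [0, 0x3ff], matching the 0x3ff mask.
ids = range(0x400)

# With the sign bit clear, arithmetic and logical shifts agree...
ashr_equals_lshr = all(ashr32(i, 31) == lshr32(i, 31) for i in ids)
# ...the logical shift by 31 yields 0...
shift_is_zero = all(lshr32(i, 31) == 0 for i in ids)
# ...and xor with 0 is the identity, so only the id itself remains.
xor_is_identity = all((ashr32(i, 31) ^ i) == i for i in ids)
```

All three flags are `True` for this range, which matches the SDISel output where only the `v_and_b32_e32` remains before the store.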

# To Reproduce #

Download the attached reproducer or copy/paste the LLVM IR at the end of this issue, and save it as `reduced.ll`.
[repro.ll.txt](https://github.com/user-attachments/files/18441382/repro.ll.txt)
Then run:
```bash
llc -O3 -march=amdgcn -mcpu=gfx942 -mtriple amdgcn-amd-amdhsa -global-isel=<0|1> reduced.ll -o -
```

# Result #

With GISel, an `sra` and an `xor` remain in the final assembly, whereas they could be eliminated.

With GISel:
```asm
	s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	v_and_b32_e32 v2, 0x3ff, v31
	v_ashrrev_i32_e32 v3, 31, v2
	v_xor_b32_e32 v2, v3, v2
	flat_store_dword v[0:1], v2
	s_waitcnt vmcnt(0) lgkmcnt(0)
	s_setpc_b64 s[30:31]
```

With SDISel:
```asm
	s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	v_and_b32_e32 v2, 0x3ff, v31
	flat_store_dword v[0:1], v2
	s_waitcnt vmcnt(0) lgkmcnt(0)
	s_setpc_b64 s[30:31]
```

# Note #
Input LLVM IR:
```llvm
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"
target triple = "amdgcn-amd-amdhsa"

declare noundef i32 @llvm.amdgcn.workitem.id.x()

define dso_local void @foo.bb.split(ptr %out) {
newFuncRoot:
  %i = tail call i32 @llvm.amdgcn.workitem.id.x()
  %.lobit = ashr i32 %i, 31
  %i32 = xor i32 %.lobit, %i
  store i32 %i32, ptr %out
  ret void
}
```
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs