| Field | Value |
|---|---|
| Issue | 136727 |
| Summary | AMDGPU misses optimization on check-all-workitem-ids are 0 pattern |
| Labels | backend:AMDGPU, missed-optimization |
| Assignees | (none) |
| Reporter | arsenm |
The device libraries use this pattern to check whether all workitem IDs are 0:
```
// RUN: clang -x cl -target amdgcn-amd-amdhsa -S -O3 -mcpu=gfx900 -nogpulib -o - - < %s
bool
choose_one_workgroup_workitem(void)
{
    return (__builtin_amdgcn_workitem_id_x() |
            __builtin_amdgcn_workitem_id_y() |
            __builtin_amdgcn_workitem_id_z()) == 0;
}
```
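The pattern comes from ockl's `cg.cl`: https://github.com/ROCm/llvm-project/blob/662bae8d56ae5ba900a81b468936f47769b0fc2d/amd/device-libs/ockl/src/cg.cl#L46

The OR of the three IDs is 0 only if every ID is 0, so this is equivalent to checking `x == 0 && y == 0 && z == 0`. Compiling it for gfx900 currently produces: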
```
v_and_b32_e32 v0, 0x3ff, v31
v_bfe_u32 v1, v31, 20, 10
v_bfe_u32 v2, v31, 10, 10
v_or3_b32 v0, v0, v2, v1
v_cmp_eq_u32_e32 vcc, 0, v0
v_cndmask_b32_e64 v0, 0, 1, vcc
s_setpc_b64 s[30:31]
```
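As a reading aid, the current sequence can be modeled in C, one statement per instruction. This is a hypothetical sketch: `bfe_u32` and `current_sequence` are illustrative names, not LLVM or device-library APIs, and `bfe_u32` only mimics `v_bfe_u32` for the operands used here (width below 32, offset plus width at most 32).

```c
#include <stdint.h>

/* Hypothetical model of v_bfe_u32 for 0 < width < 32: extract `width`
   bits of `src` starting at bit `offset`, zero-extended. */
uint32_t bfe_u32(uint32_t src, unsigned offset, unsigned width) {
    return (src >> offset) & ((1u << width) - 1u);
}

/* Models the gfx900 output above; returns 1 if all IDs are 0, else 0. */
int current_sequence(uint32_t v31) {
    uint32_t x = v31 & 0x3ffu;          /* v_and_b32_e32 v0, 0x3ff, v31 */
    uint32_t z = bfe_u32(v31, 20, 10);  /* v_bfe_u32 v1, v31, 20, 10    */
    uint32_t y = bfe_u32(v31, 10, 10);  /* v_bfe_u32 v2, v31, 10, 10    */
    uint32_t any = x | y | z;           /* v_or3_b32 v0, v0, v2, v1     */
    return any == 0 ? 1 : 0;            /* v_cmp_eq_u32 + v_cndmask_b32 */
}
```

For example, `current_sequence(0)` returns 1, while `current_sequence(1u << 10)` (y == 1) returns 0.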
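In the callable function ABI, all three workitem IDs are packed into v31 (x in bits 9:0, y in bits 19:10, z in bits 29:20, matching the masks in the output above). It should therefore be possible to compare v31 against 0 directly, which would shrink the code to: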
```
v_cmp_eq_u32_e32 vcc, 0, v31
v_cndmask_b32_e64 v0, 0, 1, vcc
s_setpc_b64 s[30:31]
```
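A small C sketch can check the reasoning behind the proposed lowering. It assumes the field layout implied by the masks in the current output and, importantly, that bits 31:30 of v31 are zero; `pack_workitem_ids`, `all_zero_or`, `all_zero_packed`, and `forms_agree` are hypothetical helpers, not LLVM APIs. It compares the source-level OR test against a single compare of the packed word over a grid of sample 10-bit IDs.

```c
#include <stdint.h>

/* Assumed v31 layout (inferred from the current output): x = bits 9:0,
   y = bits 19:10, z = bits 29:20; bits 31:30 are left zero here. */
uint32_t pack_workitem_ids(uint32_t x, uint32_t y, uint32_t z) {
    return (x & 0x3ffu) | ((y & 0x3ffu) << 10) | ((z & 0x3ffu) << 20);
}

/* Source-level check: OR of the three IDs compared against 0. */
int all_zero_or(uint32_t x, uint32_t y, uint32_t z) {
    return (x | y | z) == 0;
}

/* Proposed lowering: a single compare of the packed register. */
int all_zero_packed(uint32_t v31) {
    return v31 == 0;
}

/* Compares both forms over a grid of sample 10-bit IDs.
   Returns 1 if they agree on every combination, else 0. */
int forms_agree(void) {
    static const uint32_t samples[] = {0, 1, 2, 0x155, 0x200, 0x3ff};
    const int n = (int)(sizeof samples / sizeof samples[0]);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < n; ++k) {
                uint32_t x = samples[i], y = samples[j], z = samples[k];
                if (all_zero_or(x, y, z) !=
                    all_zero_packed(pack_workitem_ids(x, y, z)))
                    return 0;
            }
    return 1;
}
```

`forms_agree()` returns 1 when both forms agree on every sampled combination. The shortened sequence relies on the high bits being zero; if they could be nonzero, the single compare would need a mask first.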