| Field | Value |
|---|---|
| Issue | 123065 |
| Summary | [AMDGPU][GISel] fmin/fmax pattern not recognized |
| Labels | llvm:globalisel |
| Assignees | |
| Reporter | qcolombet |

With GISel, the attached reproducer is lowered to compares and selects, whereas SDISel selects fmin and fmax, which results in a shorter and more efficient code sequence.
SDISel seems to perform this simplification while building its own IR (the SelectionDAG).
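
As a sanity check on why the fold looks legal, here is a minimal Python sketch (not LLVM code) that models the `fcmp ole` + `select` clamp from the reproducer and compares it with an fmin-style minimum. The helper names `select_ole` and `minnum` are made up for illustration, and the NaN behavior of `minnum` (a single NaN operand is ignored) is an assumption about what the selected min computes; signaling NaNs and the canonicalizing `v_max_f16 v0, v0, v0` are not modeled.

```python
import math
import struct

# 0xH57F0 is the half-precision constant from the reproducer; struct's "e"
# format decodes IEEE 754 binary16, and this value is 127.0.
C = struct.unpack("<e", (0x57F0).to_bytes(2, "little"))[0]


def select_ole(x: float, c: float) -> float:
    """Model of: select (fcmp ole x, c), x, c."""
    # An ordered compare is false when either operand is NaN, which matches
    # Python's <= on floats.
    return x if x <= c else c


def minnum(a: float, b: float) -> float:
    """Assumed fmin-style minimum: a single NaN operand is ignored."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)


# Hypothetical sample inputs, including the NaN and infinity corner cases.
SAMPLES = [float("-inf"), -1.5, -0.0, 0.0, 126.5, 127.0, 128.0,
           float("inf"), float("nan")]

for x in SAMPLES:
    # With a non-NaN constant on the "false" side, both forms agree.
    assert select_ole(x, C) == minnum(x, C), x
```

Under these assumptions the two forms agree on every sample, including NaN, because the constant 127.0 (the largest `i8` value) sits on the false side of the select.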
# To Reproduce #
Download the attached reproducer or copy/paste the IR below.
[repro.ll.txt](https://github.com/user-attachments/files/18426310/repro.ll.txt)
Then run:
```bash
llc -O3 -march=amdgcn -mcpu=gfx942 -mtriple amdgcn-amd-amdhsa -global-isel=<0|1> repro.ll -o -
```
# Result #
GISel:
```asm
s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
v_and_b32_e32 v1, 1, v1
v_cmp_ne_u32_e32 vcc, 0, v1
v_mov_b32_e32 v1, 0x57f0
s_nop 0
v_cndmask_b32_e32 v0, 0, v0, vcc
v_cmp_le_f16_e32 vcc, v0, v1
s_nop 1
v_cndmask_b32_e32 v0, v1, v0, vcc
v_cvt_f32_f16_e32 v0, v0
v_cvt_i32_f32_e32 v2, v0
v_mov_b64_e32 v[0:1], 0
global_store_byte v[0:1], v2, off
s_waitcnt vmcnt(0)
s_setpc_b64 s[30:31]
```
SDISel:
```asm
s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
v_and_b32_e32 v1, 1, v1
v_cmp_eq_u32_e32 vcc, 1, v1
s_nop 1
v_cndmask_b32_e32 v0, 0, v0, vcc
v_max_f16_e32 v0, v0, v0
v_min_f16_e32 v0, 0x57f0, v0
v_cvt_i16_f16_e32 v2, v0
v_mov_b64_e32 v[0:1], 0
global_store_byte v[0:1], v2, off
s_waitcnt vmcnt(0)
s_setpc_b64 s[30:31]
```
# Note #
Input:
```llvm
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"
target triple = "amdgcn-amd-amdhsa"
define void @foo.bb848(<1 x half> %i888, <1 x i1> %0, <1 x i1> %1) {
newFuncRoot:
%i924 = select <1 x i1> %0, <1 x half> %i888, <1 x half> zeroinitializer
%.inv24 = fcmp ole <1 x half> %i924, splat (half 0xH57F0)
%i932 = select <1 x i1> %.inv24, <1 x half> %i924, <1 x half> splat (half 0xH57F0)
%i940 = fptosi <1 x half> %i932 to <1 x i8>
store <1 x i8> %i940, ptr addrspace(1) null, align 1
ret void
}
```
The reproducer was reduced to make it easier to debug; the original issue involved a vector of size 4.