| Field | Value |
|---|---|
| Issue | 82937 |
| Summary | AMDGPU: Wrong code for fcanonicalize |
| Labels | new issue |
| Assignees | |
| Reporter | hvdijk |

Please consider this minimal LLVM IR:
```llvm
declare half @llvm.canonicalize.f16(half)

define half @f(half %x) {
  %canonicalized = call half @llvm.canonicalize.f16(half %x)
  ret half %canonicalized
}
```
Run with `llc -mtriple=amdgcn` and we get:
```asm
f: ; @f
s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
s_setpc_b64 s[30:31]
```
The `canonicalize` operation has been optimised away entirely.
The reason is that during ISel we get the following DAG:
```
t0: ch,glue = EntryToken
t2: f32,ch = CopyFromReg # D:1 t0, Register:f32 %0
t4: f16 = fp_round # D:1 t2, TargetConstant:i64<1>
t5: f16 = fcanonicalize # D:1 t4
t6: f32 = fp_extend # D:1 t5
t8: ch,glue = CopyToReg # D:1 t0, Register:f32 $vgpr0, t6
t9: ch = RET_GLUE # D:1 t8, Register:f32 $vgpr0, t8:1
```
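Here, `fcanonicalize` is optimised away because `SITargetLowering::isCanonicalized` determines that `fp_round` is guaranteed to return an already-canonicalised result, so no further work is needed. That leaves `fp_extend (fp_round x, /*strict=*/1)`, which is in turn optimised to a no-op: the `1` flag on `fp_round` marks the rounding as not changing the value, so the round/extend pair folds back to `x` and nothing canonicalises the input any more.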
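To make the interaction of the two folds easier to follow, here is a minimal Python sketch of the sequence described above. It is a toy model only: `Node`, `fold_canonicalize` and `fold_extend_of_round` are hypothetical names invented for illustration and are not LLVM APIs, and the real checks in `SITargetLowering::isCanonicalized` and the DAG combiner are considerably more involved.

```python
# Toy model of the two DAG folds described above.
# All names here are hypothetical; this is not LLVM's SelectionDAG API.
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    op: str                          # e.g. "arg", "fp_round", "fcanonicalize"
    operands: Tuple["Node", ...] = ()
    flag: int = 0                    # models the TargetConstant on fp_round


def fold_canonicalize(n: Node) -> Node:
    """Drop fcanonicalize when its operand is assumed already canonical.

    Assumption: like the behaviour reported in the issue, an fp_round
    result is treated as canonical.
    """
    n = Node(n.op, tuple(fold_canonicalize(o) for o in n.operands), n.flag)
    if n.op == "fcanonicalize" and n.operands[0].op == "fp_round":
        return n.operands[0]
    return n


def fold_extend_of_round(n: Node) -> Node:
    """fp_extend (fp_round x, 1) -> x, since the round is marked exact."""
    n = Node(n.op, tuple(fold_extend_of_round(o) for o in n.operands), n.flag)
    if n.op == "fp_extend":
        src = n.operands[0]
        if src.op == "fp_round" and src.flag == 1:
            return src.operands[0]
    return n


# Build the DAG from the dump: fp_extend(fcanonicalize(fp_round(x, 1))).
x = Node("arg")
dag = Node("fp_extend", (Node("fcanonicalize", (Node("fp_round", (x,), flag=1),)),))

after_first = fold_canonicalize(dag)              # fp_extend(fp_round(x, 1))
after_second = fold_extend_of_round(after_first)  # just x

print(after_second.op)  # prints "arg"
```

Each fold looks reasonable in isolation; it is only the combination that removes every operation that could have canonicalised the input.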
This blocks another optimisation (#80520) from going in, because that change would make this problem show up in more cases than it currently does. Sadly, I am struggling to find a good way of ensuring we get correct code for this case without also making codegen for other tests worse.
@llvm/pr-subscribers-backend-amdgpu