Issue 82937
Summary AMDGPU: Wrong code for fcanonicalize
Labels new issue
Assignees
Reporter hvdijk
Please consider this minimal LLVM IR:
```llvm
define half @f(half %x) {
  %canonicalized = call half @llvm.canonicalize.f16(half %x)
  ret half %canonicalized
}
```
Run with `llc -mtriple=amdgcn` and we get:
```asm
f:                                      ; @f
        s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
        s_setpc_b64 s[30:31]
```
The `canonicalize` operation has been entirely optimised away.
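
To illustrate why dropping the operation matters, here is a minimal sketch of what canonicalising a half value is generally expected to do to a signalling NaN, compared with the identity function that the emitted code amounts to. It is a toy model on raw IEEE-754 binary16 bit patterns and rests on assumptions: the helper names are hypothetical, it only models quieting of signalling NaNs (it ignores denormal handling, which depends on the FP mode), and it does not model how AMDGPU actually passes the value in a 32-bit register.

```python
# Toy model of canonicalisation on IEEE-754 binary16 bit patterns.
# Assumption: only NaN quieting is modelled; denormal flushing is ignored.

F16_EXP_MASK = 0x7C00   # 5 exponent bits
F16_MANT_MASK = 0x03FF  # 10 mantissa bits
F16_QUIET_BIT = 0x0200  # most significant mantissa bit


def is_nan(bits: int) -> bool:
    """A NaN has an all-ones exponent and a non-zero mantissa."""
    return (bits & F16_EXP_MASK) == F16_EXP_MASK and (bits & F16_MANT_MASK) != 0


def is_signalling_nan(bits: int) -> bool:
    """A signalling NaN is a NaN whose quiet bit is clear."""
    return is_nan(bits) and (bits & F16_QUIET_BIT) == 0


def canonicalize_f16(bits: int) -> int:
    """Hypothetical reference: quiet a signalling NaN, pass everything else through."""
    return bits | F16_QUIET_BIT if is_signalling_nan(bits) else bits


def emitted_code(bits: int) -> int:
    """What a function that simply returns its argument does: nothing."""
    return bits


snan = 0x7D00  # exponent all ones, quiet bit clear, mantissa non-zero
print(hex(canonicalize_f16(snan)))  # quieted NaN
print(hex(emitted_code(snan)))      # still the signalling NaN
```

For ordinary values such as `0x3C00` (1.0) the two functions agree, which is why the miscompile is only visible for non-canonical inputs.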

The reason for this is that during ISel we get the following DAG:

```
  t0: ch,glue = EntryToken
          t2: f32,ch = CopyFromReg # D:1 t0, Register:f32 %0
        t4: f16 = fp_round # D:1 t2, TargetConstant:i64<1>
      t5: f16 = fcanonicalize # D:1 t4
    t6: f32 = fp_extend # D:1 t5
  t8: ch,glue = CopyToReg # D:1 t0, Register:f32 $vgpr0, t6
  t9: ch = RET_GLUE # D:1 t8, Register:f32 $vgpr0, t8:1
```

Here, `fcanonicalize` is optimised away because `SITargetLowering::isCanonicalized` determines that `fp_round` is guaranteed to return an already-canonicalised result, so no further work is needed. That leaves us with `fp_extend (fp_round x, /*strict=*/1)`, which is then optimised to a no-op, so nothing canonicalises the value at all.
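
The interaction of the two folds can be sketched with a toy rewriter. This is not LLVM's code: the tuple encoding of nodes, the `combine` function, and the `is_canonicalized` helper are all hypothetical stand-ins for the behaviour described above, and they only model the two rules involved.

```python
# Toy model of the two DAG folds. Nodes are tuples: (opcode, operand, ...).
# Leaves are plain strings or integers.


def is_canonicalized(node) -> bool:
    """Stand-in for the check described above: an fp_round result is assumed canonical."""
    return isinstance(node, tuple) and node[0] == "fp_round"


def combine(node):
    """Rewrite a node bottom-up using the two folds."""
    if not isinstance(node, tuple):
        return node
    op, *args = node
    args = [combine(a) for a in args]

    # Fold 1: fcanonicalize(x) -> x when x is believed to be canonical already.
    if op == "fcanonicalize" and is_canonicalized(args[0]):
        return args[0]

    # Fold 2: fp_extend(fp_round(x, 1)) -> x, since the flag says the round
    # did not change the value.
    if (op == "fp_extend" and isinstance(args[0], tuple)
            and args[0][0] == "fp_round" and args[0][2] == 1):
        return args[0][1]

    return (op, *args)


dag = ("fp_extend", ("fcanonicalize", ("fp_round", "x", 1)))
result = combine(dag)
print(result)  # the input value, with no canonicalisation left anywhere
```

Each fold looks reasonable in isolation: the first relies on `fp_round` producing a canonical value, the second removes that very `fp_round`. Applied together, the only operation that could have canonicalised the value disappears.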

This blocks another optimisation (#80520) from going in, because that change makes this problem show up in more cases than it currently does. Sadly, I am struggling to find a good way of ensuring we get correct code for this case without also making codegen for other tests worse.

@llvm/pr-subscribers-backend-amdgpu 