| Field | Value |
| --- | --- |
| Issue | 164023 |
| Summary | Mathematical operations on constant zero are not folded away |
| Labels | new issue |
| Assignees | |
| Reporter | mqudsi |

As a result of a micro-optimization, we have the Rust compiler generating several variations of an assertion that a particular 32-bit floating-point value is zero, in the hope that LLVM can optimize away a multiplication involving that value. Unfortunately, despite trying several ways of "informing" LLVM that both the sign and magnitude of the `f32` variable match those of `+0.0`, it does not seem able to perform this optimization.
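For orientation, here is a hypothetical Rust sketch of source code that could plausibly lower to two of the IR test cases below. It is not the reporter's actual code: the function names are borrowed from the IR, the use of `std::hint::assert_unchecked` to produce `@llvm.assume` is an assumption, and whether rustc emits exactly the IR shown is not verified here. The `@llvm.is.fpclass` variant is omitted because this sketch does not assume any particular Rust spelling for it.

```rust
use std::hint::assert_unchecked;

/// Hypothetical counterpart of `@bitcast`: assert that every bit of `y` is zero,
/// i.e. that `y` is exactly `+0.0`.
///
/// # Safety
/// The caller must guarantee that `y` is `+0.0`; anything else is undefined behavior.
pub unsafe fn bitcast(x: f32, y: f32, z: f32) -> f32 {
    unsafe { assert_unchecked(y.to_bits() == 0) };
    // Parses as (z + (x * 9.0 + y * 8.0)) + 0.0, the same order as the IR.
    z + (x * 9.0 + y * 8.0) + 0.0
}

/// Hypothetical counterpart of `@sign_irrelevant`: assert only that `y`
/// compares equal to zero, which holds for both `+0.0` and `-0.0`.
///
/// # Safety
/// The caller must guarantee that `y == 0.0`; anything else is undefined behavior.
pub unsafe fn sign_irrelevant(x: f32, y: f32, z: f32) -> f32 {
    unsafe { assert_unchecked(y == 0.0) };
    z + (x * 9.0 + y * 8.0) + 0.0
}
```

Both functions are marked `unsafe` because violating the asserted condition is undefined behavior, which is what lets the optimizer treat the condition as a fact.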
**Test case: `+0.0` bitpattern is directly asserted via intrinsic**
```llvm
define noundef float @assert_bitpattern(float noundef %x, float noundef %y, float noundef %z) unnamed_addr {
start:
%cond = tail call i1 @llvm.is.fpclass.f32(float %y, i32 64)
tail call void @llvm.assume(i1 %cond)
%_7 = fmul float %x, 9.000000e+00
%_8 = fmul float %y, 8.000000e+00
%_6 = fadd float %_7, %_8
%_0 = fadd float %z, %_6
ret float %_0
}
```
Result: no different from the output without the `@llvm.assume()` call. Both multiplications are preserved, and the product of the multiplication by zero is still added into the sum:
```asm
.LCPI0_0:
.long 0x41100000 # float 9
.LCPI0_1:
.long 0x41000000 # float 8
bitcast: # @bitcast
mulss xmm0, dword ptr [rip + .LCPI0_0]
mulss xmm1, dword ptr [rip + .LCPI0_1]
addss xmm0, xmm1
addss xmm0, xmm2
xorps xmm1, xmm1
addss xmm0, xmm1
ret
```
----
**Test case: assert the bitpattern via type punning**
```llvm
define noundef float @bitcast(float noundef %x, float noundef %y, float noundef %z) unnamed_addr {
start:
%y_as_i32 = bitcast float %y to i32
%cond = icmp eq i32 %y_as_i32, 0
tail call void @llvm.assume(i1 %cond)
%_7 = fmul float %x, 9.000000e+00
%_8 = fmul float %y, 8.000000e+00
%_6 = fadd float %_7, %_8
%_5 = fadd float %z, %_6
%_0 = fadd float %_5, 0.000000e+00
ret float %_0
}
```
Result: same as when asserting the bitpattern via `@llvm.is.fpclass.f32(float %y, i32 64)`
----
**Test case: assert that the magnitude is zero (`fcmp oeq float .., 0.0000e+00`) while performing an operation whose result is the same whether the variable is `+0.0` or `-0.0` (because `+0.0` is added at the end of the fp32 summation)**
```llvm
define noundef float @sign_irrelevant(float noundef %x, float noundef %y, float noundef %z) unnamed_addr {
start:
%cond = fcmp oeq float %y, 0.000000e+00
tail call void @llvm.assume(i1 %cond)
%_7 = fmul float %x, 9.000000e+00
%_8 = fmul float %y, 8.000000e+00
%_6 = fadd float %_7, %_8
%_5 = fadd float %z, %_6
%_0 = fadd float %_5, 0.000000e+00
ret float %_0
}
```
Result: no different from the output without the `@llvm.assume()` call. (Regardless of whether the final `fadd float %_5, 0.000e+00` is optimized away, I would expect the `fmul float %y, ...` to be elided.)
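The claim that the sign of the zero is irrelevant here can be checked numerically. The following is a minimal Rust sketch, not part of the original report; the helper names `full`, `folded`, and `fold_is_bit_exact` are hypothetical. It compares the bit pattern of the full expression, for both `y = +0.0` and `y = -0.0`, against the same expression with the `y * 8.0` term dropped. NaN inputs are deliberately left out of scope.

```rust
/// Mirrors the arithmetic of the IR: (z + (x * 9.0 + y * 8.0)) + 0.0
pub fn full(x: f32, y: f32, z: f32) -> f32 {
    (z + (x * 9.0 + y * 8.0)) + 0.0
}

/// The same expression with the `y * 8.0` term removed entirely.
pub fn folded(x: f32, z: f32) -> f32 {
    (z + x * 9.0) + 0.0
}

/// Returns true if `full` produces the same bits as `folded`
/// for both signs of a zero-valued `y`.
pub fn fold_is_bit_exact(x: f32, z: f32) -> bool {
    let want = folded(x, z).to_bits();
    full(x, 0.0, z).to_bits() == want && full(x, -0.0, z).to_bits() == want
}
```

For the non-NaN inputs tried in this sketch, a different sign on an intermediate zero can only change the sign of a zero-valued partial sum, and the trailing `+ 0.0` maps both `-0.0` and `+0.0` to `+0.0`.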
----
**Test case: confirm that the compiler is at least capable of folding away the operation under *some* circumstance**
```llvm
define noundef float @folded_away(float noundef %x, float noundef %y, float noundef %z) unnamed_addr {
start:
%y_ = fadd float 0.000000e+00, 0.000000e+00
%_7 = fmul float %x, 9.000000e+00
%_8 = fmul float %y_, 8.000000e+00
%_6 = fadd float %_7, %_8
%_5 = fadd float %z, %_6
%_0 = fadd float %_5, 0.000000e+00
ret float %_0
}
```
Result: here we finally observe the compiler optimizing away the multiplication by zero. Its product is folded to the constant `+0.0` (the `xorps xmm1, xmm1`), which is then added into the sum:
```asm
.LCPI3_0:
.long 0x41100000 # float 9
folded_away: # @folded_away
mulss xmm0, dword ptr [rip + .LCPI3_0]
xorps xmm1, xmm1
addss xmm0, xmm1
addss xmm0, xmm2
addss xmm0, xmm1
ret
```
This is the assembly I would have expected ~all the test cases above to generate.
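One likely reason the listing above still contains `addss xmm0, xmm1` after the multiplication is folded: without fast-math flags such as `nsz`, adding `+0.0` is not an identity operation, because `-0.0 + 0.0` evaluates to `+0.0`. A small Rust sketch of that behavior follows; the helper name `add_pos_zero_is_identity` is hypothetical.

```rust
/// True when `a + 0.0` has exactly the same bit pattern as `a`.
pub fn add_pos_zero_is_identity(a: f32) -> bool {
    (a + 0.0).to_bits() == a.to_bits()
}
```

The check holds for ordinary values and for `+0.0`, but fails for `-0.0`, so a value that might be `-0.0` cannot simply have the addition removed.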
----
LLVM version: `trunk` as well as `21.1.0` and earlier versions
Target architecture: `x86_64`
Command line flags: `-O3`
Godbolt link: https://llvm.godbolt.org/z/7W95Ehsrd