Issue     | 136368
Summary   | Inefficient codegen for `copysign(known_zero_sign_bit, x)`
Labels    | new issue
Assignees |
Reporter  | dzaima

The code:
```c
#include <math.h>

double native(double x) {
    double bound = fabs(x) > M_PI/2 ? M_PI/2 : 0;
    return copysign(bound, x);
}
```
via `-O3 -march=haswell` generates:
```asm
.LCPI0_1:
.quad 0x3ff921fb54442d18
.LCPI0_2:
.quad 0x7fffffffffffffff
native:
vmovddup xmm1, qword ptr [rip + .LCPI0_2] ; 0x7fffffffffffffff
vandpd xmm2, xmm0, xmm1
vmovsd xmm3, qword ptr [rip + .LCPI0_1] ; PI/2
vcmpltsd xmm2, xmm3, xmm2
vandpd xmm2, xmm2, xmm3 ; xmm2 == bound
vandnpd xmm0, xmm1, xmm0
vandpd xmm1, xmm2, xmm1 ; unnecessary! could be just xmm2
vorpd xmm0, xmm1, xmm0
ret
```
which has an extraneous `vandpd` masking out the sign bit of `bound`, even though that bit is always 0 (`bound` is either +0.0 or M_PI/2). Moreover, manually writing the more efficient bitwise arithmetic still results in the same suboptimal code.
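For concreteness, a minimal sketch of what such a manual bitwise variant might look like (the `manual` name and the memcpy-based bit punning are my illustration; the exact code behind the Godbolt link below may differ):
```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Illustrative manual variant: bound's sign bit is already 0, so only
   x's sign bit needs to be OR'd in; clearing bound's sign is redundant. */
double manual(double x) {
    uint64_t xbits, bbits;
    memcpy(&xbits, &x, sizeof xbits);
    uint64_t sign = xbits & 0x8000000000000000ull;  /* extract sign of x */
    double bound = fabs(x) > M_PI/2 ? M_PI/2 : 0;   /* sign bit known zero */
    memcpy(&bbits, &bound, sizeof bbits);
    bbits |= sign;                                  /* apply x's sign */
    double r;
    memcpy(&r, &bbits, sizeof r);
    return r;
}
```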
The better assembly would be:
```asm
vandpd xmm1, xmm0, xmmword ptr [rip + .LCPI2_0] ; extract sign
vandpd xmm0, xmm0, xmmword ptr [rip + .LCPI2_1] ; mask out sign
vmovsd xmm2, qword ptr [rip + .LCPI2_2] ; PI/2
vcmpltsd xmm0, xmm2, xmm0
vandpd xmm0, xmm0, xmm2
vorpd xmm0, xmm1, xmm0
ret
```
Compiler Explorer link, including the manual implementation: https://godbolt.org/z/dv14nn39x (as an aside, there's an easily avoidable `vmovq xmm1, xmm0` there; perhaps the inline-assembly workaround is interfering with register allocation?)