Issue 136368
Summary Inefficient codegen for `copysign(known_zero_sign_bit, x)`
Labels new issue
Assignees
Reporter dzaima
The code:
```c
#include <math.h>

double native(double x) {
    double bound = fabs(x) > M_PI/2 ? M_PI/2 : 0;
    return copysign(bound, x);
}
```
via `-O3 -march=haswell` generates:
```asm
.LCPI0_1:
        .quad   0x3ff921fb54442d18
.LCPI0_2:
        .quad   0x7fffffffffffffff
native:
        vmovddup xmm1, qword ptr [rip + .LCPI0_2] ; 0x7fffffffffffffff
        vandpd   xmm2, xmm0, xmm1
        vmovsd   xmm3, qword ptr [rip + .LCPI0_1] ; PI/2
        vcmpltsd xmm2, xmm3, xmm2
        vandpd   xmm2, xmm2, xmm3 ; xmm2 == bound
        vandnpd  xmm0, xmm1, xmm0
        vandpd   xmm1, xmm2, xmm1 ; unnecessary! could be just xmm2
        vorpd    xmm0, xmm1, xmm0
        ret
```
This has an extraneous `vandpd` masking out the sign bit of `bound`, even though that bit is always 0. Moreover, manually writing the more efficient bitwise arithmetic still produces the same suboptimal code.
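For concreteness, a manual bitwise version might look like the sketch below (the exact code behind the godbolt link may differ; `manual` is just an illustrative name):

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Sketch of a manual bitwise copysign: since bound's sign bit is known
   to be zero, copysign(bound, x) reduces to OR-ing in x's sign bit. */
double manual(double x) {
    double bound = fabs(x) > M_PI/2 ? M_PI/2 : 0;
    uint64_t b, s;
    memcpy(&b, &bound, sizeof b);
    memcpy(&s, &x, sizeof s);
    b |= s & 0x8000000000000000ULL; /* copy only the sign bit of x */
    memcpy(&bound, &b, sizeof bound);
    return bound;
}
```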

The better assembly would be:
```asm
        vandpd   xmm1, xmm0, xmmword ptr [rip + .LCPI2_0] ; extract sign
        vandpd   xmm0, xmm0, xmmword ptr [rip + .LCPI2_1] ; mask out sign
        vmovsd   xmm2, qword ptr [rip + .LCPI2_2] ; PI/2
        vcmpltsd xmm0, xmm2, xmm0
        vandpd   xmm0, xmm0, xmm2
        vorpd    xmm0, xmm1, xmm0
        ret
```

Compiler Explorer link, plus the manual implementation: https://godbolt.org/z/dv14nn39x (as an aside, there's an easily avoidable `vmovq xmm1, xmm0` there; perhaps from the inline-assembly workaround interfering with register allocation?)