https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109463

            Bug ID: 109463
           Summary: suboptimal sequence for converting 64-bit unsigned int
                    to float
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: elronnd at elronnd dot net
  Target Milestone: ---

double f(uint64_t x) { return x; } gives:

test   rdi,rdi
js     10 <f+0x10>
pxor   xmm0,xmm0
cvtsi2sd xmm0,rdi
ret
nop
10:
mov    rax,rdi
and    edi,0x1
pxor   xmm0,xmm0
shr    rax,1
or     rax,rdi
cvtsi2sd xmm0,rax
addsd  xmm0,xmm0
ret
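As a cross-check, here is a C model of the code above. It is a sketch, not GCC's source. The function names are hypothetical, and it assumes IEEE binary64 doubles with the default round-to-nearest mode, as on x86-64. The fast path is the signed conversion. The slow path halves the value but ORs bit 0 back in as a sticky bit, so the signed conversion rounds the same way an exact unsigned conversion would.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the sequence GCC 12.2 emits for f(). */
double u64_to_double_gcc(uint64_t x)
{
    if (x <= (uint64_t)INT64_MAX)          /* test rdi,rdi ; js (top bit clear) */
        return (double)(int64_t)x;         /* cvtsi2sd xmm0,rdi */

    /* Top bit set: halve, keeping the shifted-out bit 0 as a sticky bit. */
    uint64_t half = (x >> 1) | (x & 1);    /* mov/shr rax,1 ; and edi,1 ; or rax,rdi */
    double d = (double)(int64_t)half;      /* cvtsi2sd xmm0,rax (half < 2^63) */
    return d + d;                          /* addsd xmm0,xmm0 (exact doubling) */
}

/* Compares the model with the compiler's own uint64_t -> double conversion. */
void check_gcc_model(uint64_t x)
{
    assert(u64_to_double_gcc(x) == (double)x);
}
```

The value 0x8000000000000401 shows why the sticky bit is needed. Without it, the halved value would be an exact tie and would round down to 2^63. Keeping the bit lets it round up to 2^63 + 2^11, which is the correct result.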

In particular, the sequence:

mov    rax,rdi
and    edi,0x1
shr    rax,1
or     rax,rdi
cvtsi2sd xmm0,rax

can be replaced with:

movzx  eax,dil
shr    rdi,1
or     rdi,rax
cvtsi2sd xmm0,rdi

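On this path the top bit of rdi is set, so the shifted value lies in
[2^62, 2^63). Converting it to double discards its low 10 bits: bit 9 is the
round bit, and bits 0-8 only contribute to the sticky bit. The byte ORed in
lands in bits 0-7, entirely inside that sticky region. As long as bit 0 of rdi
is included, ORing in any further low bits, in any order, still rounds
correctly.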
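The proposed replacement can be modelled in C the same way. This is a sketch with hypothetical names, assuming IEEE binary64 and round-to-nearest. It ORs the whole low byte of x (movzx eax,dil) into the halved value instead of only bit 0. Only the low 12 bits of x decide the rounding: bit 11 is the last kept bit, bit 10 is the round bit, and bits 0-9 are sticky. The helper therefore sweeps all 4096 combinations of those bits for a given choice of high bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the movzx-based variant proposed above. */
double u64_to_double_lowbyte(uint64_t x)
{
    if (x <= (uint64_t)INT64_MAX)
        return (double)(int64_t)x;

    /* movzx eax,dil ; shr rdi,1 ; or rdi,rax: bits 0-7 of the result all
       sit below the round bit (bit 9), so they only affect the sticky bit. */
    uint64_t half = (x >> 1) | (x & 0xff);
    double d = (double)(int64_t)half;    /* cvtsi2sd xmm0,rdi */
    return d + d;                        /* the existing addsd xmm0,xmm0 */
}

/* Checks every setting of the 12 rounding-relevant low bits for the given
   high bits against the compiler's own conversion. */
void check_lowbyte_sweep(uint64_t high)
{
    for (uint64_t lo = 0; lo < 4096; lo++) {
        uint64_t x = (high & ~(uint64_t)0xfff) | lo;
        assert(u64_to_double_lowbyte(x) == (double)x);
    }
}
```

The sweep with high bits 0xFFFFFFFFFFFFF000 also covers the carry case, where the result rounds up to 2^64.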

Alternatively, in order to avoid clobbering rdi, use the following sequence:

mov    rax,rdi
shr    rax,1
or     al,dil
cvtsi2sd xmm0,rax
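A C sketch of this non-clobbering variant follows. Again, the names are hypothetical, and it assumes IEEE binary64 with round-to-nearest. To mirror `or al,dil` literally, it writes only the low byte of the shifted value and leaves the upper 56 bits alone. The input x is only read, which corresponds to rdi being preserved. The final doubling stands in for the addsd that follows in the original output.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of: mov rax,rdi ; shr rax,1 ; or al,dil ; cvtsi2sd. */
double u64_to_double_or_al(uint64_t x)
{
    if (x <= (uint64_t)INT64_MAX)
        return (double)(int64_t)x;

    uint64_t rax = x >> 1;                                  /* mov rax,rdi ; shr rax,1 */
    uint8_t al = (uint8_t)((uint8_t)rax | (uint8_t)x);      /* or al,dil */
    rax = (rax & ~(uint64_t)0xff) | al;                     /* only AL changes */
    double d = (double)(int64_t)rax;                        /* cvtsi2sd xmm0,rax */
    return d + d;                                           /* addsd xmm0,xmm0 */
}

/* Compares the model with the compiler's own uint64_t -> double conversion. */
void check_or_al(uint64_t x)
{
    assert(u64_to_double_or_al(x) == (double)x);
}
```

Writing only the low byte computes the same value as the full-width OR in the movzx version, because the upper bits of `x & 0xff` are zero.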

(The partial-register penalty for writing al and then reading the full rax
appears to be small or nonexistent on recent microarchitectures.)
