https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109463
            Bug ID: 109463
           Summary: suboptimal sequence for converting 64-bit unsigned int to float
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: elronnd at elronnd dot net
  Target Milestone: ---

double f(uint64_t x) { return x; }

gives:

        test   rdi,rdi
        js     10 <f+0x10>
        pxor   xmm0,xmm0
        cvtsi2sd xmm0,rdi
        ret
        nop
  10:   mov    rax,rdi
        and    edi,0x1
        pxor   xmm0,xmm0
        shr    rax,1
        or     rax,rdi
        cvtsi2sd xmm0,rax
        addsd  xmm0,xmm0
        ret

In particular, the sequence:

        mov    rax,rdi
        and    edi,0x1
        shr    rax,1
        or     rax,rdi
        cvtsi2sd xmm0,rax

Can be replaced with:

        movzx  eax,dil
        shr    rdi,1
        or     rdi,rax
        cvtsi2sd xmm0,rdi

Since all 9 low bits of rdi are below the sticky bit, ORing them together in
any order suffices to round correctly.

Alternatively, in order to avoid clobbering rdi, use the following sequence:

        mov    rax,rdi
        shr    rax,1
        or     al,dil
        cvtsi2sd xmm0,rax

(The penalty for partial register access appears to be very cheap or
nonexistent on recent uarchs.)
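
For anyone who wants to sanity-check the rounding argument without reading the asm, here is a rough C model of the two slow paths. It is only a sketch: the function names are made up for illustration, it assumes IEEE-754 binary64 doubles with SSE2-style arithmetic (no x87 excess precision), and it trusts the compiler's own (double)x as the reference result. The sample set sweeps the low 11 bits, which cover every bit that is discarded when the top bit of x is set.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the sequence GCC 12.2 emits: fold only bit 0 into x >> 1. */
static double u64_to_double_current(uint64_t x)
{
    if ((x >> 63) == 0)                   /* test rdi,rdi / js */
        return (double)(int64_t)x;        /* cvtsi2sd xmm0,rdi */
    uint64_t h = (x >> 1) | (x & 1);      /* shr rax,1 / and edi,1 / or */
    double d = (double)(int64_t)h;        /* cvtsi2sd xmm0,rax */
    return d + d;                         /* addsd xmm0,xmm0 */
}

/* Model of the proposed sequence: fold the whole low byte into x >> 1. */
static double u64_to_double_proposed(uint64_t x)
{
    if ((x >> 63) == 0)
        return (double)(int64_t)x;
    uint64_t h = (x >> 1) | (x & 0xff);   /* movzx eax,dil / shr / or */
    double d = (double)(int64_t)h;
    return d + d;
}

/* Compare both models against the compiler's conversion over a few
   high-bit patterns combined with every value of the low 11 bits. */
static unsigned long count_mismatches(void)
{
    static const uint64_t highs[] = {
        0x0000000000000000ULL,            /* fast path */
        0x8000000000000000ULL,            /* even mantissa, ties round down */
        0x8000000000000800ULL,            /* odd mantissa, ties round up */
        0xFFFFFFFFFFFFF800ULL             /* rounding up carries to 2^64 */
    };
    unsigned long bad = 0;
    for (size_t i = 0; i < sizeof highs / sizeof highs[0]; i++) {
        for (uint64_t low = 0; low < 2048; low++) {
            uint64_t x = highs[i] | low;
            double ref = (double)x;
            if (u64_to_double_current(x) != ref)
                bad++;
            if (u64_to_double_proposed(x) != ref)
                bad++;
        }
    }
    return bad;
}
```

Calling count_mismatches() from a small main and checking that it returns 0 exercises both variants. This is a spot check over selected inputs, not a proof, and it says nothing about the relative speed of the sequences.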