https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84101
--- Comment #30 from Chris Hall <gcc at gmch dot uk> ---
godbolt shows that gcc v9.1 -O3 generates:

  0000: 8d 04 3f     lea    (%rdi,%rdi,1),%eax
  0003: d1 ff        sar    $1,%edi
  0005: 48 98        cltq
  0007: 48 63 d7     movslq %edi,%rdx
  000A: c3           ret
  000B:

The last earlier version available on godbolt is v8.5, which generates 47
bytes of code shuffling stuff to and from %xmm0.

gcc v13.3 generates the same code as v9.1. I haven't tried all the
intervening versions, but all the ones I did try also gave the same.

BUT, v14.1 generates:

  0000: 8d 14 3f     lea    (%rdi,%rdi,1),%edx
  0003: d1 ff        sar    $1,%edi
  0005: 48 63 c7     movslq %edi,%rax
  0008: 48 63 d2     movslq %edx,%rdx
  000B: 48 92        xchg   %rax,%rdx
  000D: c3           ret
  000E:

The difference is clearly trivial (1 extra instruction and 3 extra bytes)...
but I don't see why it would choose %edx instead of %eax for the first
operation?
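
For anyone who wants to check that the two listings really do compute the same result, here is a rough C model of the register traffic. It is only a sketch and is not the test case from this bug: the names pair64, seq_v9, seq_v14, sequences_agree and check_samples are made up for illustration, and it assumes gcc's usual behaviour for signed right shift (arithmetic) and for narrowing conversions (modular).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two 64-bit values left in %rax and %rdx. */
typedef struct { int64_t rax; int64_t rdx; } pair64;

/* Model of the v9.1 listing: the doubled value is built directly in %eax. */
static pair64 seq_v9(int32_t edi) {
    int32_t eax = (int32_t)((uint32_t)edi * 2u); /* lea (%rdi,%rdi,1),%eax */
    edi >>= 1;                                   /* sar $1,%edi (assumes arithmetic shift) */
    pair64 r;
    r.rax = (int64_t)eax;                        /* cltq: sign-extend %eax into %rax */
    r.rdx = (int64_t)edi;                        /* movslq %edi,%rdx */
    return r;
}

/* Model of the v14.1 listing: the doubled value starts in %edx and is swapped at the end. */
static pair64 seq_v14(int32_t edi) {
    int32_t edx = (int32_t)((uint32_t)edi * 2u); /* lea (%rdi,%rdi,1),%edx */
    edi >>= 1;                                   /* sar $1,%edi (assumes arithmetic shift) */
    int64_t rax = (int64_t)edi;                  /* movslq %edi,%rax */
    int64_t rdx = (int64_t)edx;                  /* movslq %edx,%rdx */
    int64_t tmp = rax;                           /* xchg %rax,%rdx */
    rax = rdx;
    rdx = tmp;
    pair64 r = { rax, rdx };
    return r;
}

/* Returns 1 when both models leave the same values in %rax and %rdx. */
static int sequences_agree(int32_t x) {
    pair64 a = seq_v9(x);
    pair64 b = seq_v14(x);
    return a.rax == b.rax && a.rdx == b.rdx;
}

/* Spot-check a few inputs, including the extremes of the 32-bit range. */
static void check_samples(void) {
    const int32_t samples[] = { 0, 1, -1, 7, -8, INT32_MAX, INT32_MIN };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(sequences_agree(samples[i]));
}
```

In this model the only difference is where the doubled value lives before the final swap, which matches the observation that v14.1 costs one extra instruction without changing the result.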