Poor code generation/optimisation in all versions of GCC x86-64 and x86-32

Stefan Kanthak Mon, 05 Nov 2018 08:04:00 -0800

Hi @ll,

the following code snippets let GCC x86-64 generate rather
poor code (see <https://godbolt.org/z/hDj0W5>):


__int128 foo(__int128 n)
{
    n <<= 1;
    n += 1;
    n |= 1;
    return n;
}


__int128 bar(__int128 n)
{
    n += n;
    n += 1;
    n |= 1;
    return n;
}

With -O1:                    With -O2:

foo:                         foo:
    mov   rax, rdi               mov   rax, rdi
    mov   rdx, rsi               mov   rdx, rsi
    shld  rdx, rdi, 1            add   rax, rdi
    add   rax, rdi               shld  rdx, rdi, 1
    add   rax, 1                 add   rax, 1
    adc   rdx, 0                 adc   rdx, 0
    mov   rcx, rdx               or    rax, 1
    or    rax, 1                 mov   rcx, rdx
    mov   rdx, rcx               mov   rdx, rcx
    ret                          ret
bar:                         bar:
    mov   rax, rdi               mov   rax, rdi
    mov   rdx, rsi               mov   rdx, rsi
    shld  rdx, rdi, 1            add   rax, rdi
    add   rax, rdi               shld  rdx, rdi, 1
    add   rax, 1                 add   rax, 1
    adc   rdx, 0                 adc   rdx, 0
    mov   rcx, rdx               or    rax, 1
    or    rax, 1                 mov   rcx, rdx
    mov   rdx, rcx               mov   rdx, rcx
    ret                          ret

In all 4 examples, the last 4 instructions are superfluous!

1. the optimizer should be aware that both "n <<= 1;" and "n += n;"
   yield an even number, and thus a following "n += 1;" can't
   generate a carry.
   More general: after "n <<= x;" addition of any value less than
   1 << x can't generate a carry.

2. the optimzier should also be aware that addition of 1 to an even
   number yields an odd number, and thus a following "n |= 1;" is
   a no-op.
   More general: after "n <<= x;", the addition of any value less
   than 1 << x is equal to a logical or with the same value.

3. last, it should never create "mov rcx, rdx" followed by an
   inverse "mov rdx, rcx".

I also wonder why a shld is created here: at least for "n += n;"
I expect a more straightforward
    add  rax, rax
    adc  rdx, rdx

regards
Stefan Kanthak

PS: of course GCC x86-32 exhibits the same flaws with int64_t!

Poor code generation/optimisation in all versions of GCC x86-64 and x86-32

Reply via email to