https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94824

            Bug ID: 94824
           Summary: Failure to optimize with __builtin_bswap32 as well as
                    with a function recognized as such
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

uint32_t swap32(uint32_t x)
{
    return ((x << 24) | ((x << 8) & 0x00FF0000) | ((x >> 8) & 0x0000FF00) | (x >> 24));
}

uint64_t swap64v1(uint64_t x)
{
    uint64_t a = __builtin_bswap32(x);
    x >>= 32;
    a <<= 32;
    return __builtin_bswap32(x) | a;
}

uint64_t swap64v2(uint64_t x)
{
    uint64_t a = swap32(x);
    x >>= 32;
    a <<= 32;
    return swap32(x) | a;
}

swap64v1 and swap64v2 are semantically identical, since swap32 is equivalent
to __builtin_bswap32 (and GCC recognizes it as such). However, only swap64v2
is optimized to __builtin_bswap64.
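The equivalence claimed here can be checked with a small sketch. It assumes a
compiler that provides the GCC builtins __builtin_bswap32 and
__builtin_bswap64 (GCC or Clang). The helper swaps_agree is hypothetical and
not part of the original report. It only compares results against
__builtin_bswap64 and says nothing about the generated code.

```c
#include <assert.h>
#include <stdint.h>

/* Open-coded 32-bit byte swap, as in the report. */
uint32_t swap32(uint32_t x)
{
    return ((x << 24) | ((x << 8) & 0x00FF0000) | ((x >> 8) & 0x0000FF00) | (x >> 24));
}

/* 64-bit swap built from two __builtin_bswap32 calls. */
uint64_t swap64v1(uint64_t x)
{
    uint64_t a = __builtin_bswap32(x); /* swapped low half */
    x >>= 32;
    a <<= 32;                          /* becomes the high half */
    return __builtin_bswap32(x) | a;
}

/* Same construction, using the open-coded swap32. */
uint64_t swap64v2(uint64_t x)
{
    uint64_t a = swap32(x);
    x >>= 32;
    a <<= 32;
    return swap32(x) | a;
}

/* Hypothetical helper: returns 1 if both variants match __builtin_bswap64. */
int swaps_agree(uint64_t x)
{
    uint64_t expected = __builtin_bswap64(x);
    return swap64v1(x) == expected && swap64v2(x) == expected;
}

/* Hypothetical helper: asserts agreement on a few sample values. */
void check_samples(void)
{
    assert(swaps_agree(0));
    assert(swaps_agree(0x0102030405060708ULL));
    assert(swaps_agree(0xFFFFFFFF00000000ULL));
}
```

Calling check_samples() from a test program should pass silently if both
variants behave as a full 64-bit byte swap on those inputs.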

With gcc -O3, swap64v1 is compiled to this:

swap64v1(unsigned long):
  mov rdx, rdi
  mov eax, edi
  shr rdx, 32
  bswap eax
  sal rax, 32
  bswap edx
  mov edi, edx
  or rax, rdi
  ret

And swap64v2 is compiled to a single 64-bit bswap:

swap64v2(unsigned long):
  mov rax, rdi
  bswap rax
  ret
