https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94824
Bug ID: 94824
Summary: Failure to optimize with __builtin_bswap32 as well as with a function recognized as such
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---

uint32_t swap32(uint32_t x)
{
    return ((x << 24) | ((x << 8) & 0x00FF0000) | ((x >> 8) & 0x0000FF00) | (x >> 24));
}

uint64_t swap64v1(uint64_t x)
{
    uint64_t a = __builtin_bswap32(x);
    x >>= 32;
    a <<= 32;
    return __builtin_bswap32(x) | a;
}

uint64_t swap64v2(uint64_t x)
{
    uint64_t a = swap32(x);
    x >>= 32;
    a <<= 32;
    return swap32(x) | a;
}

swap64v1 and swap64v2 are identical, since swap32 is equivalent to __builtin_bswap32. However, only swap64v2 is optimized to __builtin_bswap64.

swap64v1 is compiled to this by gcc -O3:

swap64v1(unsigned long):
  mov rdx, rdi
  mov eax, edi
  shr rdx, 32
  bswap eax
  sal rax, 32
  bswap edx
  mov edi, edx
  or rax, rdi
  ret

And swap64v2 gives this:

swap64v2(unsigned long):
  mov rax, rdi
  bswap rax
  ret
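
The claim that both variants compute a full 64-bit byte swap can be checked at run time. The following is a minimal sketch, assuming GCC or Clang (for the __builtin_bswap32 and __builtin_bswap64 builtins). The three functions are copied from the report with explicit casts and unsigned constant suffixes added; swaps_agree and self_check are hypothetical helper names introduced here for illustration, not part of the report.

```c
#include <assert.h>
#include <stdint.h>

/* Hand-written 32-bit byte swap, as in the report. */
uint32_t swap32(uint32_t x)
{
    return (x << 24) | ((x << 8) & 0x00FF0000u) |
           ((x >> 8) & 0x0000FF00u) | (x >> 24);
}

/* Variant built on the builtin (the one reported as not optimized). */
uint64_t swap64v1(uint64_t x)
{
    uint64_t a = __builtin_bswap32((uint32_t)x); /* swapped low half */
    x >>= 32;
    a <<= 32;                                    /* becomes the high half */
    return __builtin_bswap32((uint32_t)x) | a;
}

/* Variant built on the hand-written swap32. */
uint64_t swap64v2(uint64_t x)
{
    uint64_t a = swap32((uint32_t)x);
    x >>= 32;
    a <<= 32;
    return swap32((uint32_t)x) | a;
}

/* Hypothetical helper: returns 1 if both variants match __builtin_bswap64. */
int swaps_agree(uint64_t x)
{
    uint64_t expected = __builtin_bswap64(x);
    return swap64v1(x) == expected && swap64v2(x) == expected;
}

/* Hypothetical helper: checks a few sample values, including edge cases. */
void self_check(void)
{
    const uint64_t samples[] = {
        0x0000000000000000ULL,
        0xFFFFFFFFFFFFFFFFULL,
        0x0102030405060708ULL,
        0x00000000FFFFFFFFULL,
        0x8000000000000001ULL,
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(swaps_agree(samples[i]));
}
```

Calling self_check() from main should pass silently. This only confirms that the two functions agree on the sampled inputs; it says nothing about the generated code, which still has to be inspected with gcc -O3 -S as in the report.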