http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57477
Bug ID: 57477
Summary: gcc generates suboptimal code for a simple and-shift-zeroextend combination on x86_64
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: mikpe at it dot uu.se

Consider the following set of trivial functions:

> cat q.c
unsigned int g(unsigned int x) { return x & 0x1f; }
unsigned long f(unsigned int x) { return x << 4; }
unsigned long h(unsigned int x) { return (x & 0x1f) << 4; }

Note that h(x) == f(g(x)). The code generated for f and g is good (there is not much choice there), but the code for h contains some suboptimal surprises:

> gcc -O3 -c q.c ; objdump -d q.o

q.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <g>:
   0:   89 f8                   mov    %edi,%eax
   2:   83 e0 1f                and    $0x1f,%eax
   5:   c3                      retq
   6:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
   d:   00 00 00

0000000000000010 <f>:
  10:   c1 e7 04                shl    $0x4,%edi
  13:   89 f8                   mov    %edi,%eax
  15:   c3                      retq
  16:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  1d:   00 00 00

0000000000000020 <h>:
  20:   48 89 f8                mov    %rdi,%rax
  23:   48 c1 e0 04             shl    $0x4,%rax
  27:   25 f0 01 00 00          and    $0x1f0,%eax
  2c:   c3                      retq

1. In h, gcc swapped the order of the '&' and the '<<'. Since 0x1f0 does not fit in a sign-extended 8-bit immediate, the 'and' now needs a 4-byte immediate where g could use a 1-byte one, making its encoding 2 bytes larger (5 bytes instead of 3).

2. In h, the '<<' is done in 64-bit precision, even though the input is clearly 32-bit ('unsigned int') and the result is clearly 32-bit as well (note the absence of a REX.W prefix on the 'and'). The REX.W prefix makes the 'shl' encoding 1 byte larger.

3. In h, the move from rdi to rax is unavoidable (due to the ABI), but it too is needlessly done in 64-bit precision where 32-bit precision would have sufficed, again costing 1 byte for the REX.W prefix.

In short, h compiles to 13 bytes but could have compiled to 9 bytes (2 + 1 + 1 = 4 bytes saved). This is with gcc 4.9; gcc 4.8 and 4.7 generate identical code.
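To double-check the identity and the byte arithmetic, here is a small C sketch (not part of the original report). It re-declares g, f and h from q.c, checks h(x) == f(g(x)) over a caller-chosen range, and counts the bytes of two x86_64 instruction sequences: the h that gcc emitted, copied from the objdump listing, and a hypothetical 9-byte sequence that applies all three fixes (mov %edi,%eax; and $0x1f,%eax; shl $0x4,%eax; retq). That second sequence is my own reconstruction of what the report implies, not compiler output, and the helper names composition_holds, h_gcc_size and h_ideal_size are illustrative only.

```c
#include <assert.h>  /* for the checks a caller is expected to write */
#include <stddef.h>  /* size_t */

/* The three functions from q.c. */
unsigned int g(unsigned int x) { return x & 0x1f; }
unsigned long f(unsigned int x) { return x << 4; }
unsigned long h(unsigned int x) { return (x & 0x1f) << 4; }

/* Hypothetical helper: returns 1 if h(x) == f(g(x)) for every x in
   [lo, hi] (inclusive, lo <= hi), else 0. The explicit break avoids
   wrapping past UINT_MAX when hi == 0xFFFFFFFF. */
int composition_holds(unsigned int lo, unsigned int hi)
{
    for (unsigned int x = lo; ; x++) {
        if (h(x) != f(g(x)))
            return 0;
        if (x == hi)
            break;
    }
    return 1;
}

/* Bytes of h as emitted by gcc -O3, taken from the objdump listing. */
static const unsigned char h_gcc[] = {
    0x48, 0x89, 0xf8,             /* mov %rdi,%rax   (REX.W)     */
    0x48, 0xc1, 0xe0, 0x04,       /* shl $0x4,%rax   (REX.W)     */
    0x25, 0xf0, 0x01, 0x00, 0x00, /* and $0x1f0,%eax (imm32)     */
    0xc3                          /* retq                        */
};

/* Hypothetical sequence implied by points 1-3 (not gcc output):
   'and' before 'shl', an imm8 operand, and no REX.W prefixes. */
static const unsigned char h_ideal[] = {
    0x89, 0xf8,                   /* mov %edi,%eax               */
    0x83, 0xe0, 0x1f,             /* and $0x1f,%eax  (imm8)      */
    0xc1, 0xe0, 0x04,             /* shl $0x4,%eax               */
    0xc3                          /* retq                        */
};

size_t h_gcc_size(void)   { return sizeof h_gcc; }
size_t h_ideal_size(void) { return sizeof h_ideal; }
```

The sketch has no main(); a caller might assert composition_holds(0u, 0xFFFFFu) and compare h_gcc_size() with h_ideal_size(). Because both h and f(g(x)) depend only on the low 5 bits of x, any range covering 32 consecutive values already exercises every distinct output.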