http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57477

            Bug ID: 57477
           Summary: gcc generates suboptimal code for a simple
                    and-shift-zeroextend combination on x86_64
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mikpe at it dot uu.se

Consider the following set of trivial functions:

> cat q.c
unsigned int g(unsigned int x) { return x & 0x1f; }
unsigned long f(unsigned int x) { return x << 4; }
unsigned long h(unsigned int x) { return (x & 0x1f) << 4; }

Note that h(x) == f(g(x)).
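For example, with x = 0xfff: g(x) = 0xfff & 0x1f = 0x1f, f(0x1f) =
0x1f << 4 = 0x1f0, and h(0xfff) = (0xfff & 0x1f) << 4 = 0x1f0 as well.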

The code generated for f and g is good (not much choice there), but the code
for h contains some (suboptimal) surprises:

> gcc -O3 -c q.c ; objdump -d q.o

q.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <g>:
   0:   89 f8                   mov    %edi,%eax
   2:   83 e0 1f                and    $0x1f,%eax
   5:   c3                      retq   
   6:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
   d:   00 00 00 

0000000000000010 <f>:
  10:   c1 e7 04                shl    $0x4,%edi
  13:   89 f8                   mov    %edi,%eax
  15:   c3                      retq   
  16:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  1d:   00 00 00 

0000000000000020 <h>:
  20:   48 89 f8                mov    %rdi,%rax
  23:   48 c1 e0 04             shl    $0x4,%rax
  27:   25 f0 01 00 00          and    $0x1f0,%eax
  2c:   c3                      retq   

1. In h gcc exchanged the order of the '&' and the '<<', turning the mask 0x1f
into 0x1f0. This forces a 4-byte immediate where g could use a 1-byte
immediate, resulting in a 2-byte larger instruction encoding (5 bytes instead
of 3; the imm32 form of 'and' with %eax needs no ModRM byte, so the net cost
is 2 bytes rather than 3).

2. In h the '<<' is done in 64-bit precision, even though the input clearly is
32-bit ('unsigned int') and the result clearly also is 32-bit (notice the
absence of a REX.W prefix on the 'and'). The REX.W prefix on the 'shl' results
in a 1-byte larger instruction encoding.

3. In h the move from rdi to rax is unavoidable (due to the ABI), but it too is
redundantly done in 64-bit precision where 32-bit precision would have
sufficed: a 32-bit 'mov %edi,%eax', as in g, zero-extends into %rax for free.
The REX.W prefix again costs 1 byte of instruction encoding.

In short, h compiles to 13 bytes but could have compiled to 9 bytes.
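
Here is one such 9-byte sequence (a hand-written sketch of the expected
output, not actual compiler output; the byte encodings are the same ones
objdump shows above for f and g). Since 32-bit operations zero-extend into
the full 64-bit register, %rax ends up holding the correct 'unsigned long'
result:

   89 f8                   mov    %edi,%eax        # 2 bytes
   83 e0 1f                and    $0x1f,%eax       # 3 bytes, 1-byte immediate
   c1 e0 04                shl    $0x4,%eax        # 3 bytes, no REX.W
   c3                      retq                    # 1 byte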

This is with gcc 4.9; 4.8 and 4.7 generate identical code.
