[Bug rtl-optimization/40772] New: generating redundant moves from second byte of 32b/64b register

zsojka at seznam dot cz Thu, 16 Jul 2009 08:33:03 -0700

For the following code:
------------------------------------------------
#include <stdint.h>

uint8_t data[16];


static __attribute__((noinline)) void test(unsigned i)
{
        unsigned j;
        for (j = 0; j < 16; j++)
                data[j] = ((i + j) & 0xFF00) >> 8;
}
------------------------------------------------

the generated asm looks like this (using -fno-tree-vectorize because of PR40771):
# ./gcc tst2b.c -o tst2.o -O3 -march=k8 -fno-tree-vectorize
------------------------------------------------
test:
.LFB11:
        .cfi_startproc
        movq    %rdi, %rdx
        movzbl  %dh, %eax
        movb    %al, data(%rip)
        leal    1(%rdi), %eax
        movzbl  %ah, %eax
        movb    %al, data+1(%rip)
        leal    2(%rdi), %eax
        movzbl  %ah, %eax
        movb    %al, data+2(%rip)
        leal    3(%rdi), %eax
        movzbl  %ah, %eax
        movb    %al, data+3(%rip)
.....
------------------------------------------------
When "movzbl %ah, %eax ; movb %al, data+1(%rip)" is replaced by
"movb %ah, data+1(%rip)", the code is faster. (Another issue may be the use of
lea even for -march=pentium4, which would probably prefer add eax,1, but I
can't verify that.)
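The replacement is valid because the stored value is just bits 8..15 of i + j, so the zero-extension into %eax adds nothing before a byte store. A minimal C sketch of that equivalence follows; the names fill_masked, fill_truncated and same_result are hypothetical helpers introduced here for illustration and are not part of the attached test code.

```c
#include <assert.h>   /* only needed for the usage example below */
#include <stdint.h>
#include <string.h>

#define N 16

/* Same computation as test() in the report: mask, then shift. */
void fill_masked(uint8_t *out, unsigned i)
{
    unsigned j;
    for (j = 0; j < N; j++)
        out[j] = ((i + j) & 0xFF00) >> 8;
}

/* Equivalent form: shift, then let the 8-bit store drop the upper bits. */
void fill_truncated(uint8_t *out, unsigned i)
{
    unsigned j;
    for (j = 0; j < N; j++)
        out[j] = (uint8_t)((i + j) >> 8);
}

/* Returns 1 when both forms produce the same N bytes for this i. */
int same_result(unsigned i)
{
    uint8_t a[N], b[N];
    fill_masked(a, i);
    fill_truncated(b, i);
    return memcmp(a, b, N) == 0;
}
```

For example, assert(same_result(0x12FA)); should hold, since both loops store only the second byte of each sum.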

# ./gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --enable-languages=c,c++
--prefix=/mnt/svn/gcc-trunk/build/
Thread model: posix
gcc version 4.5.0 20090714 (experimental) (GCC)

The CPU is an AMD Phenom (4 cores, Barcelona) running at a fixed 1400 MHz.

gcc's generated code runs in 19 ticks on average; the code with "movzbl ; mov al"
replaced by "mov ah" runs in 16 ticks.

The whole test code is attached.


-- 
           Summary: generating redundant moves from second byte of 32b/64b
                    register
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: zsojka at seznam dot cz
  GCC host triplet: x86_64-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40772
