http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58112

            Bug ID: 58112
           Summary: Ineffective addressing mode used in loop.
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: neleai at seznam dot cz

Hi, for the following testcase gcc -O3 generates the following loop:

        movq    %rsi, %r9
        subq    %rdx, %r9
        movq    %r9, %rdi
        movq    %r9, %rsi
        leaq    16(%r9), %r8
        addq    $32, %rdi
        addq    $48, %rsi
        .p2align 4,,10
        .p2align 3
.L14:
        movdqu  (%rdx,%r9), %xmm0
        addq    $64, %rdx
        movdqa  %xmm0, -64(%rdx)
        movdqu  -64(%rdx,%r8), %xmm0
        movdqa  %xmm0, -48(%rdx)
        movdqu  -64(%rdx,%rdi), %xmm0
        movdqa  %xmm0, -32(%rdx)
        movdqu  -64(%rdx,%rsi), %xmm0
        movdqa  %xmm0, -16(%rdx)
        cmpq    %rdx, %rcx
        jne     .L14
        rep; ret

This saves one addq $64, %rsi instruction per iteration. However, it occupies
four extra registers, and the indexed address calculations done on every
iteration cost more, and lead to bigger code, than the one instruction saved.
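
To make the two addressing strategies concrete, here is a rough C-level sketch.
It is a hypothetical reconstruction, not the testcase from this report: the
function names, the use of memcpy for each 16-byte chunk, and the assumption
that n is a multiple of 64 are all mine. The first function mirrors the shape
of the generated loop (one advancing pointer plus precomputed src - dst
offsets); the second keeps two advancing pointers, which is the form the report
argues for. A compiler is free to turn either source form into either loop, so
this only illustrates the address arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch; names and structure are assumptions, not the
   reporter's testcase. Both functions copy n bytes in 64-byte steps. */

/* Mirrors the generated loop: only dst advances. Each load address is
   dst plus a precomputed delta, so four deltas stay live in registers.
   src and dst must point into the same array (so src - dst is defined)
   and the two regions must not overlap. */
void copy_one_induction(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n % 64 == 0);
    ptrdiff_t d0 = src - dst;   /* plays the role of %r9  */
    ptrdiff_t d1 = d0 + 16;     /* plays the role of %r8  */
    ptrdiff_t d2 = d0 + 32;     /* plays the role of %rdi */
    ptrdiff_t d3 = d0 + 48;     /* plays the role of %rsi */
    unsigned char *end = dst + n;

    while (dst != end) {
        memcpy(dst,      dst + d0, 16);
        memcpy(dst + 16, dst + d1, 16);
        memcpy(dst + 32, dst + d2, 16);
        memcpy(dst + 48, dst + d3, 16);
        dst += 64;              /* the single induction variable */
    }
}

/* Alternative: src and dst both advance. One extra add per iteration,
   but every access is a plain base + constant displacement. */
void copy_two_pointers(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n % 64 == 0);
    unsigned char *end = dst + n;

    while (dst != end) {
        memcpy(dst,      src,      16);
        memcpy(dst + 16, src + 16, 16);
        memcpy(dst + 32, src + 32, 16);
        memcpy(dst + 48, src + 48, 16);
        dst += 64;
        src += 64;
    }
}
```

In the second form the loop body needs only the two pointers and the end
pointer, at the price of the extra pointer increment.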
