Compile the following code with options -march=armv7-a -mthumb -Os:

extern void foo(int*);
void tr(int array[], int n)
{
  int i;
  for (i=0; i<n; i++)
    foo(&array[i]);
}

GCC 4.6 generates:

        push    {r4, r5, r6, lr}
        mov     r6, r1
        mov     r5, r0
        movs    r4, #0
        b       .L2
.L3:
        mov     r0, r5
        adds    r4, r4, #1
        bl      foo
        adds    r5, r5, #4
.L2:
        cmp     r4, r6
        blt     .L3
        pop     {r4, r5, r6, pc}

Both r4 and r5 are loop induction variables, and r4 is used only as the loop
counter. So the loop can instead compare the pointer r5 against a precomputed
end address, which eliminates r4:

        push    {r4, r5, r6, lr}
        mov     r5, r0
        add     r6, r5, r1, lsl #2
        b       .L2
.L3:
        mov     r0, r5
        bl      foo
        adds    r5, r5, #4
.L2:
        cmp     r5, r6
        blt     .L3
        pop     {r4, r5, r6, pc}

The new code is shorter and faster than the original, and its loop needs one
fewer register.
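
At the source level, the transformation corresponds roughly to the sketch
below. It is only an illustration, not GCC output: tr_ptr is a hypothetical
name, and foo here is a stand-in defined locally (the report declares it
extern) so that the two loops can be compared. The n <= 0 guard is an
assumption added to keep the end-pointer computation well defined in C.

```c
/* Hypothetical stand-in for the extern foo() in the report: it counts calls
   and sums the values it is handed, so both loops can be checked. */
static int calls, sum;
static void foo(int *p) { calls++; sum += *p; }

/* Loop as written in the report: i serves only as the loop counter. */
void tr(int array[], int n)
{
  int i;
  for (i = 0; i < n; i++)
    foo(&array[i]);
}

/* Hand-transformed loop: the counter is replaced by an end pointer, so the
   only induction variable left is the pointer itself. */
void tr_ptr(int array[], int n)
{
  int *p, *end;
  if (n <= 0)          /* assumption: avoid forming array + n for n <= 0 */
    return;
  end = array + n;     /* matches "add r6, r5, r1, lsl #2" */
  for (p = array; p < end; p++)
    foo(p);
}
```

Both functions call foo once per element, in the same order; the second form
simply makes explicit what the induction variable optimization would do.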

Both the tree-ssa and RTL loop optimizers miss this transformation.


-- 
           Summary: Missed induction variable optimization
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: carrot at google dot com
 GCC build triplet: i686-linux
  GCC host triplet: i686-linux
GCC target triplet: arm-eabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45098
