Compile the following code with options -march=armv7-a -mthumb -Os:

    extern void foo(int*);

    void tr(int array[], int n)
    {
      int i;
      for (i=0; i<n; i++)
        foo(&array[i]);
    }

GCC 4.6 generates:

            push    {r4, r5, r6, lr}
            mov     r6, r1
            mov     r5, r0
            movs    r4, #0
            b       .L2
    .L3:
            mov     r0, r5
            adds    r4, r4, #1
            bl      foo
            adds    r5, r5, #4
    .L2:
            cmp     r4, r6
            blt     .L3
            pop     {r4, r5, r6, pc}

We can see that both r4 and r5 are loop induction variables, and r4 is used only as the loop counter. So we can transform it to:

            push    {r4, r5, r6, lr}
            mov     r5, r0
            add     r6, r5, r1, lsl #2
            b       .L2
    .L3:
            mov     r0, r5
            bl      foo
            adds    r5, r5, #4
    .L2:
            cmp     r5, r6
            blt     .L3
            pop     {r4, r5, r6, pc}

This new code is shorter and faster than the original result, and it uses one fewer register. Both the tree-ssa and the RTL loop optimizations missed this transformation.

--
           Summary: Missed induction variable optimization
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: carrot at google dot com
 GCC build triplet: i686-linux
  GCC host triplet: i686-linux
GCC target triplet: arm-eabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45098
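
For reference, the transformation requested in this report can be sketched at the source level. This is only an illustration of the idea, not what the compiler would emit: the counter i is eliminated and the exit test is done on the address itself, against a precomputed end pointer. The name tr_ptr and the body of foo are hypothetical, added here so the sketch is self-contained (in the report, foo is an external function). The explicit n <= 0 check is an assumption made to keep the hand-written version well defined for non-positive n.

```c
/* Hypothetical stand-in for the external foo() in the report:
   increments the element it is given so the effect is observable. */
void foo(int *p)
{
    *p += 1;
}

/* Original form: the counter i and the address &array[i] are both
   induction variables; i is used only for the exit test. */
void tr(int array[], int n)
{
    int i;
    for (i = 0; i < n; i++)
        foo(&array[i]);
}

/* Hand-rewritten form (hypothetical name): the pointer is the only
   induction variable, and the exit test compares it with an end
   pointer computed once before the loop. */
void tr_ptr(int array[], int n)
{
    int *p, *end;

    if (n <= 0)            /* assumption: guard against non-positive n */
        return;

    end = array + n;       /* corresponds to r6 = r5 + (r1 << 2) */
    for (p = array; p < end; p++)
        foo(p);
}
```

Both functions call foo on the same elements in the same order; the second form needs no separate counter, which is where the saved register and the removed adds instruction come from.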