Apologies if this has been discussed before. I built the ARM compiler
from both gcc-3.4.1 and gcc-4.2.2, and there seems to be a performance
regression: for a tight loop, gcc-3.4.1 generates better code than
gcc-4.2.2.

In gcc-4.2.2, 'p' is reloaded from and stored back to its memory
location on every iteration of the loop. In gcc-3.4.1, however, 'p' is
kept in a register and is stored to its memory location only once,
after the loop.

Is gcc-4.2.2 being more conservative here, allowing for the possibility
that p might point to itself, so that the store through p inside the
loop could modify p?
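
To make the difference concrete, here is a minimal source-level sketch of
the two shapes. The function names clear_via_global and clear_via_local
are hypothetical and are not part of test.c. The second function does by
hand what gcc-3.4.1 appears to do automatically: it keeps the pointer in
a local and writes 'p' once after the loop. It should give the compiler
no reason to touch 'p' inside the loop, though the code generated for it
has not been checked here.

```c
int *p;
int array[400];

/* Same loop as in test.c: the pointer being advanced is the global 'p',
   so the compiler may reload and store it on every iteration. */
void clear_via_global(void)
{
  int i;
  p = array;

  for (i = 0; i < 400; i++) {
    *p++ = 0;
  }
}

/* Hypothetical workaround: advance a local copy, whose address is never
   taken, and update the global 'p' once after the loop. */
void clear_via_local(void)
{
  int i;
  int *q = array;

  for (i = 0; i < 400; i++) {
    *q++ = 0;
  }
  p = q;
}
```

Both functions leave the array zeroed and 'p' pointing one element past
the end of the array, so the final state is the same either way.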

The command I used to build was:
cc1 -O2 test.c

------------------------------------------------------------------
test.c source:

int *p;
int array[400];

main() {
  int i;
  p=array;

  for (i=0; i<400; i++) {
    *p++=0;
  }
}

------------------------------------------------------------------
Gcc-4.2.2 version
        ldr     r3, .L8
        mov     r2, #0
        str     r2, [r3], #4
        ldr     r0, .L8+4
        str     r3, [r0, #0]
        @ lr needed for prologue
        mov     r1, #1
.L2:
        ldr     r2, [r0, #0]
        mov     r3, #0
        str     r3, [r2], #4
        add     r1, r1, #1
        cmp     r1, #400
        str     r2, [r0, #0]    <==== store to 'p' inside loop
        bne     .L2
        bx      lr
.L9:
        .align  2
.L8:
        .word   array
        .word   p

------------------------------------------------------------------
Gcc-3.4.1 version
        ldr     r3, .L10
        ldr     ip, .L10+4
        str     r3, [ip, #0]
        @ lr needed for prologue
        mov     r0, #0
        mov     r1, #400
.L5:
        str     r0, [r3], #4
        subs    r1, r1, #1
        mov     r2, r3
        bne     .L5
        str     r2, [ip, #0]    <==== store to 'p' outside of loop
        mov     pc, lr
.L11:
        .align  2
.L10:
        .word   array
        .word   p


Thanks for any input you can provide.

Jeffri Tan
