Apologies if this has been discussed before. I built the ARM compiler for gcc-3.4.1 and gcc-4.2.2, and there seems to be a performance regression: for a tight loop, gcc-3.4.1 generates better code than gcc-4.2.2 does.
In gcc-4.2.2, the store to the memory location of variable 'p' happens inside the loop. In gcc-3.4.1, however, 'p' is kept in a register until after the loop, when the register is stored into the memory location of 'p'. Is gcc-4.2.2 being more conservative because of the possibility that 'p' might point to itself in the loop?

The command I used to build was:

    cc1 -O2 test.c

------------------------------------------------------------------
test.c source:

int *p;
int array[400];

main()
{
    int i;

    p=array;
    for (i=0; i<400; i++)
    {
        *p++=0;
    }
}

------------------------------------------------------------------
Gcc-4.2.2 version

        ldr     r3, .L8
        mov     r2, #0
        str     r2, [r3], #4
        ldr     r0, .L8+4
        str     r3, [r0, #0]
        @ lr needed for prologue
        mov     r1, #1
.L2:
        ldr     r2, [r0, #0]
        mov     r3, #0
        str     r3, [r2], #4
        add     r1, r1, #1
        cmp     r1, #400
        str     r2, [r0, #0]    <==== store to 'p' inside loop
        bne     .L2
        bx      lr
.L9:
        .align  2
.L8:
        .word   array
        .word   p

------------------------------------------------------------------
Gcc-3.4.1 version

        ldr     r3, .L10
        ldr     ip, .L10+4
        str     r3, [ip, #0]
        @ lr needed for prologue
        mov     r0, #0
        mov     r1, #400
.L5:
        str     r0, [r3], #4
        subs    r1, r1, #1
        mov     r2, r3
        bne     .L5
        str     r2, [ip, #0]    <==== store to 'p' outside of loop
        mov     pc, lr
.L11:
        .align  2
.L10:
        .word   array
        .word   p

Thanks for any input you can provide.

Jeffri Tan
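
P.S. For comparison, here is a rough sketch of a source-level workaround for the test case above. It is only an illustration, not something I have measured with either compiler: the function name clear_with_local_cursor and the local 'q' are hypothetical and not part of test.c. The idea is to walk the array with a local pointer whose address is never taken, so the stores through it cannot modify the pointer itself, and to write the global 'p' once after the loop.

```c
int *p;
int array[400];

/* Hypothetical variant of the loop in test.c (name is an assumption). */
void clear_with_local_cursor(void)
{
    int *q = array;     /* local cursor; its address is never taken */
    int i;

    for (i = 0; i < 400; i++) {
        *q++ = 0;       /* this store cannot change q itself */
    }

    p = q;              /* single store to the global 'p', after the loop */
}
```

With this form the compiler should be free to keep the cursor in a register for the whole loop, whatever the reason is for the store inside the loop in the 4.2.2 output. The end state is the same as in the original: 'p' points one past the last element of 'array'.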