On Sat, Apr 5, 2008 at 12:24 AM, Tan, Jeffri <[EMAIL PROTECTED]> wrote:
>
> Apologies if this has been discussed before. I built the ARM compiler
> for gcc-3.4.1 and gcc-4.2.2, and there seems to be a performance
> regression. A tight loop in gcc-3.4.1 generates better code than
> gcc-4.2.2.
>
> In gcc-4.2.2, the store to the memory location of variable 'p' happens
> in the loop. However, in gcc-3.4.1, 'p' is kept in a register until
> after the loop, when the register is stored into the memory location
> of 'p'.
>
> Is gcc-4.2.2 being more conservative, in the possibility that p might
> point to itself in the loop?

Yes, it apparently thinks that the store to *p can clobber p.
This is fixed in gcc 4.3.

Richard.

> The command I used to build was:
> cc1 -O2 test.c
>
> ------------------------------------------------------------------
> test.c source:
>
> int *p;
> int array[400];
>
> main() {
>     int i;
>     p=array;
>
>     for (i=0; i<400; i++) {
>         *p++=0;
>     }
> }
>
> ------------------------------------------------------------------
> Gcc-4.2.2 version
>         ldr     r3, .L8
>         mov     r2, #0
>         str     r2, [r3], #4
>         ldr     r0, .L8+4
>         str     r3, [r0, #0]
>         @ lr needed for prologue
>         mov     r1, #1
> .L2:
>         ldr     r2, [r0, #0]
>         mov     r3, #0
>         str     r3, [r2], #4
>         add     r1, r1, #1
>         cmp     r1, #400
>         str     r2, [r0, #0]    <==== store to 'p' inside loop
>         bne     .L2
>         bx      lr
> .L9:
>         .align  2
> .L8:
>         .word   array
>         .word   p
>
> ------------------------------------------------------------------
> Gcc-3.4.1 version
>         ldr     r3, .L10
>         ldr     ip, .L10+4
>         str     r3, [ip, #0]
>         @ lr needed for prologue
>         mov     r0, #0
>         mov     r1, #400
> .L5:
>         str     r0, [r3], #4
>         subs    r1, r1, #1
>         mov     r2, r3
>         bne     .L5
>         str     r2, [ip, #0]    <==== store to 'p' outside of loop
>         mov     pc, lr
> .L11:
>         .align  2
> .L10:
>         .word   array
>         .word   p
>
>
> Thanks for any input you can provide.
>
> Jeffri Tan
>
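
For anyone who has to stay on 4.2.x, one possible source-level workaround is to walk the array through a local pointer and write the global 'p' once after the loop. This is only a sketch and was not proposed in the thread. The names clear_via_local, fill, all_zero and self_check are made up for illustration. The address of the local 'q' is never taken, so no store through it can modify 'q' itself. Whether 4.2.2 then produces the same loop as 3.4.1 has not been checked here.

```c
#include <assert.h>

#define N 400

int *p;
int array[N];

/* Original form from test.c: the global p is advanced in the loop, so the
   compiler must prove that the store to *p cannot modify p before it can
   keep p in a register. */
void clear_via_global(void)
{
    int i;
    p = array;
    for (i = 0; i < N; i++)
        *p++ = 0;
}

/* Hypothetical workaround: advance a local copy and store to the global
   once, after the loop. */
void clear_via_local(void)
{
    int *q = array;
    int i;
    for (i = 0; i < N; i++)
        *q++ = 0;
    p = q;
}

/* Helper: set every element to a known value. */
static void fill(int value)
{
    int i;
    for (i = 0; i < N; i++)
        array[i] = value;
}

/* Helper: return 1 if every element is zero, 0 otherwise. */
static int all_zero(void)
{
    int i;
    for (i = 0; i < N; i++)
        if (array[i] != 0)
            return 0;
    return 1;
}

/* Both versions should leave the array zeroed and p one past the end. */
void self_check(void)
{
    fill(1);
    clear_via_global();
    assert(all_zero() && p == array + N);

    fill(1);
    clear_via_local();
    assert(all_zero() && p == array + N);
}
```

Calling self_check() from main() confirms the two functions leave 'array' and 'p' in the same state. Comparing the output of cc1 -O2 for each function would show whether the store to 'p' still sits inside the loop.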