Re: gcc-3.4.1 vs gcc-4.2.2 performance regression in memory initialization loop

Richard Guenther Sat, 05 Apr 2008 05:03:27 -0700

On Sat, Apr 5, 2008 at 12:24 AM, Tan, Jeffri <[EMAIL PROTECTED]> wrote:
>
>  Apologies if this has been discussed before. I built the ARM compiler
>  for gcc-3.4.1 and gcc-4.2.2, and there seems to be a performance
>  regression. A tight loop in gcc-3.4.1 generates better code than
>  gcc-4.2.2.
>
>  In gcc-4.2.2, the store to the memory location of variable 'p' happens
>  in the loop. However, in gcc-3.4.1, 'p' is kept in a register until
>  after the loop, when the register is stored into the memory location
>  of 'p'.
>
>  Is gcc-4.2.2 being more conservative because of the possibility that
>  p might point to itself in the loop?


Yes, it apparently thinks that the store to *p can clobber p.  This is
fixed in gcc 4.3.
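
If moving to 4.3 is not an option, one possible source-level workaround is
to advance a local copy of the pointer and write the global back once after
the loop. The following is only a sketch and has not been checked against
4.2.2 code generation; the function name clear_array and the local q are
made up for illustration and are not part of the original test.c. Since the
address of q is never taken, no store through it can modify q itself, so the
compiler has no reason to reload or spill it on each iteration.

```c
#include <stddef.h>

int *p;
int array[400];

/* Hypothetical helper, same effect as the loop in test.c. */
void clear_array(void)
{
    /* Local copy of the pointer. Its address is never taken, so a
       store through q cannot alias q itself. */
    int *q = array;
    size_t i;

    for (i = 0; i < 400; i++)
        *q++ = 0;

    /* Single store to the global 'p', after the loop. */
    p = q;
}
```

After the call, p points one past the end of array, the same final value as
in the original loop.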

Richard.

>  The command I used to build was:
>  cc1 -O2 test.c
>
>  ------------------------------------------------------------------
>  test.c source:
>
>  int *p;
>  int array[400];
>
>  main() {
>   int i;
>   p=array;
>
>   for (i=0; i<400; i++) {
>     *p++=0;
>   }
>  }
>
>  ------------------------------------------------------------------
>  Gcc-4.2.2 version
>         ldr     r3, .L8
>         mov     r2, #0
>         str     r2, [r3], #4
>         ldr     r0, .L8+4
>         str     r3, [r0, #0]
>         @ lr needed for prologue
>         mov     r1, #1
>  .L2:
>         ldr     r2, [r0, #0]
>         mov     r3, #0
>         str     r3, [r2], #4
>         add     r1, r1, #1
>         cmp     r1, #400
>         str     r2, [r0, #0]    <==== store to 'p' inside loop
>         bne     .L2
>         bx      lr
>  .L9:
>         .align  2
>  .L8:
>         .word   array
>         .word   p
>
>  ------------------------------------------------------------------
>  Gcc-3.4.1 version
>         ldr     r3, .L10
>         ldr     ip, .L10+4
>         str     r3, [ip, #0]
>         @ lr needed for prologue
>         mov     r0, #0
>         mov     r1, #400
>  .L5:
>         str     r0, [r3], #4
>         subs    r1, r1, #1
>         mov     r2, r3
>         bne     .L5
>         str     r2, [ip, #0]    <==== store to 'p' outside of loop
>         mov     pc, lr
>  .L11:
>         .align  2
>  .L10:
>         .word   array
>         .word   p
>
>
>  Thanks for any input you can provide.
>
>  Jeffri Tan
>
