Re : gcc 3.4 > mainline performance regression

Mick CORNUT Fri, 05 Jan 2007 10:23:00 -0800

Hi all!
I'm not sure I have understood all of your previous explanations (except the
load & store motion part), but we pointed out 2 different problems:


Pb n°1: at optimization level -O3, a[0] and a[1] are loaded and stored on
each loop iteration
Pb n°2: at optimization level -Os, the loop limit value (1,000,000) is loaded
on each loop iteration (previously, in gcc 3.4.2, it was loaded once, then the
register holding it was decremented by one until the zero flag was set)

It seems to me that your current remarks only apply to Pb n°1, am I wrong?

Anyway, thanks a lot for your help!

PS: my reference code, when compiled with -mthumb -Os, gives:
00000000 <foo>:
  0:    b510          push    {r4, lr}
  2:    6802          ldr    r2, [r0, #0]
  4:    6844          ldr    r4, [r0, #4]
  6:    2100          movs    r1, #0
  8:    4b03          ldr    r3, [pc, #12]    (18 <.text+0x18>)
  a:    3101          adds    r1, #1
  c:    1912          adds    r2, r2, r4
  e:    4299          cmp    r1, r3
 10:    d1fa          bne.n    8 <foo+0x8>
 12:    6002          str    r2, [r0, #0]
 14:    bd10          pop    {r4, pc}
 16:    0000          lsls    r0, r0, #0
 18:    4240          negs    r0, r0
 1a:    000f          lsls    r7, r1, #0

Pb n°2: the load of the loop end value is performed inside the loop!

When compiled with -mthumb -O3, we get:
00000000 <foo>:
  0:    b530          push    {r4, r5, lr}
  2:    6802          ldr    r2, [r0, #0]
  4:    4d05          ldr    r5, [pc, #20]    (1c <.text+0x1c>)
  6:    1d04          adds    r4, r0, #4
  8:    2100          movs    r1, #0
  a:    6823          ldr    r3, [r4, #0]
  c:    3101          adds    r1, #1
  e:    18d3          adds    r3, r2, r3
 10:    1c1a          adds    r2, r3, #0
 12:    6003          str    r3, [r0, #0]
 14:    42a9          cmp    r1, r5
 16:    d1f8          bne.n    a <foo+0xa>
 18:    bd30          pop    {r4, r5, pc}
 1a:    0000          lsls    r0, r0, #0
 1c:    4240          negs    r0, r0
 1e:    000f          lsls    r7, r1, #0
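
For readers following the two listings, here is a minimal C sketch of a loop
that would be consistent with them. The original source is not shown in this
message, so the body of foo is an assumption; only the loop bound is grounded
in the listings (the literal-pool word 0x000f4240 is 1000000). foo_hoisted is
a hypothetical hand-written variant that shows what is being asked of the
compiler: a[0] kept in a register with a single store after the loop
(Pb n°1), and a counter that is loaded once and counted down to zero (Pb n°2).

```c
/* Assumed shape of the test function; the real source is not part of
   this message. The bound 1000000 matches the literal 0x000f4240. */
void foo(int *a)
{
    int i;

    for (i = 0; i < 1000000; i++)
        a[0] += a[1];
}

/* Hypothetical hand-optimized equivalent, for illustration only. */
void foo_hoisted(int *a)
{
    int sum = a[0];         /* load a[0] once, before the loop */
    const int step = a[1];  /* a[1] is loop-invariant, load it once */
    int i;

    /* Count down to zero so no separate compare against the limit
       is needed on each iteration. */
    for (i = 1000000; i != 0; i--)
        sum += step;

    a[0] = sum;             /* single store, after the loop */
}
```

Both functions leave the same values in a[0] and a[1] as long as the sum does
not overflow an int, so comparing their generated code shows how much of this
transformation the compiler performs by itself.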


Mick CORNUT


----- Original Message ----
From: Steven Bosscher <[EMAIL PROTECTED]>
To: David Edelsohn <[EMAIL PROTECTED]>
Cc: Andrew Haley <[EMAIL PROTECTED]>; gcc@gcc.gnu.org
Sent: Friday, 5 January 2007, 17:55:47
Subject: Re: gcc 3.4 > mainline performance regression

On 1/5/07, David Edelsohn <[EMAIL PROTECTED]> wrote:
> >>>>> Steven Bosscher writes:
>
> Steven> What does the code look like if you compile with -O2  -fgcse-sm?
>
>         Yep.  Mark and I recently discussed whether gcse-sm should be
> enabled by default at some optimization level.  We're hiding performance
> from GCC users.

The problem with it used to be that it was just very broken. When I
fixed PR24257, it was still not possible to bootstrap with gcse store
motion enabled.

Putting someone on fixing tree load&store motion is probably more
useful anyway, if you're going to do load&store motion for
performance.  In RTL, we can't move loads and stores that are not
simple loads or stores (i.e. reg <- mem, or mem <- reg). There are two
very popular targets where this is the common case ;-)

Gr.
Steven



