Hi all!

I'm not sure I've understood all of your previous explanations (except for the load & store motion part), but we pointed out two different problems:

Pb n°1: at optimization level -O3, a[0] and a[1] are loaded and stored on each loop iteration.

Pb n°2: at optimization level -Os, the loop limit value (1,000,000) is loaded on each loop iteration (previously, in gcc 3.4.2, it was loaded once, then the register holding it was decremented by one until the zero flag was set).

It seems to me that your current remarks only apply to Pb n°1, am I wrong?

Anyway, thanks a lot for your help!

PS: my reference code, when compiled with -mthumb -Os, gives:

00000000 <foo>:
   0:   b510        push    {r4, lr}
   2:   6802        ldr     r2, [r0, #0]
   4:   6844        ldr     r4, [r0, #4]
   6:   2100        movs    r1, #0
   8:   4b03        ldr     r3, [pc, #12]   (18 <.text+0x18>)
   a:   3101        adds    r1, #1
   c:   1912        adds    r2, r2, r4
   e:   4299        cmp     r1, r3
  10:   d1fa        bne.n   8 <foo+0x8>
  12:   6002        str     r2, [r0, #0]
  14:   bd10        pop     {r4, pc}
  16:   0000        lsls    r0, r0, #0
  18:   4240        negs    r0, r0
  1a:   000f        lsls    r7, r1, #0

Pb n°2: the load of the loop end value is performed within the loop! (The branch at 0x10 goes back to the ldr at 0x8.)

When compiled with -mthumb -O3, we get:

00000000 <foo>:
   0:   b530        push    {r4, r5, lr}
   2:   6802        ldr     r2, [r0, #0]
   4:   4d05        ldr     r5, [pc, #20]   (1c <.text+0x1c>)
   6:   1d04        adds    r4, r0, #4
   8:   2100        movs    r1, #0
   a:   6823        ldr     r3, [r4, #0]
   c:   3101        adds    r1, #1
   e:   18d3        adds    r3, r2, r3
  10:   1c1a        adds    r2, r3, #0
  12:   6003        str     r3, [r0, #0]
  14:   42a9        cmp     r1, r5
  16:   d1f8        bne.n   a <foo+0xa>
  18:   bd30        pop     {r4, r5, pc}
  1a:   0000        lsls    r0, r0, #0
  1c:   4240        negs    r0, r0
  1e:   000f        lsls    r7, r1, #0

Mick CORNUT

----- Original Message ----
From: Steven Bosscher <[EMAIL PROTECTED]>
To: David Edelsohn <[EMAIL PROTECTED]>
Cc: Andrew Haley <[EMAIL PROTECTED]>; gcc@gcc.gnu.org
Sent: Friday, 5 January 2007, 17:55:47
Subject: Re: gcc 3.4 > mainline performance regression

On 1/5/07, David Edelsohn <[EMAIL PROTECTED]> wrote:
> >>>>> Steven Bosscher writes:
>
> Steven> What does the code look like if you compile with -O2 -fgcse-sm?
>
> Yep. Mark and I recently discussed whether gcse-sm should be
> enabled by default at some optimization level.
> We're hiding performance from GCC users.

The problem with it used to be that it was just very broken. When I fixed PR24257, it was still not possible to bootstrap with gcse store motion enabled.

Putting someone on fixing tree load&store motion is probably more useful anyway, if you're going to do load&store motion for performance.

In RTL, we can't move loads and stores that are not simple loads or stores (i.e. reg <- mem, or mem <- reg). There are two very popular targets where this is the common case ;-)

Gr.
Steven
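
For readers following the two problems at source level, here is a minimal C sketch. The original source of foo is not reproduced in this thread, so the loop body is only an assumption reconstructed from the disassembly (r0 pointing at a, a[0] accumulated with a[1], and a literal-pool word 0x000f4240 = 1,000,000). The names foo_plain, foo_hoisted, foo_countdown and LIMIT are hypothetical. foo_hoisted shows by hand what load/store motion is expected to achieve for Pb n°1, and foo_countdown shows the count-down loop shape that gcc 3.4.2 is described as producing for Pb n°2.

```c
/* Assumed loop bound: 0x000f4240, the literal-pool word in both listings. */
#define LIMIT 1000000

/* Hypothetical reconstruction of the loop being discussed: taken
 * literally, a[0] is loaded and stored and a[1] is loaded on every
 * iteration. */
void foo_plain(int *a)
{
    for (int i = 0; i < LIMIT; i++)
        a[0] += a[1];
}

/* Load/store motion applied by hand (Pb n°1): both loads are hoisted
 * in front of the loop and the single store is sunk after it.  This is
 * valid here because nothing inside the loop can modify a[1]. */
void foo_hoisted(int *a)
{
    int sum = a[0];
    int step = a[1];

    for (int i = 0; i < LIMIT; i++)
        sum += step;

    a[0] = sum;
}

/* Count-down variant (Pb n°2): the limit is materialised once and the
 * counter is decremented to zero, so the exit test needs no second
 * register holding the limit and no reload inside the loop. */
void foo_countdown(int *a)
{
    int sum = a[0];
    int step = a[1];

    for (int n = LIMIT; n != 0; n--)
        sum += step;

    a[0] = sum;
}
```

All three functions compute the same result; calling each on its own copy of {3, 2} leaves a[0] == 2000003 and a[1] unchanged. The point of comparing them is only the generated loop body, not the semantics.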