http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52558
--- Comment #9 from rguenther at suse dot de <rguenther at suse dot de> 2012-03-12 15:55:27 UTC ---
On Mon, 12 Mar 2012, amacleod at redhat dot com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52558
>
> --- Comment #8 from Andrew Macleod <amacleod at redhat dot com> 2012-03-12 15:50:13 UTC ---
> We can still perform store motion out of a loop, we just can't put the store on
> a path which is executed if the loop isn't executed.
>
> In this case, we actually made the code *slower*.  Before LIM, there was a load
> of g1, a compare and return.
>
>         movl    g_1(%rip), %edx
>         xorl    %eax, %eax
>         testl   %edx, %edx
>         jne     .L1
> .L4:
>         addl    $1, %eax
>         movl    $0, g_2(%rip)
>         cmpl    $4, %eax
>         jne     .L4
> .L1:
>         rep
>         ret
>
> LIM makes it have a load of g_1, a load of g_2 and a store to g_2 before
> returning.
>
>         .cfi_startproc
>         movl    g_1(%rip), %edx
>         movl    g_2(%rip), %eax
>         testl   %edx, %edx
>         jne     .L2
>         movl    $0, g_2(%rip)
>         ret
> .L2:
>         movl    %eax, g_2(%rip)
>         xorl    %eax, %eax
>         ret
>
> -O3 corrects this mistake and returns it to the more optimal results.
>
> I would argue this testcase shows LIM actually making the code worse in this
> case as well.

Usually loops are executed ;)  At least that is what we assume if we
don't know better.

But yes, splitting the exit block(s) is a solution, so, if you fix this
please go this way.

Richard.
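
For readers following the thread, the difference between the two placements of the sunk store can be sketched in C. This is only an illustration reconstructed from the assembly quoted above, not the PR's actual testcase and not GCC's implementation: the function names, the `store_g_2` helper, the `g_2_stores` counter and `count_stores` are all hypothetical, added so that the extra store becomes observable.

```c
/* Hypothetical sketch; names and instrumentation are assumptions. */
int g_1 = 1;
int g_2 = 0;
int g_2_stores = 0;              /* counts writes to g_2 (instrumentation only) */

void store_g_2(int v) { g_2 = v; g_2_stores++; }

/* Source form: g_2 is written only on iterations where g_1 is zero. */
int original(void)
{
    for (int l = 0; l < 4; l++) {
        if (g_1)
            return l;
        store_g_2(0);
    }
    return 4;
}

/* Store motion with a single sunk store shared by every exit.
   The store now runs even if the loop body never stored, which
   introduces a write the source program did not perform. */
int sunk_on_all_exits(void)
{
    int tmp = g_2;               /* load hoisted out of the loop */
    int l;
    for (l = 0; l < 4; l++) {
        if (g_1)
            break;
        tmp = 0;
    }
    store_g_2(tmp);              /* executed on the early exit as well */
    return l;
}

/* Store motion with the exits split: each exit gets its own copy of
   the store, and the early exit stores only if an iteration completed. */
int sunk_on_split_exits(void)
{
    int l;
    for (l = 0; l < 4; l++) {
        if (g_1) {
            if (l > 0)
                store_g_2(0);    /* at least one iteration stored */
            return l;
        }
    }
    store_g_2(0);                /* normal exit: the loop body did store */
    return l;
}

/* Run fn with g_1 set to g_1_value and report how many stores to g_2 it made. */
int count_stores(int (*fn)(void), int g_1_value)
{
    g_1 = g_1_value;
    g_2 = 0;
    g_2_stores = 0;
    (void)fn();
    return g_2_stores;
}
```

With `g_1` nonzero the source form never writes `g_2`, the shared-exit variant writes it once, and the split-exit variant again writes nothing; with `g_1` zero both transformed variants replace the four in-loop stores with one.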