http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52558
--- Comment #8 from Andrew Macleod <amacleod at redhat dot com> 2012-03-12 15:50:13 UTC ---

We can still perform store motion out of a loop, we just can't put the store on a path which is executed if the loop isn't executed.

In this case, we actually made the code *slower*. Before LIM, there was a load of g_1, a compare and return.

        movl    g_1(%rip), %edx
        xorl    %eax, %eax
        testl   %edx, %edx
        jne     .L1
.L4:
        addl    $1, %eax
        movl    $0, g_2(%rip)
        cmpl    $4, %eax
        jne     .L4
.L1:
        rep
        ret

LIM makes it have a load of g_1, a load of g_2 and a store to g_2 before returning.

        .cfi_startproc
        movl    g_1(%rip), %edx
        movl    g_2(%rip), %eax
        testl   %edx, %edx
        jne     .L2
        movl    $0, g_2(%rip)
        ret
.L2:
        movl    %eax, g_2(%rip)
        xorl    %eax, %eax
        ret

-O3 corrects this mistake and returns it to the more optimal result. I would argue this testcase shows LIM actually making the code worse in this case as well.
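
To make the difference between the two listings easier to see, here is a rough C-level model of their memory accesses. It is a sketch read off the assembly above, not the original testcase: the function names, the counters g2_loads and g2_stores, and the helpers count_g2_stores and check_model are all hypothetical, and return values are ignored. It only counts how often g_2 is touched on each path.

```c
#include <assert.h>

int g_1, g_2;              /* the two globals named in the listings */
int g2_loads, g2_stores;   /* instrumentation only (assumption, not in the PR) */

/* Model of the listing before LIM: g_2 is untouched when g_1 != 0. */
void before_lim(void)
{
    int i = 0;
    if (g_1 != 0)          /* movl g_1; testl; jne .L1 */
        return;
    do {                   /* .L4: store to g_2 on each of the 4 iterations */
        i++;
        g_2 = 0;
        g2_stores++;
    } while (i != 4);
}

/* Model of the listing after LIM: g_2 is loaded up front and
   stored on both paths, including the one that skips the loop. */
void after_lim(void)
{
    int g1 = g_1;
    int tmp = g_2;         /* movl g_2(%rip), %eax */
    g2_loads++;
    if (g1 != 0) {         /* .L2: writes the loaded value back */
        g_2 = tmp;
        g2_stores++;
        return;
    }
    g_2 = 0;               /* the four loop stores collapsed into one */
    g2_stores++;
}

/* Run fn with g_1 set to g1_value; return the number of stores to g_2. */
int count_g2_stores(void (*fn)(void), int g1_value)
{
    g_1 = g1_value;
    g2_loads = 0;
    g2_stores = 0;
    fn();
    return g2_stores;
}

/* Usage: compare both paths in both models. */
void check_model(void)
{
    assert(count_g2_stores(before_lim, 1) == 0);  /* loop skipped: no store */
    assert(count_g2_stores(after_lim, 1) == 1);   /* loop skipped: store added */
    assert(count_g2_stores(before_lim, 0) == 4);
    assert(count_g2_stores(after_lim, 0) == 1);
}
```

In this model the g_1 != 0 path goes from zero accesses to g_2 to one load plus one store, which is the store on a not-executed-loop path described above, and also the extra work that makes that path slower.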