Ah ok, so I can see why it would not be able to perform that optimization around the loop, but I changed the code to simply have this:

uint64_t foo (void)
{
    return data[0] + data[1] + data[2];
}

And this generates:

    la r9,data
    la r7,data+8
    ldd r6,0(r7)
    ldd r8,0(r9)
    ldd r7,16(r9)

I'm trying to see if there is a problem with my rtx costs function because again, I don't understand why it would generate 2 la instead of using an offset of 8 and 16.

Thanks for any input,
Jc

On Wed, Jul 15, 2009 at 1:29 AM, Paolo Bonzini<bonz...@gnu.org> wrote:
>
>> As you can see, the compiler uses r9 to store data and then uses that
>> for data[0] but also loads in r7 data+8 instead of directly using r9.
>> If I remove the loop then it does not do this.
>
> This optimization is done by CSE only, currently. That's why it cannot look
> through loops.
>
> Paolo
>