Ah, OK. I can see why it would not be able to perform that
optimization around the loop, but I changed the code to simply
this:

uint64_t foo (void)
{
    return data[0] + data[1] + data[2];
}

And this generates:

    la  r9,data
    la  r7,data+8
    ldd r6,0(r7)
    ldd r8,0(r9)
    ldd r7,16(r9)

I'm trying to see whether there is a problem with my rtx costs
function because, again, I don't understand why it would generate
two la instructions instead of one la with offsets of 8 and 16 from r9.
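
To make the expectation concrete, here is a rough C sketch (not
compiler output) of the addressing I had in mind: one base address,
as from a single la, and three loads at byte offsets 0, 8 and 16.
The declaration of data, its contents, and the names load_at and
foo_one_base are made up for illustration. Note that the generated
code above already uses 16(r9) for data[2]; only data+8 gets its
own la.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: data is a global array of 64-bit values; the size and
   contents here are invented for the example. */
uint64_t data[3] = { 1, 2, 3 };

/* Hypothetical helper: load a 64-bit value from base + offset bytes,
   mirroring a single "ldd rX,offset(rBase)" instruction. */
static uint64_t load_at(const unsigned char *base, size_t offset)
{
    uint64_t value;
    memcpy(&value, base + offset, sizeof value);
    return value;
}

/* Same result as foo(), with the addressing spelled out:
   one base address and constant offsets 0, 8 and 16. */
uint64_t foo_one_base(void)
{
    const unsigned char *base = (const unsigned char *)data;
    return load_at(base, 0) + load_at(base, 8) + load_at(base, 16);
}
```

In assembly terms that would be one la into r9 followed by three ldd
instructions using 0(r9), 8(r9) and 16(r9).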

Thanks for any input,
Jc

On Wed, Jul 15, 2009 at 1:29 AM, Paolo Bonzini<bonz...@gnu.org> wrote:
>
>> As you can see, the compiler uses r9 to store data and then uses that
>> for data[0] but also loads in r7 data+8 instead of directly using r9.
>> If I remove the loop then it does not do this.
>
> This optimization is done by CSE only, currently.  That's why it cannot look
> through loops.
>
> Paolo
>
