Jean Christophe Beyler writes:
> As we can see, all three are using the symbol_ref data before adding
> their offset. But after cse, we get this:
>
> (insn 5 2 6 2 ex1b.c:8 (set (reg/f:DI 74)
> (const:DI (plus:DI (symbol_ref:DI ("data") )
> (const_int 8 [0x8] 71 {movdi_
The subreg pass has this :
(insn 5 2 6 2 ex1b.c:8 (set (reg/f:DI 74)
(const:DI (plus:DI (symbol_ref:DI ("data") )
(const_int 8 [0x8] 71 {movdi_internal} (nil))
(insn 6 5 7 2 ex1b.c:8 (set (reg/f:DI 75)
(symbol_ref:DI ("data") )) 71
{movdi_internal} (nil))
...
Jean Christophe Beyler writes:
> uint64_t foo (void)
> {
> return data[0] + data[1] + data[2];
> }
>
> And this generates :
>
> la r9,data
> la r7,data+8
> ldd r6,0(r7)
> ldd r8,0(r9)
> ldd r7,16(r9)
>
> I'm trying to see if there is a problem with my rtx costs function
>
Ah ok, so I can see why it would not be able to perform that
optimization around the loop but I changed the code to simply have
this:
uint64_t foo (void)
{
return data[0] + data[1] + data[2];
}
And this generates :
la r9,data
la r7,data+8
ldd r6,0(r7)
ldd r8,0(r9)
ldd r
As you can see, the compiler uses r9 to store data and then uses that
for data[0] but also loads in r7 data+8 instead of directly using r9.
If I remove the loop then it does not do this.
This optimization is done by CSE only, currently. That's why it cannot
look through loops.
Paolo
Dear all,
As some might know, I've been concentrating on optimizing the handling
of loads for my port of GCC. I'm now considering this code:
uint64_t data[107];
uint64_t foo (void)
{
uint64_t x0, x1, x2, x3, x4, x5,x6,x7;
uint64_t i;
for(i=0;i<107;i++) {
data[i] = i;
}