https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114991
--- Comment #2 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Here is some analysis on why we miss some of these opportunities in ldp_fusion.
So initially in 267r.vregs we have some very clean RTL:
6: r101:DI=sfp:DI-0x40
7: x0:DI=r101:DI
8: call [`g'] argc:0
REG_CALL_DECL `g'
9: r102:DI=sfp:DI-0x80
10: r103:DI=sfp:DI-0x40
11: r104:V4SI=[r103:DI]
13: r105:V4SI=[r103:DI+0x10]
15: r106:V4SI=[r103:DI+0x20]
17: r107:V4SI=[r103:DI+0x30]
12: [r102:DI]=r104:V4SI
14: [r102:DI+0x10]=r105:V4SI
16: [r102:DI+0x20]=r106:V4SI
18: [r102:DI+0x30]=r107:V4SI
If we were to run the ldp/stp pass on this it should form the pairs without a
problem. Of course things go downhill from here. The first slightly strange
thing is that fwprop propagates the sfp into the first of each group of
accesses (i.e. with offset 0), but not the others:
9: r102:DI=sfp:DI-0x80
11: r104:V4SI=[sfp:DI-0x40]
13: r105:V4SI=[r101:DI+0x10]
15: r106:V4SI=[r101:DI+0x20]
17: r107:V4SI=[r101:DI+0x30]
REG_DEAD r103:DI
12: [sfp:DI-0x80]=r104:V4SI
14: [r102:DI+0x10]=r105:V4SI
REG_DEAD r105:V4SI
16: [r102:DI+0x20]=r106:V4SI
REG_DEAD r106:V4SI
18: [r102:DI+0x30]=r107:V4SI
The RTL then stays mostly unchanged until sched1, where things really start to
go downhill:
11: r104:V4SI=[sfp:DI-0x40]
9: r102:DI=sfp:DI-0x80
13: r105:V4SI=[r101:DI+0x10]
20: x0:DI=r102:DI
REG_DEAD r102:DI
REG_EQUAL sfp:DI-0x80
15: r106:V4SI=[r101:DI+0x20]
12: [sfp:DI-0x80]=r104:V4SI
REG_DEAD r104:V4SI
17: r107:V4SI=[r101:DI+0x30]
REG_DEAD r101:DI
14: [r102:DI+0x10]=r105:V4SI
REG_DEAD r105:V4SI
16: [r102:DI+0x20]=r106:V4SI
REG_DEAD r106:V4SI
18: [r102:DI+0x30]=r107:V4SI
Here the first of the stores (i12) has been moved up between the last pair of
loads (i15, i17). Now the interesting thing is how sched1 knows that it is
safe to perform this transformation. In the ldp_fusion1 pass we miss this pair
because we think that the loads may alias with i12:
cannot form pair (15,17) due to alias conflicts (12,12)
so it would be good to look into how our alias analysis differs from what
sched1 is doing. It's worth further noting that while the loads have MEM_EXPR
information (they point to the var_decl for s) the stores do not. Presumably
this is because the copy of s mandated by the ABI doesn't necessarily have a
tree decl representation that the MEM_EXPRs could point to.
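For context, here is a minimal sketch of the kind of source that could lead to RTL of this shape. It is an assumption on my part and not the testcase from the report: only `g' and s appear in the dumps, so the names f and h and the layout of struct S are hypothetical. The point is that passing a 64-byte struct by value makes the caller copy s into a temporary that has no decl of its own.

```c
#include <stddef.h>

/* Assumed layout: 64 bytes, matching the four 16-byte (V4SI) moves
   in the dumps.  */
struct S { int v[16]; };

/* Sum seen by the by-value callee; exists only so the sketch has an
   observable result.  */
static int seen_sum;

/* In a reproducer g and h would be extern declarations so that the
   calls stay opaque; they are defined here only so the sketch links
   and runs.  */
void g(struct S *p)
{
    for (size_t i = 0; i < 16; i++)
        p->v[i] = (int)i;
}

/* Hypothetical by-value callee.  */
void h(struct S arg)
{
    seen_sum = 0;
    for (size_t i = 0; i < 16; i++)
        seen_sum += arg.v[i];
}

void f(void)
{
    struct S s;   /* the var_decl the loads' MEM_EXPRs would point to */
    g(&s);        /* address of s passed in x0 */
    h(s);         /* caller copies s to a temporary and passes its address */
}
```

Under this assumption the loads read from s, which has a var_decl, while the stores write to the anonymous temporary, which would be consistent with only the loads carrying MEM_EXPR information.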
Separately to the aliasing issue, because:
- there is no MEM_EXPR information for the stores, and
- fwprop1 substituted the sfp in for the first store
ldp_fusion fails to discover the (i12,i14) store pair opportunity. As a result
we unfortunately end up forming an stp in the middle.
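To illustrate why the mismatched bases matter, here is a simplified model of base+offset pair discovery. It is only a sketch under my own assumptions and not the code in ldp_fusion: struct access, adjacent_same_base and rebase are hypothetical names, and register identity is modelled by comparing name strings.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified view of a memory access; not GCC's own
   data structures.  */
struct access {
    const char *base;  /* name of the base register, e.g. "sfp" or "r102" */
    long offset;       /* constant byte offset from the base */
    long size;         /* access size in bytes (16 for V4SI) */
};

/* Pair candidate test: same base register and back-to-back addresses.  */
static bool adjacent_same_base(struct access a, struct access b)
{
    if (strcmp(a.base, b.base) != 0)
        return false;
    return a.offset + a.size == b.offset || b.offset + b.size == a.offset;
}

/* Re-express an access in terms of def_base when its base register is
   known to be defined as def_base + def_offset (e.g. r102 = sfp - 0x80).
   Accesses using any other base are returned unchanged.  */
static struct access rebase(struct access a, const char *reg,
                            const char *def_base, long def_offset)
{
    if (strcmp(a.base, reg) == 0) {
        a.base = def_base;
        a.offset += def_offset;
    }
    return a;
}
```

In this model, i12 is { "sfp", -0x80, 16 } and i14 is { "r102", 0x10, 16 }, so adjacent_same_base rejects them as written. Rebasing i14 through r102 = sfp - 0x80 gives { "sfp", -0x70, 16 }, which is adjacent to i12.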
Interestingly, if I turn off fwprop1 then we still fail to form the
(12,14) pair due to aliasing.
So it seems the main thing to investigate is how sched1 does its alias
analysis and how that differs from what we're doing in ldp_fusion.
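As a point of reference, here is a sketch of one way accesses like these can be shown not to conflict once every address is expressed relative to sfp. I am not claiming this is what sched1 or the alias oracle does. The names sfp_range and ranges_overlap are hypothetical, and the sketch assumes r101 = sfp - 0x40 (from i6) has already been substituted.

```c
#include <stdbool.h>

/* Hypothetical byte range relative to sfp: [start, start + size).  */
struct sfp_range {
    long start;
    long size;
};

/* Two half-open ranges overlap iff each one starts before the other
   one ends.  */
static bool ranges_overlap(struct sfp_range a, struct sfp_range b)
{
    return a.start < b.start + b.size && b.start < a.start + a.size;
}
```

With i12 as { -0x80, 16 }, i15 as { -0x20, 16 } and i17 as { -0x10, 16 }, ranges_overlap returns false for both (i12,i15) and (i12,i17), so in this model the store could not conflict with either load.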
I have some WIP patches that should improve the pair discovery and could
potentially be extended to help with the case of the (12,14) pair here.
Another thing that could help with that is if we populated the MEM_EXPR for the
stores of the structure copy.