https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114991
--- Comment #2 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Here is some analysis on why we miss some of these opportunities in
ldp_fusion.

So initially in 267r.vregs we have some very clean RTL:

    6: r101:DI=sfp:DI-0x40
    7: x0:DI=r101:DI
    8: call [`g'] argc:0
      REG_CALL_DECL `g'
    9: r102:DI=sfp:DI-0x80
   10: r103:DI=sfp:DI-0x40
   11: r104:V4SI=[r103:DI]
   13: r105:V4SI=[r103:DI+0x10]
   15: r106:V4SI=[r103:DI+0x20]
   17: r107:V4SI=[r103:DI+0x30]
   12: [r102:DI]=r104:V4SI
   14: [r102:DI+0x10]=r105:V4SI
   16: [r102:DI+0x20]=r106:V4SI
   18: [r102:DI+0x30]=r107:V4SI

If we were to run the ldp/stp pass on this, it should form the pairs
without a problem. Of course, things go downhill from here.

The first slightly strange thing is that fwprop propagates the sfp into
the first access of each group (i.e. the one with offset 0), but not
into the others:

    9: r102:DI=sfp:DI-0x80
   11: r104:V4SI=[sfp:DI-0x40]
   13: r105:V4SI=[r101:DI+0x10]
   15: r106:V4SI=[r101:DI+0x20]
   17: r107:V4SI=[r101:DI+0x30]
      REG_DEAD r103:DI
   12: [sfp:DI-0x80]=r104:V4SI
   14: [r102:DI+0x10]=r105:V4SI
      REG_DEAD r105:V4SI
   16: [r102:DI+0x20]=r106:V4SI
      REG_DEAD r106:V4SI
   18: [r102:DI+0x30]=r107:V4SI

The RTL then stays mostly unchanged until sched1, where things really
start to go downhill:

   11: r104:V4SI=[sfp:DI-0x40]
    9: r102:DI=sfp:DI-0x80
   13: r105:V4SI=[r101:DI+0x10]
   20: x0:DI=r102:DI
      REG_DEAD r102:DI
      REG_EQUAL sfp:DI-0x80
   15: r106:V4SI=[r101:DI+0x20]
   12: [sfp:DI-0x80]=r104:V4SI
      REG_DEAD r104:V4SI
   17: r107:V4SI=[r101:DI+0x30]
      REG_DEAD r101:DI
   14: [r102:DI+0x10]=r105:V4SI
      REG_DEAD r105:V4SI
   16: [r102:DI+0x20]=r106:V4SI
      REG_DEAD r106:V4SI
   18: [r102:DI+0x30]=r107:V4SI

Here the first of the stores (i12) has been moved up between the last
pair of loads (i15, i17). Now the interesting question is how sched1
knows that it is safe to perform this transformation.
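The dumps already contain enough information to prove the move safe: every address involved is sfp plus a constant (r101 = sfp-0x40 from i6, r102 = sfp-0x80 from i9). Comparing byte ranges then shows that the 16-byte store i12 cannot overlap the loads i15 or i17. Whether sched1 reasons in exactly this way is the open question. The following is only a minimal Python model of base+offset disambiguation, assuming 16-byte V4SI accesses and the register/sfp relationships read off the dumps. `Access`, `canonicalize` and `may_overlap` are hypothetical names, not GCC's alias.cc code.

```python
from dataclasses import dataclass

# Hypothetical model of a memory access: a base register plus a constant
# byte offset, and the access size in bytes (V4SI = 16 bytes).
@dataclass(frozen=True)
class Access:
    base: str
    offset: int
    size: int

# Registers known to equal sfp + c, read off the dumps
# (i6/i10: r101/r103 = sfp - 0x40, i9: r102 = sfp - 0x80).
SFP_RELATIVE = {"r101": -0x40, "r102": -0x80, "r103": -0x40}

def canonicalize(a: Access) -> Access:
    """Rewrite an access relative to sfp when its base is a known sfp copy."""
    if a.base in SFP_RELATIVE:
        return Access("sfp", SFP_RELATIVE[a.base] + a.offset, a.size)
    return a

def may_overlap(x: Access, y: Access) -> bool:
    """Conservative query: True unless the accesses provably don't overlap."""
    a, b = canonicalize(x), canonicalize(y)
    if a.base != b.base:
        return True  # unrelated bases: assume a conflict
    # Half-open byte ranges [offset, offset + size) are disjoint iff one
    # ends at or before the other starts.
    return not (a.offset + a.size <= b.offset or b.offset + b.size <= a.offset)

i12 = Access("sfp", -0x80, 16)   # [sfp:DI-0x80]=r104:V4SI
i15 = Access("r101", 0x20, 16)   # r106:V4SI=[r101:DI+0x20]
i17 = Access("r101", 0x30, 16)   # r107:V4SI=[r101:DI+0x30]

print(may_overlap(i12, i15), may_overlap(i12, i17))  # False False
```

i12 covers [sfp-0x80, sfp-0x70), while i15 and i17 cover [sfp-0x20, sfp-0x10) and [sfp-0x10, sfp). Once the bases are resolved, the disjointness is plain interval arithmetic. An analysis that cannot see through r101 would have to fall back to the conservative `True` branch.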
In the ldp_fusion1 pass we miss the (15,17) load pair because we think
that the loads may alias with i12:

  cannot form pair (15,17) due to alias conflicts (12,12)

so it would be good to look into how our alias analysis differs from
what sched1 is doing.

It's worth further noting that while the loads have MEM_EXPR
information (they point to the var_decl for s), the stores do not.
Presumably this is because the copy of s mandated by the ABI doesn't
necessarily have a tree decl representation that the MEM_EXPRs could
point to.

Separately from the aliasing issue, because:

 - there is no MEM_EXPR information for the stores, and
 - fwprop1 substituted the sfp in for the first store,

ldp_fusion fails to discover the (i12,i14) store pair opportunity. As a
result, we unfortunately end up forming an stp in the middle.

Interestingly, if I turn off fwprop1, we still fail to form the (12,14)
pair due to aliasing. So the main thing to investigate seems to be how
sched1 does its alias analysis and how that differs from what we're
doing in ldp_fusion.

I have some WIP patches that should improve the pair discovery and
could potentially be extended to help with the (12,14) pair here.
Another thing that could help there is populating the MEM_EXPR for the
stores of the structure copy.
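The discovery problem can be illustrated with a toy model. It groups the stores by their syntactic base register, sorts each group by offset, and greedily pairs neighbours whose offsets differ by the access size. This sketch assumes 16-byte stores and a hypothetical `SFP_RELATIVE` table that stands in for whatever MEM_EXPR or base tracking could tell the pass. The real ldp_fusion also checks alias conflicts and offset ranges, and this model deliberately ignores those checks. `STORES` and `find_pairs` are illustrative names only.

```python
from collections import defaultdict

# Hypothetical store records from the fwprop1 output:
# (insn uid, base register, constant offset). All are 16-byte V4SI stores.
STORES = [(12, "sfp", -0x80), (14, "r102", 0x10),
          (16, "r102", 0x20), (18, "r102", 0x30)]
SIZE = 16

# Assumed base relationship (i9: r102 = sfp - 0x80), standing in for the
# information MEM_EXPRs or base tracking might otherwise provide.
SFP_RELATIVE = {"r102": -0x80}

def find_pairs(accesses, resolve_bases):
    """Greedily pair accesses that share a base and have adjacent offsets."""
    groups = defaultdict(list)
    for uid, base, off in accesses:
        if resolve_bases and base in SFP_RELATIVE:
            base, off = "sfp", SFP_RELATIVE[base] + off
        groups[base].append((off, uid))
    pairs = []
    for members in groups.values():
        members.sort()
        i = 0
        while i + 1 < len(members):
            (off1, uid1), (off2, uid2) = members[i], members[i + 1]
            if off2 - off1 == SIZE:
                pairs.append((uid1, uid2))
                i += 2  # both accesses are consumed by this pair
            else:
                i += 1
    return sorted(pairs)

print(find_pairs(STORES, resolve_bases=False))  # [(14, 16)]
print(find_pairs(STORES, resolve_bases=True))   # [(12, 14), (16, 18)]
```

Without base resolution, i12 (base sfp) lands in a group of its own. The only adjacent pair left is (14,16), which is the stp in the middle. Once r102 is known to be sfp-0x80, all four stores share one base and pair up as (12,14) and (16,18).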