https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463
--- Comment #25 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #24)
> (In reply to rguent...@suse.de from comment #23)
> > > On 23.11.2024 at 13:20, tnfchris at gcc dot gnu.org
> > > <gcc-bugzi...@gcc.gnu.org> wrote:
> > >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463
> > >
> > > --- Comment #22 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> > > Ok, so the problem with the ones on trunk isn't necessarily the
> > > canonicalization itself but that our externals handling is a bit shallow.
> > >
> > > On externals we determine that we have no information on the DF and
> > > return TOP.
> > > This is because DR analysis doesn't try to handle externals since
> > > they're not part of the loop.
> > >
> > > However all we need to know for complex numbers is whether the
> > > externals are loaded from the same place and the order of them.
> > >
> > > concretely the loop pre-header is:
> > >
> > >   <bb 2> [local count: 10737416]:
> > >   b$real_11 = REALPART_EXPR <b_15(D)>;
> > >   b$imag_10 = IMAGPART_EXPR <b_15(D)>;
> > >   _53 = -b$imag_10;
> > >
> > > and the loop body:
> > >
> > >   <bb 3> [local count: 1063004408]:
> > >   ...
> > >
> > >   _23 = REALPART_EXPR <*_5>;
> > >   _24 = IMAGPART_EXPR <*_5>;
> > >   _27 = _24 * _53;
> > >   _28 = _23 * _53;
> > >
> > > codegen before:
> > >
> > > {_24, _23} * { _53, _53 }
> > >
> > > and after
> > >
> > > { _24, _24 } * { _53, b$real_11 }
> > >
> > > Before we were able to easily tell that the order for the multiply
> > > would be IMAG, REAL.
> > > In the after (GCC 15) case that information is there, but requires us
> > > to follow the externals.
> > >
> > > Richi what do you think about extending externals handling in
> > > linear_loads_p to follow all external ops and if they load from the
> > > same memref to figure out the "virtual lane permute"?
> >
> > Externs do not have a permute as we build them from scalars.  So any permute
> > can be trivially imposed on them - rather than TOP they should be BOTTOM.
> > Of course there’s also no advantage of imposing a permute on them.
> >
> > But the scalars can access memory that we can tell what they are.
>
> My point with the above was that it doesn't make sense to me that we know
> that {a[0],a[1]} reads a linearly but that with
>
> a1 = a[0]
> a2 = a[1]
>
> {a1,a2} we say "sorry we know nothing about you".
>
> Yes they're externals but they have a defined order of use in the SLP tree.
> This isn't about imposing a permute.  I said virtual permute since
> linear_loads_p uses the lane permutes on loads to determine the memory access
> order.
>
> We DO already impose any order on them, but the other operand is oddodd, so
> the overall order ends up being oddodd because any known permute overrides
> unknown ones.

So what's the desired outcome?  I guess PERM_UNKNOWN?  I guess it's the
"other operand" of an add?

What's the (bad) effect of classifying it as ODDODD (optimistically)?

> So the question is, can we not follow externals in a constructor to figure
> out whether, as they are used, they all read from the same base and in which
> order?

I don't see how it makes sense to do this.  For the above example, what's the
testcase exhibiting this (and on which arch)?
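
For readers following the lattice argument, here is a minimal C sketch of how
"any known permute overrides unknown ones" plays out when an external (TOP) is
merged with an ODDODD operand.  This is a simplified, hypothetical model: the
enum and the merge_perms helper are illustrative names that echo the terms
used in this thread, not GCC's actual code, and the even/odd to real/imag
mapping assumes the usual interleaved layout of complex values.

```c
/* Simplified, hypothetical model of the lane-order lattice discussed in
   this thread.  Not GCC's implementation; names are illustrative only.  */
enum perm_kind {
  PERM_TOP,      /* no information yet, e.g. an external operand */
  PERM_EVENODD,  /* lanes read { real, imag } */
  PERM_ODDEVEN,  /* lanes read { imag, real } */
  PERM_EVENEVEN, /* lanes read { real, real } */
  PERM_ODDODD,   /* lanes read { imag, imag } */
  PERM_UNKNOWN   /* conflicting or unanalyzable order */
};

/* Merge the lane orders of two operands.  TOP acts as the neutral
   element, so a known order on one side overrides "no information" on
   the other; two different known orders collapse to PERM_UNKNOWN.  */
static enum perm_kind
merge_perms (enum perm_kind a, enum perm_kind b)
{
  if (a == b)
    return a;
  if (a == PERM_TOP)
    return b;
  if (b == PERM_TOP)
    return a;
  return PERM_UNKNOWN;
}
```

Under this model, merge_perms (PERM_TOP, PERM_ODDODD) yields PERM_ODDODD,
which is the optimistic classification questioned above, whereas an external
whose lanes were known to be { imag, real } would merge with ODDODD to
PERM_UNKNOWN.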