[Bug tree-optimization/116463] [15 Regression] complex multiply vectorizer detection failures after r15-3087-gb07f8a301158e5

rguenth at gcc dot gnu.org via Gcc-bugs Mon, 25 Nov 2024 01:12:12 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463


--- Comment #25 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #24)
> (In reply to rguent...@suse.de from comment #23)
> > > On 23.11.2024 at 13:20, tnfchris at gcc dot gnu.org 
> > > <gcc-bugzi...@gcc.gnu.org> wrote:
> > > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463
> > > 
> > > --- Comment #22 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> > > Ok, so the problem with the ones on trunk isn't necessarily the
> > > canonicalization itself but that our externals handling is a bit shallow.
> > > 
> > > On externals we determine that we have no information on the DF and return
> > > TOP.
> > > This is because DR analysis doesn't try to handle externals since they're not
> > > part of the loop.
> > > 
> > > However all we need to know for complex numbers is whether the externals are
> > > loaded from the same place and the order of them.
> > > 
> > > concretely the loop pre-header is:
> > > 
> > >  <bb 2> [local count: 10737416]:
> > >  b$real_11 = REALPART_EXPR <b_15(D)>;
> > >  b$imag_10 = IMAGPART_EXPR <b_15(D)>;
> > >  _53 = -b$imag_10;
> > > 
> > > and the loop body:
> > > 
> > >  <bb 3> [local count: 1063004408]:
> > >  ...
> > > 
> > >  _23 = REALPART_EXPR <*_5>;
> > >  _24 = IMAGPART_EXPR <*_5>;
> > >  _27 = _24 * _53;
> > >  _28 = _23 * _53;
> > > 
> > > codegen before:
> > > 
> > > {_24, _23} * { _53, _53 }
> > > 
> > > and after
> > > 
> > > { _24, _24 } * { _53, b$real_11 }
> > > 
> > > Before we were able to easily tell that the order for the multiply would be
> > > IMAG, REAL.
> > > In the after (GCC 15) case that information is there, but requires us to
> > > follow the externals.
> > > 
> > > Richi what do you think about extending externals handling in linear_loads_p
> > > to follow all external ops and if they load from the same memref to figure
> > > out the "virtual lane permute"?
> > 
> > Externs do not have a permute as we build them from scalars.  So any permute
> > can be trivially imposed on them - rather than TOP they should be BOTTOM. 
> > Of course there’s also no advantage of imposing a permute on them.
> > 
> 
> But the scalars can access memory that we can tell what they are. 
> 
> My point with the above was that it doesn't make sense to me that we know
> that {a[0],a[1]} reads a linearly but that with 
> 
> a1 = a[0]
> a2 = a[1]
> 
> {a1,a2} we say "sorry we know nothing about you". 
> 
> Yes they're externals but they have a defined order of use in the SLP tree.
> This isn't about imposing a permute. I said virtual permute since
> linear_loads_p uses the lane permutes on loads to determine the memory access
> order.
> 
> We DO already impose any order on them, but the other operand is oddodd, so
> the overall order ends up being oddodd because any known permute overrides
> unknown ones.

So what's the desired outcome?  I guess PERM_UNKNOWN?  I guess it's
the "other operand" of an add?  What's the (bad) effect of classifying
it as ODDODD (optimistically)?

> So the question is, can we not follow externals in a constructor to figure
> out whether, as used, they all read from the same base, and in which order?

I don't see how it makes sense to do this.  For the above example, what's
the testcase exhibiting this (and on which arch)?
