On Mon, 25 Jan 2021, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > On Fri, 22 Jan 2021, Segher Boessenkool wrote:
> >
> >> On Fri, Jan 22, 2021 at 02:47:06PM +0100, Richard Biener wrote:
> >> > On Thu, 21 Jan 2021, Segher Boessenkool wrote:
> >> > > What is holding up this patch still?  Ke Wen has pinged it every month
> >> > > since May, and there has still not been a review.
> >> 
> >> Richard Sandiford wrote:
> >> > FAOD (since I'm on cc:), I don't feel qualified to review this.
> >> > Tree-level loop stuff isn't really my area.
> >> 
> >> And Richard Biener wrote:
> >> > I don't like it, it feels wrong but I don't have a good suggestion
> >> > that had positive feedback.  Since a reviewer / approver is indirectly
> >> > responsible for at least the design I do not want to ack this patch.
> >> > Bin made forward progress on the other parts of the series but clearly
> >> > there's somebody missing with the appropriate privileges who feels
> >> > positive about the patch and its general direction.
> >> > 
> >> > Sorry to be of no help here.
> >> 
> >> How unfortunate :-(
> >> 
> >> So, first off, this will then have to work for next stage 1 to make any
> >> progress.  Rats.
> >> 
> >> But what could have been done differently that would have helped?  Of
> >> course Ke Wen could have written a better patch (aka one that is more
> >> acceptable); either of you could have made your current replies earlier,
> >> so that it is clear help needs to be sought elsewhere; and I could have
> >> pushed people earlier, too.  No one really did anything wrong, I'm not
> >> seeking who to blame, I'm just trying to find out how to prevent
> >> deadlocks like this in the future (where one party waits for replies
> >> that will never come).
> >> 
> >> Is it just that we have a big gaping hole in reviewers with experience
> >> in such loop optimisations?
> >
> > Maybe.  But I think the biggest problem is that we do not
> > have a good way to achieve what the patch tries (if you review the
> > communications you'll see many ideas tossed around), first and foremost
> > because IV selection happens early on GIMPLE while unrolling
> > happens late on RTL.  Both need a quite accurate estimate of costs,
> > but unrolling has an even harder time than IV selection, where we've
> > got along with throwing dummy RTL at costing functions.
> >
> > IMHO the patch is the wrong "start" for fixing the issue, and my
> > fear is that wiring these kinds of "features" into the current
> > (fundamentally broken) state will make it much harder to rework
> > that state without introducing regressions on said features.  (I'm
> > there with trying to turn the vectorizer upside down - for three
> > years now, struggling not to regress any of the "features" we've
> > accumulated for various targets, most of which feel
> > "bolted-on" rather than well-designed ;/)
> 
> Thinking of any features in particular here?

Mostly all of the special-cases in load/store vectorization, but then...

> Most of the ones I can think of seem to be doing things in the way
> that the current infrastructure expects.  But of course, the current
> infrastructure isn't perfect, so the end result isn't either.

... indeed most of the issues stem from design decisions made very
early in the vectorizer's lifetime.

> Still, I agree with the above apart from maybe that last bit. ;-)
> 
> > I think IV selection and unrolling (and scheduling FWIW) need to move
> > closer together.  I do not have a good idea how that can work out
> > though but I very much believe that this "most wanted" GIMPLE unroller
> > will not be a good way of progressing here.
> 
> What do you feel about unrolling in the vectoriser (by doubling the VF, etc.)
> in cases where something about the target indicates that that would be
> useful?  I think that's a good place to do it (for the cases that it
> handles) because it's hard to unroll later and then interleave.

I think that's fine (similar to unrolling during IV selection).
Unrolling when there's a reason for it (other than unrolling itself) is
good.  Of course some costing has to be applied, which might be
difficult on GIMPLE (register pressure, CPU resource allocation aka
scheduling).
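To make the idea concrete, here is a minimal C sketch (function names are illustrative, not from any patch in this thread) of what unrolling by doubling the VF amounts to at the source level: processing two chunks per iteration with independent accumulators, plus an epilogue for the remainder.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward loop: one element per iteration.  */
int sum_simple(const int *a, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled-by-2 variant, roughly the shape produced when the VF is
   doubled: two independent accumulators avoid a serial dependence
   chain, and a scalar epilogue handles the leftover elements.  */
int sum_unrolled2(const int *a, size_t n)
{
    int s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    for (; i < n; i++)    /* epilogue */
        s0 += a[i];
    return s0 + s1;
}
```

The costing question mentioned above is visible even here: the unrolled body uses more registers, which GIMPLE-level heuristics cannot easily price.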

> > Maybe taking the bullet and moving IV selection back to RTL is the
> > answer.
> 
> I think that would be a bad move.  The trend recently seems to have been
> to lower stuff to individual machine operations earlier in the rtl pass
> pipeline (often immediately during expand) rather than split them later.
> The reasoning behind that is that (1) gimple has already heavily optimised
> the unlowered form and (2) lowering earlier gives the more powerful rtl
> optimisers a chance to do something with the individual machine operations.
> It's going to be hard for an RTL ivopts pass to piece everything back
> together.

Hmm, OK.  But of course it is the individual machine instructions
that determine the cost ...

Anyway, I think the GIMPLE -> RTL transition is currently too big
a step, and we should eventually try to lower GIMPLE further.  IV
selection should then happen on that lower form, where we can, for
example, do the first-level scheduling, or at least have a better
idea of resource allocation and latencies in dependence cycles,
since that's what really matters for unrolling.
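For reference, the D-form case discussed below is about base + constant-displacement addressing: when IV selection knows about the unrolling, the unrolled accesses can share one induction variable with small constant offsets instead of maintaining several pointer bumps.  A hedged C sketch of the two shapes (names are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Pointer-bump shape: every access is through a freshly advanced
   pointer, so naive unrolling multiplies the IV updates.  */
long dot_ptr_bump(const long *a, const long *b, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += *a++ * *b++;
    return s;
}

/* Shape when IV selection is done with unrolling in mind: one IV per
   array, bumped once per unrolled body, with each access using a
   constant displacement (D-form: base register + small immediate).  */
long dot_dform(const long *a, const long *b, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i]     * b[i];      /* displacement 0  */
        s += a[i + 1] * b[i + 1];  /* displacement 8  */
        s += a[i + 2] * b[i + 2];  /* displacement 16 */
        s += a[i + 3] * b[i + 3];  /* displacement 24 */
    }
    for (; i < n; i++)             /* epilogue */
        s += a[i] * b[i];
    return s;
}
```

Whether the second shape actually wins depends on the target's addressing modes and issue resources, which is exactly the cost information that is hard to obtain at the GIMPLE level.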

> > For a "short term" solution I still think that trying to perform
> > unrolling and IV selection (for the D-form case you're targeting)
> > at the same time is a better design, even if it means complicating
> > the IV selection pass (and yeah, it'll still be at GIMPLE and w/o
> > any good idea about scheduling).  There are currently 20+ GIMPLE
> > optimization passes and 10+ RTL optimization passes between
> > IV selection and unrolling; the idea that the transform decision
> > and the transform application can be this far apart looks scary.
> 
> FWIW, another option might be to go back to something like:
> 
>   https://gcc.gnu.org/pipermail/gcc-patches/2019-October/532676.html
> 
> I agree that it was worth putting that series on hold and trying a more
> target-independent approach, but I think in the end it didn't work out,
> for the reasons Richard says.  At least the target-specific pass would
> be making a strict improvement to the IL that it sees, rather than
> having to predict what future passes might do or might want.
> 
> Thanks,
> Richard
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
