Re: [RFC] split pseudos during loop unrolling in RTL unroller

Richard Biener via Gcc-patches Thu, 23 Apr 2020 06:08:27 -0700

On Thu, Apr 23, 2020 at 2:52 PM Segher Boessenkool
<seg...@kernel.crashing.org> wrote:
>
> On Thu, Apr 23, 2020 at 02:25:40PM +0200, Richard Biener wrote:
> > > > But being stuck with something means no progress...  I know
> > > > very well it's 100 times harder to get rid of something than to
> > > > add something new on top.
> > >
> > > Well, what progress do you expect to make?  After expand that is :-)
> >
> > I'd like the RTL pipeline before RA to shrink significantly, no PRE,
> > no CSE, ...
>
> RTL CSE for example is very much required to get any good code.  It
> needs to CSE stuff that wasn't there before expand.


Sure, but then we should fix that!

> The pass currently does much more (as well as not enough), of course.
>
> > The important part before RA is robust and intelligent
> > instruction selection - part of which is already done at RTL expansion
> > time.
>
> LOL.
>
> The expand pass doesn't often make good choices, and it *shouldn't*, it
> should not make many choices at all; it should just generate valid RTL,
> new pseudos for everything, and let later RTL passes make faster code
> from that.

But valid RTL is instructions that are recognized.  Which means
when the target doesn't support an SImode add we may not create
one.  That's instruction selection ;)
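
To make that concrete, here is a toy sketch of the kind of decision I
mean.  This is not GCC code: the toy_* names, the mode list and the
"target" table are all made up for illustration.  It is only meant to
mirror the shape of an "is there an insn for this operation in this
mode?" query (roughly what an optab_handler () != CODE_FOR_nothing
check does), followed by a fallback to a wider mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for machine modes, ordered from narrow to wide.  */
enum toy_mode { TOY_QI, TOY_HI, TOY_SI, TOY_DI, TOY_NUM_MODES };

/* Hypothetical target: it only has an add pattern for DImode.  */
static const bool toy_has_add[TOY_NUM_MODES] = { false, false, false, true };

/* Return the narrowest mode at least as wide as WANTED for which the
   toy target has an add pattern, or TOY_NUM_MODES if there is none
   (a real expander would then need some other fallback).  */
static enum toy_mode
toy_pick_add_mode (enum toy_mode wanted)
{
  assert (wanted < TOY_NUM_MODES);
  for (int m = wanted; m < TOY_NUM_MODES; m++)
    if (toy_has_add[m])
      return (enum toy_mode) m;
  return TOY_NUM_MODES;
}
```

Asking this toy target for an SImode add yields DImode, so even
"just generate valid RTL" already involves picking among what the
target offers.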

> > > Most of what is done in RTL is done very well.
> >
> > Umm, well...  I beg to differ with regard to DF and passes like
> > postreload-gcse.
>
> What is wrong with DF?

It's slow and memory hungry?

> Is there something particular in postreload-gcse that is bad?  To me it
> always is just one of those passes that doesn't do anything :-)  That
> can and should be cleaned up, sure :-)

postreload-gcse is ad hoc: it uses full-blown gcse tools that easily
blow up (compute_transp) when it doesn't really require them
(I've fixed things up a bit in dc91c65378cd0e6c0).  But I wonder why,
if we want to do PRE of loads, we don't simply schedule another
gcse pass rather than implementing a new one.  IIRC what the pass
does could be done with much more local dataflow.  Both
postreload-gcse and cse are major time-hogs on "bad" testcases :/

> > > Replacing specific things in RTL, maybe as big as whole passes, already
> > > is hard to do without regressing (a *lot*).  And if there is no real
> > > reason to do that...
> >
> > The motivation is to make GCC faster, obviously.  Not spending time
> > doing things that should have been done before (RTL PRE vs. GIMPLE
> > PRE, etc.).  Using the same infrastructure (what, no loop dependency
> > analysis on RTL?), etc.
>
> But everything you want to remove isn't high on profiles anyway?  And
> you proposed adding bigger, slower, stuff to replace it all with.
>
> Slow are RA, and the language frontends (esp. C++ likes to take 15% of
> total :-/)
>
> > You could say we should do more on RTL and enhance it instead, like do
> > vectorization where we actually could have a better idea on costs and
> > capabilities.  But I'm a GIMPLE person and don't know enough of RTL to
> > enhance it ...
>
> Oh no, I think we should do more earlier, and GIMPLE is a fine IR for
> there.  But for low-level, close-to-the-machine stuff, RTL is much
> better suited.  And we *do* want to optimise at that level as well, and
> much more than just peepholes.

Well, everything that requires costing (unrolling, vectorization,
IV selection to name a few) _is_ close-to-the-machine.  We're
just saying they are not because GIMPLE is so much easier to
work with here (not sure why exactly...).

Richard.

>
> Segher
