Re: [RFC] split pseudos during loop unrolling in RTL unroller

Richard Biener via Gcc-patches Thu, 23 Apr 2020 05:26:13 -0700

On Thu, Apr 23, 2020 at 2:07 PM Segher Boessenkool
<seg...@kernel.crashing.org> wrote:
>
> On Thu, Apr 23, 2020 at 12:52:30PM +0200, Richard Biener wrote:
> > On Thu, Apr 23, 2020 at 12:17 PM Segher Boessenkool
> > <seg...@kernel.crashing.org> wrote:
> > >
> > > On Thu, Apr 23, 2020 at 09:32:37AM +0200, Richard Biener wrote:
> > > > On Thu, Apr 23, 2020 at 12:31 AM Jeff Law <l...@redhat.com> wrote:
> > > > > On Wed, 2020-04-22 at 15:50 -0500, Segher Boessenkool wrote:
> > > > > > > > In some ways it feels like it would be easier to resurrect RTL SSA :-)
> > > > > >
> > > > > > Why was RTL SSA abandoned?
> > > > > >
> > > > > > It might well work to keep everything in SSA form all the way to RA.
> > > > > > Hrm, that doesn't sound bad at all :-)
> > > > > >
> > > > > > (The PHIs need to be made explicit to something that resembles the
> > > > > > machine code we will end up with, very early in the pipeline, but it
> > > > > > could still also be some valid SSA form; and we can of course also
> > > > > > have hard registers in all RTL, so that needs to be dealt with sanely
> > > > > > some way as well.  Lots of details; I don't see a crucial problem,
> > > > > > though, which probably means I need to look harder ;-) )
> > > > > Lack of time mostly.  There are some complications like subregs,
> > > > > argument registers and the like.  But you can restrict SSA-based
> > > > > analysis & optimizations to just the set of pseudos that are in SSA
> > > > > form and do something more conservative on the rest.
> > > >
> > > > I guess time is better spent on trying to extend GIMPLE + SSA up to RA,
> > > > and thus do instruction selection on GIMPLE.
> > >
> > > I think this is a bad idea.  By the time you have invented enough new
> > > "lower GIMPLE" ("limple"?) to be able to use it to describe machine
> > > insns like we can with RTL, you will have a more verbose, more memory
> > > hungry, slower, etc. reinvented RTL.
> >
> > I don't think there's much to invent.
> >
> > I think at least one step would be uncontroversial(?), namely moving
> > the RTL expansion "magic" up to a GIMPLE pass.  Where the "magic" would be to turn
> > GIMPLE stmts not directly expandable via an existing optab into
> > GIMPLE that can be trivially expanded.  That includes eventually
> > combining multiple stmts into more powerful instructions and
> > doing the magic we have in, like, expand_binop (widening, etc.).
> > Where there's not a 1:1 mapping of a GIMPLE stmt to an optab
> > GIMPLE gets direct-internal-fn calls.
> > Then RTL expansion would be mostly invoking gen_insn (optab-code).
>
> Most of expand is *other stuff*.  Expand does a *lot* of things that are
> actually changing the code.  And much of that is not done anywhere else
> either yet, so this cannot be fixed by simply deleting the offending code.
>
> > More controversial would be ending up in GIMPLE there.  I think
> > GIMPLE can handle all RTL insns if we massage GIMPLE_ASM
> > a bit.  You'd end up with, say,
> >
> >  asm ("(set (reg:DI $0)
> >                 (and:DI (reg/v:DI $1 [ dst ])
> >                     (reg:DI $2)))" : "=r" (_1) : "r" (_2), "r" (_3) : "cc");
> >
> > in place of
> >
> >   _1 = _2 & _3;
> >
> > and the GIMPLE_ASM text could be actual RTL.  We'd extend
> > the stmt with an extra operand to denote recognized patterns,
> > so another option would be to keep the original GIMPLE as well.
>
> Why would you ever want to do that?  That would take much more memory,
> and until recently RTL's memory use was always a pain point.
It's not the RTL IL that uses much memory, it's infrastructure like DF
that easily blows up.  Mind that GCC has only a single function in
RTL at a time but the whole program in GIMPLE.  And GIMPLE
includes "DF" by default.

> > > RTL is a *feature*, and it is one of the things that makes GCC
> > > significantly better than the competition.
> >
> > That said, I actually agree with that.  It's just that I hope we can
> > make some of the knowledge that is currently only represented on the RTL side
> > available on the GIMPLE side.  The more complicated parts,
> > like calling conventions, that is.
>
> Yeah, and like I said, some things (unroll...) should move to GIMPLE as
> well, most of it anyway.  And some of the remaining RTL code needs a
> good overhaul (oh hello CSE).
>
> > And yes, I want to get rid of that expand monster to be able to
> > do something like sched1 on "GIMPLE" without expand coming
> > along and re-scheduling everything at will.
>
> Right, ideally, expand would just translate GIMPLE to RTL one-to-one
> (well, few-to-few, whatever :-) ).  But it does so much other stuff now,
> so all that has to be moved or reimplemented or whatever.
>
> > > More optimisations should move to GIMPLE, for example some loop
> > > optimisations should be done much earlier (most unrolling).  The expand
> > > pass should lose most of the "optimisations" it has built up over the
> > > decades (that now often are detrimental at best).  Some of what expand
> > > now does should probably be done while still in GIMPLE, even.
> > >
> > > But it is very useful to have a separate "low level" representation,
> > > that is actually close to the machine code we will eventually generate.
> > > RTL is one such representation, and we already have it, and it is very
> > > well tuned by now -- throwing it away would require some *huge*
> > > advantage, because the costs of doing that are immense as well.
> >
> > But being stuck with something means no progress...  I know
> > very well it's 100 times harder to get rid of something than to
> > add something new on top.
>
> Well, what progress do you expect to make?  After expand that is :-)

I'd like the RTL pipeline before RA to shrink significantly, no PRE,
no CSE, ...  The important part before RA is robust and intelligent
instruction selection - part of which is already done at RTL expansion
time.

> Most of what is done in RTL is done very well.

Umm, well...  I beg to differ with regard to DF and passes like
postreload-gcse.

> Replacing specific things in RTL, maybe as big as whole passes, already
> is hard to do without regressing (a *lot*).  And if there is no real
> reason to do that...

The motivation is to make GCC faster, obviously.  Not spending time
doing things that should have been done before (RTL PRE vs. GIMPLE PRE, etc.).
Using the same infrastructure (what, no loop dependency analysis on RTL?), etc.

You could say we should do more on RTL and enhance it instead, like do
vectorization where we actually could have a better idea on costs and
capabilities.  But I'm a GIMPLE person and don't know enough of RTL to
enhance it ...

Richard.

>
> Segher