Re: [RFC] split pseudos during loop unrolling in RTL unroller

Jeff Law via Gcc-patches Thu, 23 Apr 2020 13:56:11 -0700

On Thu, 2020-04-23 at 15:16 -0500, Segher Boessenkool wrote:
> On Thu, Apr 23, 2020 at 08:40:50AM -0600, Jeff Law wrote:
> > On Thu, 2020-04-23 at 15:07 +0200, Richard Biener wrote:
> > > On Thu, Apr 23, 2020 at 2:52 PM Segher Boessenkool
> > > <seg...@kernel.crashing.org> wrote:
> > > > On Thu, Apr 23, 2020 at 02:25:40PM +0200, Richard Biener wrote:
> > > > > > > But being stuck with something means no progress...  I know
> > > > > > > very well it's 100 times harder to get rid of something than to
> > > > > > > add something new on top.
> > > > > > 
> > > > > > Well, what progress do you expect to make?  After expand that is :-)
> > > > > 
> > > > > I'd like the RTL pipeline before RA to shrink significantly, no PRE,
> > > > > no CSE, ...
> > > > 
> > > > RTL CSE for example is very much required to get any good code.  It
> > > > needs to CSE stuff that wasn't there before expand.
> > > 
> > > Sure, but then we should fix that!
> > Exactly.  Its purpose largely becomes dealing with the redundancies exposed
> > by expansion, i.e., address arithmetic and the like.  A lot of its
> > path-following code should be throttled back.
> 
> Hrm, I never thought about it like this.  CSE was always there, I never
> stopped to question if we needed it :-)
:-) It's a dog-slow pass that isn't nearly as important as it once was.
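The point that cse1's remaining job is mostly the redundancy that expansion itself creates (address arithmetic and the like) can be illustrated with a toy local value-numbering pass, the core idea behind local CSE. This is a minimal sketch over a hypothetical three-address IR, not GCC's RTL and not the actual cse.c algorithm (which also does the path following mentioned above). The tuple format, the pseudo names, and `local_cse` are invented for illustration, and each pseudo is assumed to be assigned exactly once in the block.

```python
def local_cse(insns, inputs):
    """Toy local value numbering over a hypothetical IR (not GCC RTL).

    insns: list of (dst, op, a, b). op is "const" (a = immediate, b = None)
    or a commutative binary op "add"/"mul" (a, b = source pseudos).
    inputs: pseudos live on entry to the block.
    Returns (rewritten insns, number of computations turned into copies).
    Assumes every pseudo is assigned exactly once.
    """
    vn = {}      # pseudo -> value number
    table = {}   # (op, x, y) -> (value number, pseudo that holds it)
    next_vn = 0
    for r in inputs:
        vn[r] = next_vn
        next_vn += 1

    out, replaced = [], 0
    for dst, op, a, b in insns:
        if op == "const":
            key = ("const", a, None)
        else:
            # add and mul commute, so canonicalize the operand order.
            x, y = sorted((vn[a], vn[b]))
            key = (op, x, y)
        if key in table:
            # Value already computed: reuse it via a copy.
            v, holder = table[key]
            vn[dst] = v
            out.append((dst, "copy", holder, None))
            replaced += 1
        else:
            vn[dst] = next_vn
            table[key] = (next_vn, dst)
            next_vn += 1
            out.append((dst, op, a, b))
    return out, replaced


# Address arithmetic as a naive expander might emit it for &p[i].x and
# &p[i].y, assuming 8-byte elements with y at offset 4 (r0 = p, r1 = i).
block = [
    ("r2", "const", 8, None),
    ("r3", "mul", "r1", "r2"),   # i * 8
    ("r4", "add", "r0", "r3"),   # &p[i].x
    ("r5", "const", 8, None),
    ("r6", "mul", "r1", "r5"),   # i * 8 again
    ("r7", "add", "r0", "r6"),   # p + i*8 again
    ("r8", "const", 4, None),
    ("r9", "add", "r7", "r8"),   # &p[i].y
]
new_block, n = local_cse(block, inputs=["r0", "r1"])
for insn in new_block:
    print(insn)
```

The three recomputations (r5, r6, r7) become copies of r2, r3 and r4. r9 still reads r7, now a copy of r4; a later copy-propagation step would rewrite it to use r4 directly.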
> 
> Well, that's cse1 then.  What about cse2?
cse2's original purpose was to clean up after the loop optimizers, and it should
be doing even less work than cse1.


The steps in my mind are to see what's left in the first jump pass, then the cse
path-following code, then the core of cse itself.  lower-subreg is a wart that
could likely go away if we stop lying about the target's capabilities.

> 
> > > But valid RTL is instructions that are recognized.  Which means
> > > when the target doesn't support an SImode add we may not create
> > > one.  That's instruction selection ;)
> > That's always a point of tension.  But I think that in general continuing to
> > have targets claim to support things they do not (such as double-word-size
> > arithmetic, logicals, moves, etc.) is a mistake.  It made sense at one time,
> > but I think we've got better mechanisms in place to deal with this stuff now.
> 
> Different targets have *very* different insns for add, mul, div, shifts;
> everything really.  Describing this at expand time with two-machine-word
> operations works pretty bloody well, for most or all targets -- this is
> just part of the power of define_expand (but an important part).  And
> define_expand is very very useful, it's the swiss army escape hatch, it
> lets you do everything optabs have a too small mind for.
Absolutely true, but most of the double-word-mode crap was in there because we
couldn't really describe things like the carry bit.  That's long since been
fixed, and I bet if we handled just that, the vast majority of the cases where
we pretend the target has double-word support would become unnecessary (and the
need for early lower-subreg would then be reduced significantly as well).
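Once the carry bit can be described, a double-word add reduces to a low-word add whose carry-out feeds a high-word add-with-carry, so the target never has to claim a native double-word add. A minimal sketch, assuming a 32-bit word and unsigned wrap-around arithmetic; `WORD_BITS`, `add_with_carry` and `double_word_add` are hypothetical names chosen for illustration, not GCC internals.

```python
WORD_BITS = 32                  # assumed target word size
MASK = (1 << WORD_BITS) - 1


def add_with_carry(a, b, carry_in):
    """One word-sized add: returns (sum word, carry out)."""
    total = a + b + carry_in
    return total & MASK, total >> WORD_BITS


def double_word_add(x, y):
    """Add two 2*WORD_BITS-bit values using only word-sized adds.

    The final carry out of the high word is dropped, so the result
    wraps modulo 2**(2*WORD_BITS), like an unsigned double-word add.
    """
    x_lo, x_hi = x & MASK, x >> WORD_BITS
    y_lo, y_hi = y & MASK, y >> WORD_BITS
    lo, carry = add_with_carry(x_lo, y_lo, 0)    # low word produces the carry
    hi, _ = add_with_carry(x_hi, y_hi, carry)    # high word consumes it
    return (hi << WORD_BITS) | lo


print(hex(double_word_add(0xFFFFFFFF, 1)))
```

Here the carry is an explicit value passed from one word operation to the next; the same two-step shape is what the target would describe once the carry bit is exposed.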

And if we look at the number of tricks that are used to do things like optimize
double-word moves in the target files, it's just insane.  If we stop lying and
take advantage of improvements over the last 20 years we end up killing lots of
crappy target code.


> There are two kinds of costing.  The first only says which of A or B is
> better; that can perhaps be done on GIMPLE already, using
> target-specific costs.  The other gives a number to everything, which is
> much harder to get anywhere close to usably correct (what does the
> number even *mean*?  For performance, latency of the whole sequence is
> the most important number, but that is not easy to work with, or what we
> use for say insn_cost).
True.  I'm referring mostly to traditional costing, i.e., given forms A & B,
which is preferred for the target.  Full sequence latency is a distinct issue,
though we let it bleed into the former (for example by rewriting sequences with
similar costs into a sequence with fewer dependencies).
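The difference between a per-instruction cost sum and the latency of the whole sequence shows up on a + b + c + d evaluated as a chain versus a balanced tree: both take three adds, but the tree has a shorter dependency chain. A minimal sketch, assuming every add costs 1 and has a 1-cycle latency (hypothetical numbers, not any real target's insn_cost); reassociating like this is fine for integer adds, but not in general for floating point.

```python
def total_cost(seq, cost=1):
    """Sum of per-instruction costs: the 'is A or B cheaper' view."""
    return cost * len(seq)


def critical_path(seq, latency=1):
    """Longest dependency chain through seq, in cycles.

    seq: list of (dst, src1, src2). Sources not defined in seq are
    block inputs, assumed ready at cycle 0.
    """
    ready = {}
    for dst, a, b in seq:
        ready[dst] = max(ready.get(a, 0), ready.get(b, 0)) + latency
    return max(ready.values())


chain = [("t1", "a", "b"), ("t2", "t1", "c"), ("t3", "t2", "d")]  # ((a+b)+c)+d
tree = [("t1", "a", "b"), ("t2", "c", "d"), ("t3", "t1", "t2")]   # (a+b)+(c+d)

print(total_cost(chain), total_cost(tree))        # 3 3
print(critical_path(chain), critical_path(tree))  # 3 2
```

A plain cost comparison sees the two forms as equal; only a dependency-aware view prefers the tree, which is the kind of latency concern that bleeds into the simpler A-versus-B costing.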

> 
> > But I think there is a place for adding target dependencies -- and that's at
> > the end of the current gimple pipeline.
> 
> There are a *few* things in GIMPLE that use target costs (ivopts...)
> But yeah, most things should not.
Precisely.  Avoiding target dependencies is the aspirational goal, but we also
have to be sensible and realize that some things really do require a degree of
target knowledge.
> 

Jeff
