Hi!

On Mon, Jul 13, 2020 at 07:25:37AM +0200, Hans-Peter Nilsson wrote:
> > > > > TL;DR: fixing a misdetection of what is a "simple move".
> > > > 
> > > > That is not a very correct characterisation of what this does :-)
> > > 
> > > That's apparently where we completely disagree. :-)
> > 
> > Well, I wrote that code, I know what is considered "just a move" there.
> 
> You lost some context: I'm comparing before/after the
> cc0-conversion for CRIS, where this is a misdetection (a false
> negative) of a move and causes a performance-regression.

The cc0 conversion caused a performance regression.  You can improve
some code in combine to make that not happen.

> > > I certainly don't contest that the move can be eliminated, and
> > > that most cost-effective 2-2 eliminations are helpful.  (See my
> > > other post about combine being eager with allowing same-cost
> > > combinations.)
> > 
> > I did not see that post, do you have a pointer?
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549416.html

I'll reply to that separately.

> > single_set also allows other insns, for example, multiple sets!
> 
> ...where the other sets are unused.  Not sure how that kind of
> thing would get combined here, but if its combined cost is
> beneficial, it would be a win, I guess.

The unused outputs are thrown away by combine; when the result doesn't
match, it is tried again, but with the original clobbers.

> > > Do you have some pointers to PR:s or something else backing that
> > > statement, or is it your work-in-progress hinted below?
> > 
> > I do not know what "work in progress" you mean?
> 
> I'm referring to your "I'll rerun some testing to show this.
> It'll take a while."  Are the regressions you refer to above
> tracked in bugzilla or on some mailing list?

The whole 2-2 combine thing took at least half a year to develop.  I
have posted to gcc-patches@ about it a few times.  Not having the
is_just_move stuff causes highly visible regressions on some targets (I
do not remember which), and it hurts all targets.  2-2 combine with the
is_just_move (and make_extra_copies) stuff was a win everywhere.

The tests just take a long time to run (it used to be 2h per run, but it
is closer to 3h per run with the current compiler: building
cross-compilers used to take less than 4m per target, and that has become
much worse over the last few years).

Analysing the results is easy for most kinds of instruction combination:
just looking at the binary size of the testcase (I usually build Linux)
gives a good idea how effective combine was.  But for 2-2 combinations
that doesn't show much at all, so I dig through the actual resulting
code (for many targets, not all 30, just those I think are interesting).

> > For 2-2, size does *not* usually change, which brings us immediately
> > into "a lot more work" territory.  Oh, and all x86 compilers ICE.

The x86-64 kernel *does* build, just some boot wrapper code fails, but
the kernel itself does build.  i386 does ICE however, something with
memcpy or some such.

> If combine only did lower-cost combinations (perhaps with
> Richard Sandifords lower-size-when-tied suggestion), I guess
> this wouldn't happen. 0:-)

And we would regress (a LOT).

> > It shows we can change to use single_set here.
> 
> Did you mean "will show whether" or is it already complete?

It did complete, yes (and didn't change a single resulting instruction).
So that was easy :-)

> > I'll review the original patch again, to point out where it still needs
> > changing.
> 
> ...but if you're in progress with a single_set variant, I'm all
> for it.

Yup, it's pretty simple actually :-)

Thanks,


Segher
