Re: RFA: Fix combine.c combining a move and a non-move into two non-moves, PR93372

Segher Boessenkool Mon, 06 Jul 2020 16:43:06 -0700

Hi!

On Mon, Jul 06, 2020 at 03:11:17AM +0200, Hans-Peter Nilsson wrote:
> TL;DR: fixing a misdetection of what is a "simple move".


That is not a very correct characterisation of what this does :-)

> Looking into performace degradation after de-cc0 for CRIS, I
> noticed combine behaving badly; it changed a move and a
> right-shift into two right-shifts, where the "combined" move was
> not eliminated in later passes, and where the deficiency caused
> an extra insn in a hot loop: crcu16 (and crcu32) in coremark.
> 
> Before de-cc0, the insns input to combine looked like:
>    33: r58:SI=r56:SI 0>>r48:SI
>       REG_DEAD r56:SI
>    35: r37:HI=r58:SI#0
> and after:
>    33: {r58:SI=r56:SI 0>>r48:SI;clobber dccr:CC;}
>       REG_DEAD r56:SI
>       REG_UNUSED dccr:CC
>    35: {r37:HI=r58:SI#0;clobber dccr:CC;}
>       REG_UNUSED dccr:CC

So a shift like this is at most as expensive as a move, on your target
(or, in your backend, anyway ;-) )

> That is, there's always a parallel with a clobber of the
> condition-codes register.  Being a parallel, it's not an
> is_just_move, but e.g. a single_set.
> 
> For the de-cc0:ed "combination", it ended up as
>    33: {r58:SI=r56:SI 0>>r48:SI;clobber dccr:CC;}
>       REG_UNUSED dccr:CC
>    35: {r37:HI#0=r56:SI 0>>r48:SI;clobber dccr:CC;}
>       REG_DEAD r56:SI
>       REG_UNUSED dccr:CC
> That is, a move and a shift turned into two shifts; the extra
> shift is not eliminated by later passes, while the move was
> (with cc0, and "will be again") leading to redundant
> instructions.

Which was the whole point of the is_just_move() thing, yes.  Combine
doesn't know most moves will be eliminated by RA (but many are useful to
do have before RA, because it gives RA much more freedom).  If a move is
the same cost as a simple insn, doing two (say shift, like here) insns
in parallel is cheaper on most machines than having a shift and a move
sequentially.  (2-2 combinations are helpful on single-scalar and even
in-order machines as well, btw).

> At first I thought this was due to parallel-ignorant old code
> but the "guilty" change is actually pretty recent.  Regarding a
> parallel with a clobber not being "just" a move, there's only
> the two adjacent callers seen in the patch (obviously with the
> rename), and their use exactly matches to check that the
> argument is a single_set which is a move.  It's always applied
> to an rtx_insn, so I changed the type and name to avoid the
> "just" issue.  I had to adjust the type when calling single_set.

But it is *not* supposed to be the same as single_set.

> I checked the original commit, c4c5ad1d6d1e1e a.k.a r263067 and
> it seems parallels-as-sets were just overlooked and that this

They were not.  It causes regressions.  That is why it has a different
name, not something with "single set".  "just_move" isn't a very good
name, I couldn't come up with something better, it is a pretty
complicated concept :-/

"i2_should_not_be_2_2_combined"?

I'll rerun some testing to show this.  It'll take a while.

> patch appears to agree with the intent and the comment at the
> use of i2_was_move and i3_was_move, which has a clause saying
> "Also do this if we started with two insns neither of which was
> a simple move".

That says that we combine to 2 insns also if we started with only 2,
but neither of those was just a move.

> With this correction in place, the performance degradation
> related to de-cc0 of the CRIS port as measured by coremark is
> gone and turned into a small win.  N.B.: there certainly is a
> code difference in other hot functions, and the swing between
> different functions is larger than this difference; to be dealt
> with separately.
> 
> Tested cris-elf, x86_64-linux, powerpc64le-linux, 2/3 through
> aarch64-linux (unexpectedly slow).
> 
> Ok to commit?

No, sorry.

One thing that could work is allowing (unused) clobbers of fixed
registers (like you have here), or maybe some hook is needed to say this
register is like a flags register, or similar.  That should work for you,
and not regress other targets, maybe even help a little?  We'll see.

Thanks,


Segher

Re: RFA: Fix combine.c combining a move and a non-move into two non-moves, PR93372

Reply via email to