TL;DR: fixing a misdetection of what is a "simple move".

Looking into performace degradation after de-cc0 for CRIS, I
noticed combine behaving badly; it changed a move and a
right-shift into two right-shifts, where the "combined" move was
not eliminated in later passes, and where the deficiency caused
an extra insn in a hot loop: crcu16 (and crcu32) in coremark.

Before de-cc0, the insns input to combine looked like:
   33: r58:SI=r56:SI 0>>r48:SI
      REG_DEAD r56:SI
   35: r37:HI=r58:SI#0
and after:
   33: {r58:SI=r56:SI 0>>r48:SI;clobber dccr:CC;}
      REG_DEAD r56:SI
      REG_UNUSED dccr:CC
   35: {r37:HI=r58:SI#0;clobber dccr:CC;}
      REG_UNUSED dccr:CC
That is, there's always a parallel with a clobber of the
condition-codes register.  Being a parallel, it's not an
is_just_move, but e.g. a single_set.

For the de-cc0:ed "combination", it ended up as
   33: {r58:SI=r56:SI 0>>r48:SI;clobber dccr:CC;}
      REG_UNUSED dccr:CC
   35: {r37:HI#0=r56:SI 0>>r48:SI;clobber dccr:CC;}
      REG_DEAD r56:SI
      REG_UNUSED dccr:CC
That is, a move and a shift turned into two shifts; the extra
shift is not eliminated by later passes, while the move was
(with cc0, and "will be again") leading to redundant
instructions.

At first I thought this was due to parallel-ignorant old code
but the "guilty" change is actually pretty recent.  Regarding a
parallel with a clobber not being "just" a move, there's only
the two adjacent callers seen in the patch (obviously with the
rename), and their use exactly matches to check that the
argument is a single_set which is a move.  It's always applied
to an rtx_insn, so I changed the type and name to avoid the
"just" issue.  I had to adjust the type when calling single_set.

I checked the original commit, c4c5ad1d6d1e1e a.k.a r263067 and
it seems parallels-as-sets were just overlooked and that this
patch appears to agree with the intent and the comment at the
use of i2_was_move and i3_was_move, which has a clause saying
"Also do this if we started with two insns neither of which was
a simple move".

With this correction in place, the performance degradation
related to de-cc0 of the CRIS port as measured by coremark is
gone and turned into a small win.  N.B.: there certainly is a
code difference in other hot functions, and the swing between
different functions is larger than this difference; to be dealt
with separately.

Tested cris-elf, x86_64-linux, powerpc64le-linux, 2/3 through
aarch64-linux (unexpectedly slow).

Ok to commit?

2020-07-06  Hans-Peter Nilsson  <h...@axis.com>

        PR target/93372
        * gcc/combine.c (is_move): Rename from is_just_move.  Use
        single_set instead of of peeking directly into the PATTERN.
---
 gcc/combine.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index 7da144e..ed90b16 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -2624,13 +2624,15 @@ can_split_parallel_of_n_reg_sets (rtx_insn *insn, int n)
   return true;
 }
 
-/* Return whether X is just a single set, with the source
-   a general_operand.  */
+/* Return whether INSN is just a single set, with the source
+   a general_operand.  INSN must be an insn, not stripped to its PATTERN.  */
 static bool
-is_just_move (rtx x)
+is_move (const rtx_insn *insn)
 {
-  if (INSN_P (x))
-    x = PATTERN (x);
+  rtx x = single_set (insn);
+
+  if (x == NULL_RTX)
+    return false;
 
   return (GET_CODE (x) == SET && general_operand (SET_SRC (x), VOIDmode));
 }
@@ -3103,8 +3105,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
     }
 
   /* Record whether i2 and i3 are trivial moves.  */
-  i2_was_move = is_just_move (i2);
-  i3_was_move = is_just_move (i3);
+  i2_was_move = is_move (i2);
+  i3_was_move = is_move (i3);
 
   /* Record whether I2DEST is used in I2SRC and similarly for the other
      cases.  Knowing this will help in register status updating below.  */
-- 
2.11.0

Reply via email to