TL;DR: fixing a misdetection of what is a "simple move". Looking into performace degradation after de-cc0 for CRIS, I noticed combine behaving badly; it changed a move and a right-shift into two right-shifts, where the "combined" move was not eliminated in later passes, and where the deficiency caused an extra insn in a hot loop: crcu16 (and crcu32) in coremark.
Before de-cc0, the insns input to combine looked like: 33: r58:SI=r56:SI 0>>r48:SI REG_DEAD r56:SI 35: r37:HI=r58:SI#0 and after: 33: {r58:SI=r56:SI 0>>r48:SI;clobber dccr:CC;} REG_DEAD r56:SI REG_UNUSED dccr:CC 35: {r37:HI=r58:SI#0;clobber dccr:CC;} REG_UNUSED dccr:CC That is, there's always a parallel with a clobber of the condition-codes register. Being a parallel, it's not an is_just_move, but e.g. a single_set. For the de-cc0:ed "combination", it ended up as 33: {r58:SI=r56:SI 0>>r48:SI;clobber dccr:CC;} REG_UNUSED dccr:CC 35: {r37:HI#0=r56:SI 0>>r48:SI;clobber dccr:CC;} REG_DEAD r56:SI REG_UNUSED dccr:CC That is, a move and a shift turned into two shifts; the extra shift is not eliminated by later passes, while the move was (with cc0, and "will be again") leading to redundant instructions. At first I thought this was due to parallel-ignorant old code but the "guilty" change is actually pretty recent. Regarding a parallel with a clobber not being "just" a move, there's only the two adjacent callers seen in the patch (obviously with the rename), and their use exactly matches to check that the argument is a single_set which is a move. It's always applied to an rtx_insn, so I changed the type and name to avoid the "just" issue. I had to adjust the type when calling single_set. I checked the original commit, c4c5ad1d6d1e1e a.k.a r263067 and it seems parallels-as-sets were just overlooked and that this patch appears to agree with the intent and the comment at the use of i2_was_move and i3_was_move, which has a clause saying "Also do this if we started with two insns neither of which was a simple move". With this correction in place, the performance degradation related to de-cc0 of the CRIS port as measured by coremark is gone and turned into a small win. N.B.: there certainly is a code difference in other hot functions, and the swing between different functions is larger than this difference; to be dealt with separately. Tested cris-elf, x86_64-linux, powerpc64le-linux, 2/3 through aarch64-linux (unexpectedly slow). Ok to commit? 2020-07-06 Hans-Peter Nilsson <h...@axis.com> PR target/93372 * gcc/combine.c (is_move): Rename from is_just_move. Use single_set instead of of peeking directly into the PATTERN. --- gcc/combine.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/gcc/combine.c b/gcc/combine.c index 7da144e..ed90b16 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -2624,13 +2624,15 @@ can_split_parallel_of_n_reg_sets (rtx_insn *insn, int n) return true; } -/* Return whether X is just a single set, with the source - a general_operand. */ +/* Return whether INSN is just a single set, with the source + a general_operand. INSN must be an insn, not stripped to its PATTERN. */ static bool -is_just_move (rtx x) +is_move (const rtx_insn *insn) { - if (INSN_P (x)) - x = PATTERN (x); + rtx x = single_set (insn); + + if (x == NULL_RTX) + return false; return (GET_CODE (x) == SET && general_operand (SET_SRC (x), VOIDmode)); } @@ -3103,8 +3105,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, } /* Record whether i2 and i3 are trivial moves. */ - i2_was_move = is_just_move (i2); - i3_was_move = is_just_move (i3); + i2_was_move = is_move (i2); + i3_was_move = is_move (i3); /* Record whether I2DEST is used in I2SRC and similarly for the other cases. Knowing this will help in register status updating below. */ -- 2.11.0