[dropping port maintainers for combine-mostly subthread]

On Sep 3, 2020, Segher Boessenkool <seg...@kernel.crashing.org> wrote:

> On Thu, Sep 03, 2020 at 07:03:53AM -0300, Alexandre Oliva wrote:
>> On Sep 2, 2020, Segher Boessenkool <seg...@kernel.crashing.org> wrote:
>> >> we might succeed, but only if we had a pattern
>> >> that matched add<mode>3_cc_overflow_1's parallel with the flag-setter as
>> >> the second element of the parallel, because that's where combine adds it
>> >> to the new i3 pattern, after splitting it out of i2.
>>
>> > That sounds like the backend pattern has it wrong then?  There is a
>> > canonical order for this?
>>
>> Much as I can tell, there isn't, it's just an arbitrary choice of
>> backends, some do it one way or the other, and that causes combine to be
>> able to perform some combinations but not others.

> For instructions that inherently set a condition code register, the
> @code{compare} operator is always written as the first RTL expression of
> the @code{parallel} instruction pattern.

Interesting.  I'm pretty sure I recently read an email suggesting it was
really up to the port, but I've been catching up on GCC emails from years
ago, so that might have been it.  Or I just misremember.  Whatever.

The x86 pattern that fails to match in combine has the flags setter
first, but combine places it second, after splitting it out of i2 and
then appending it back to i3.  Alas, it would be just as legitimate for
combine to go the opposite way, substituting the flags set into another
insn and then tacking the other set onto the substituted-into insn.
Since there is a canonical order, maybe combine should attempt to
follow that order.
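To make the mismatch concrete, here's roughly the shape involved (a
simplified sketch, not the literal i386 md text or combine's actual
output; registers, modes and operands are made up):

  ;; What the port describes: the flags setter (the compare) comes
  ;; first in the parallel, matching the canonical order.
  (parallel
   [(set (reg:CCC FLAGS_REG)
         (compare:CCC (plus:SI (reg:SI 1) (reg:SI 2))
                      (reg:SI 1)))
    (set (reg:SI 0)
         (plus:SI (reg:SI 1) (reg:SI 2)))])

  ;; What combine builds after splitting the flags set out of i2 and
  ;; appending it to i3: the same two sets, in the opposite order, so
  ;; recog doesn't match it against the pattern above.
  (parallel
   [(set (reg:SI 0)
         (plus:SI (reg:SI 1) (reg:SI 2)))
    (set (reg:CCC FLAGS_REG)
         (compare:CCC (plus:SI (reg:SI 1) (reg:SI 2))
                      (reg:SI 1)))])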
Meanwhile, there's the patchlet below, which gets us (*) a grand total
of *one* additional insn combine (in libbacktrace/dwarf.c) per
bootstrap stage, and then, in target libs, 3 permuted combines in
libhsail-rt/rt/workitems.c and 4 in s-expmod.adb.  Not bad, but I'd
agree it could be regarded as a bit of a waste.

This is not finished; I'd like to abstract out a little more of the
permutation representation, which had to be used explicitly in combine
to extract the actual clobber count before permuting the newly-created
rtvec.  Still, I was surprised at how easy it turned out to be.

I was also happy to find I had missed by a few orders of magnitude in
my earlier estimates of how far we could go in (representing)
permutations without taking up all of num_clobbers.  It's so loose
that I went for bitfields rather than arithmetic coding.  But the
reason I wanted to abstract it out was to make it easy to switch to a
different encoding.
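For concreteness, here's a tiny standalone sketch of the encoding
arithmetic for MAX_PERMUTE == 3; the two multipliers are the ones the
rtl.h hunk below starts with, but everything else (the names, the toy
item array) is local to the example, not part of the patch:

  /* Encode and decode one permutation in the spare bits of
     num_clobbers, mimicking what genrecog's matcher and
     recog_unpermute_parallel do.  */
  #include <stdio.h>

  #define MAX_PERMUTE 3

  static const int mult[] = { 1 << 30, 1 << 28 };

  #define PERMUTE_MULT(POS) (mult[MAX_PERMUTE - 2 - (POS)])

  static int
  permute_mask (int pos)
  {
    int i = MAX_PERMUTE - 2 - pos;
    if (i < 0)
      return 0;
    if (i == 0)
      return mult[0];
    return mult[i - 1] - mult[i];
  }

  int
  main (void)
  {
    /* Pretend the insn matched after swapping items 0 and 1 of the md
       pattern's parallel, and that one trailing clobber still has to
       be added: the real clobber count sits in the low bits, the swap
       (choice 1 at position 0) in the high bits.  */
    int num_clobbers = 1 + (1 - 0) * PERMUTE_MULT (0);
    printf ("encoded num_clobbers: 0x%x\n", num_clobbers);

    /* Decode the way recog_unpermute_parallel does: peel off the
       choice for each position, highest position first, and apply the
       swap, here to a plain array standing in for the PARALLEL's
       items.  */
    const char *items[] = { "set-B", "set-A", "clobber" };
    for (int pos = MAX_PERMUTE - 2; pos >= 0; pos--)
      {
        int masked = num_clobbers & permute_mask (pos);
        if (!masked)
          continue;
        num_clobbers -= masked;
        int choice = masked / PERMUTE_MULT (pos);
        const char *tmp = items[pos];
        items[pos] = items[pos + choice];
        items[pos + choice] = tmp;
      }

    /* Prints: real clobber count 1, order set-A set-B clobber.  */
    printf ("clobbers: %d; order: %s %s %s\n",
            num_clobbers, items[0], items[1], items[2]);
    return 0;
  }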
Anyway...  Does this still seem worth pursuing?


permute parallel items in recog

From: Alexandre Oliva <ol...@gnu.org>

---
 gcc/combine.c  |   35 ++++++++++++++++--------
 gcc/genrecog.c |   78 ++++++++++++++++++++++++++++++++++++------------------
 gcc/rtl.h      |   81 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 156 insertions(+), 38 deletions(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index c88382e..4383ce5 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -11510,11 +11510,14 @@ recog_for_combine_1 (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)
	 them.  Then check to make sure that all of them are dead.  */
       if (num_clobbers_to_add)
	{
+	  int nclob = (PERMUTE_MULT (0)
+		       ? (num_clobbers_to_add % PERMUTE_MULT (0))
+		       : num_clobbers_to_add);
	  rtx newpat = gen_rtx_PARALLEL (VOIDmode,
					 rtvec_alloc (GET_CODE (pat) == PARALLEL
						      ? (XVECLEN (pat, 0)
-							 + num_clobbers_to_add)
-						      : num_clobbers_to_add + 1));
+							 + nclob)
+						      : nclob + 1));

	  if (GET_CODE (pat) == PARALLEL)
	    for (i = 0; i < XVECLEN (pat, 0); i++)
@@ -11522,19 +11525,27 @@ recog_for_combine_1 (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)
	  else
	    XVECEXP (newpat, 0, 0) = pat;

-	  add_clobbers (newpat, insn_code_number);
+	  if (num_clobbers_to_add != nclob)
+	    recog_unpermute_parallel (newpat, &num_clobbers_to_add);
+	  gcc_checking_assert (num_clobbers_to_add == nclob);

-	  for (i = XVECLEN (newpat, 0) - num_clobbers_to_add;
-	       i < XVECLEN (newpat, 0); i++)
+	  if (num_clobbers_to_add)
	    {
-	      if (REG_P (XEXP (XVECEXP (newpat, 0, i), 0))
-		  && ! reg_dead_at_p (XEXP (XVECEXP (newpat, 0, i), 0), insn))
-		return -1;
-	      if (GET_CODE (XEXP (XVECEXP (newpat, 0, i), 0)) != SCRATCH)
+	      add_clobbers (newpat, insn_code_number);
+
+	      for (i = XVECLEN (newpat, 0) - num_clobbers_to_add;
+		   i < XVECLEN (newpat, 0); i++)
		{
-		  gcc_assert (REG_P (XEXP (XVECEXP (newpat, 0, i), 0)));
-		  notes = alloc_reg_note (REG_UNUSED,
-					  XEXP (XVECEXP (newpat, 0, i), 0), notes);
+		  if (REG_P (XEXP (XVECEXP (newpat, 0, i), 0))
+		      && ! reg_dead_at_p (XEXP (XVECEXP (newpat, 0, i), 0), insn))
+		    return -1;
+		  if (GET_CODE (XEXP (XVECEXP (newpat, 0, i), 0)) != SCRATCH)
+		    {
+		      gcc_assert (REG_P (XEXP (XVECEXP (newpat, 0, i), 0)));
+		      notes = alloc_reg_note (REG_UNUSED,
+					      XEXP (XVECEXP (newpat, 0, i), 0),
+					      notes);
+		    }
		}
	    }
	  pat = newpat;
diff --git a/gcc/genrecog.c b/gcc/genrecog.c
index 5b5b72f..2625660 100644
--- a/gcc/genrecog.c
+++ b/gcc/genrecog.c
@@ -5295,20 +5295,15 @@ get_peephole2_pattern (md_rtx_info *info)
   return pattern;
 }

-/* Return true if *PATTERN_PTR is a PARALLEL in which at least one trailing
-   rtx can be added automatically by add_clobbers.  If so, update
-   *ACCEPTANCE_PTR so that its num_clobbers field contains the number
-   of such trailing rtxes and update *PATTERN_PTR so that it contains
-   the pattern without those rtxes.  */
+/* Return the number of trailing rtx in a PARALLEL that
+   can be added automatically by add_clobbers.  */

-static bool
-remove_clobbers (acceptance_type *acceptance_ptr, rtx *pattern_ptr)
+static int
+remove_clobbers (rtx pattern)
 {
   int i;
-  rtx new_pattern;

   /* Find the last non-clobber in the parallel.  */
-  rtx pattern = *pattern_ptr;
   for (i = XVECLEN (pattern, 0); i > 0; i--)
     {
       rtx x = XVECEXP (pattern, 0, i - 1);
@@ -5318,24 +5313,53 @@ remove_clobbers (acceptance_type *acceptance_ptr, rtx *pattern_ptr)
	break;
     }

-  if (i == XVECLEN (pattern, 0))
-    return false;
+  return XVECLEN (pattern, 0) - i;
+}

-  /* Build a similar insn without the clobbers.  */
-  if (i == 1)
-    new_pattern = XVECEXP (pattern, 0, 0);
-  else
+/* Match a PATTERN with a top-level PARALLEL, with permutations
+   starting at POS and up to MAX_PERMUTE.  CLOB is the trailing
+   clobbers count (returned by remove_clobbers).  */
+
+static void
+match_parallel (state *s, md_rtx_info *info, rtx pattern,
+		acceptance_type acceptance, int pos, int clob)
+{
+  int len = XVECLEN (pattern, 0);
+  int len_clob = len - clob;
+
+  if (pos == len)
     {
-      new_pattern = rtx_alloc (PARALLEL);
-      XVEC (new_pattern, 0) = rtvec_alloc (i);
-      for (int j = 0; j < i; ++j)
-	XVECEXP (new_pattern, 0, j) = XVECEXP (pattern, 0, j);
+      match_pattern (s, info, pattern, acceptance);
+      return;
     }
+
+  match_parallel (s, info, pattern, acceptance, pos + 1, clob);

-  /* Recognize it.  */
-  acceptance_ptr->u.full.u.num_clobbers = XVECLEN (pattern, 0) - i;
-  *pattern_ptr = new_pattern;
-  return true;
+  if (pos < len_clob)
+    {
+      for (int i = pos + 1; i < len_clob && i < MAX_PERMUTE; i++)
+	{
+	  acceptance.u.full.u.num_clobbers += PERMUTE_MULT (pos);
+	  std::swap (XVECEXP (pattern, 0, pos), XVECEXP (pattern, 0, i));
+	  match_parallel (s, info, pattern, acceptance, pos + 1, clob);
+	  std::swap (XVECEXP (pattern, 0, pos), XVECEXP (pattern, 0, i));
+	}
+    }
+  else if (pos == len_clob)
+    {
+      gcc_checking_assert (clob == len - pos);
+      gcc_checking_assert (clob < PERMUTE_MULT (0));
+
+      acceptance.u.full.u.num_clobbers += clob;
+      if (pos == 1)
+	match_pattern (s, info, XVECEXP (pattern, 0, 0), acceptance);
+      else
+	{
+	  XVECLEN (pattern, 0) = pos;
+	  match_pattern (s, info, pattern, acceptance);
+	  XVECLEN (pattern, 0) = len;
+	}
+    }
 }

 int
@@ -5371,13 +5395,15 @@ main (int argc, const char **argv)
	  acceptance.u.full.u.num_clobbers = 0;
	  pattern = add_implicit_parallel (XVEC (def, 1));
	  validate_pattern (pattern, &info, NULL_RTX, 0);
-	  match_pattern (&insn_root, &info, pattern, acceptance);

	  /* If the pattern is a PARALLEL with trailing CLOBBERs,
	     allow recog_for_combine to match without the clobbers.  */
-	  if (GET_CODE (pattern) == PARALLEL
-	      && remove_clobbers (&acceptance, &pattern))
+	  if (GET_CODE (pattern) != PARALLEL)
	    match_pattern (&insn_root, &info, pattern, acceptance);
+	  else
+	    match_parallel (&insn_root, &info, pattern, acceptance,
+			    0, remove_clobbers (pattern));
+
	  break;
	}
diff --git a/gcc/rtl.h b/gcc/rtl.h
index 0872cc4..083ed0d 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -4456,4 +4456,85 @@ extern void gt_ggc_mx (rtx &);
 extern void gt_pch_nx (rtx &);
 extern void gt_pch_nx (rtx &, gt_pointer_operator, void *);

+
+/* Define to the number of initial elements of a parallel in an insn's
+   pattern that we recognize in any permutation.  The permutation is
+   encoded in upper bits of pnum_clobbers set by recog.  If
+   pnum_clobbers is NULL, no permutations are attempted.  */
+#define MAX_PERMUTE 3
+
+/* These are multipliers used to encode the above permutations.  We
+   could use a more compact representation, but using bitfields is
+   nicer.  The permutations of the pattern described in the md file
+   for it to match an insn are encoded as follows:
+
+   For each POS from 0 to MAX_PERMUTE-1:
+
+   Select Choice as any of the items from POS to MAX_PERMUTE-1;
+   Swap the item at Choice with that at Pos;
+   Encode it as (Choice - Pos) multiplied by PERMUTE_MULT (Pos).
+
+   For the last iteration, namely Pos = MAX_PERMUTE-1, there is only
+   one choice, thus nothing to encode.
+
+   For the 1-before-last iteration, there are 2 choices, so we need
+   one bit: 1<<30 (we don't want to use the sign bit).
+
+   For the 2-before-last iteration, there are 3 choices, so we use the
+   next two bits, thus 1<<28 is the multiplier.
+
+   Generalizing, for Pos = MAX_PERMUTE - 2 - I, we use the multiplier
+   at index I.  */
+static const int num_clobbers_permute_multipliers[] = {
+  1 << 30,
+  1 << 28, 1 << 26,
+  1 << 23, 1 << 20, 1 << 17, 1 << 14,
+  1 << 10, 1 << 6, 1 << 2
+};
+
+/* Make sure to not define MAX_PERMUTE too high for the array.  */
+#define MAX_MAX_PERMUTE (2 + ARRAY_SIZE (num_clobbers_permute_multipliers))
+static_assert (MAX_PERMUTE <= MAX_MAX_PERMUTE,
+	       "MAX_PERMUTE is too big");
+
+/* Get the multiplier to use for the permutation choice at POS.  */
+#define PERMUTE_MULT(POS) \
+  (num_clobbers_permute_multipliers[MAX_PERMUTE - 2 - (POS)])
+
+/* Return the mask for the encoding of the permutation choice at POS.  */
+static inline int
+PERMUTE_MASK (int Pos)
+{
+  int i = MAX_PERMUTE - 2 - Pos;
+
+  if (i < 0)
+    return 0;
+
+  if (i == 0)
+    return num_clobbers_permute_multipliers[0];
+
+  return (num_clobbers_permute_multipliers[i-1]
+	  - num_clobbers_permute_multipliers[i]);
+}
+
+/* Modify PAT, recognized with a permutation encoded in *NUMCLOB, to
+   its canonical form, i.e., one that would be recognized without
+   permutations.  PAT is expected to be a PARALLEL if there is any
+   permutation to make; recog will not encode permutations if there
+   isn't a PARALLEL pattern.  */
+static inline void
+recog_unpermute_parallel (rtx pat, int *numclob)
+{
+  for (int pos = MAX_PERMUTE - 2; pos >= 0; pos--)
+    {
+      int masked = *numclob & PERMUTE_MASK (pos);
+      if (!masked)
+	continue;
+      *numclob -= masked;
+      int choice = masked / PERMUTE_MULT (pos);
+      std::swap (XVECEXP (pat, 0, pos),
+		 XVECEXP (pat, 0, pos + choice));
+    }
+}
+
 #endif /* ! GCC_RTL_H */

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer