Kenneth Zadeck <zad...@naturalbridge.com> writes: > This patch takes a different approach to fixing PR52543 than does the > patch in > > http://gcc.gnu.org/ml/gcc-patches/2012-03/msg00641.html > > This patch transforms the lower-subreg pass(es) from unconditionally > splitting wide moves, zero extensions, and shifts, so that it now takes > into account the target specific costs and only does the transformations > if it is profitable.
Thanks for doing this. > + This pass only splits moves with modes are wider than word_mode and ...modes that are wider... > + ashifts, lshiftrt and zero_extensions with integer modes that are Nitlet: lshiftrts to be consistent. Or "ASHIFTs, LSHIFTRTs and ZERO_EXTENDs". > + There are two useful preprocessor defines for use by maintainers: > + > + #define LOG_COSTS > + > + if you wish to see the actual cost estimates that are being used > + for each mode wider than word mode and the cost estimates for zero > + extension and the shifts. This can be useful when port maintainers > + are tuning insn rtx costs. > + > + #define FORCE_LOWERING > + > + if you wish to test the pass with all the transformation forced on. > + This can be useful for finding bugs in the transformations. Must admit I'm not keen on these kinds of macro, but it's Ian's call. Idea for the future (i.e. not this patch) is to have a dump file for target initialisation. > +/* This pass can transform 4 different operations: move, ashift, > + lshiftrt, and zero_extend. There is a boolean vector for move > + splitting that is indexed by mode and is true for each mode that is > + to have its copies split. The other three operations are only done > + for one mode so they are only controlled by a single boolean .*/ As mentioned privately, whether this is profitable for shifts depends to some extent on the shift amount. GCC already supports targets where this transformation would be OK for some shift amounts but not others. So for shifts, I think this should be an array of HOST_BITS_PER_WIDE_INT booleans rather than just one. More comments below about how this filters through your other changes. > +/* This pass can transform 4 different operations: move, ashift, > + lshiftrt, and zero_extend. There is a boolean vector for move > + splitting that is indexed by mode and is true for each mode that is > + to have its copies split. The other three operations are only done > + for one mode so they are only controlled by a single boolean .*/ > +static bool move_modes_to_split[MAX_MACHINE_MODE]; > + > +/* Other splitting operations to be done on the this platform. */ > +static bool splitting_ashift; > +static bool splitting_lshiftrt; > +static bool splitting_zext; > +/* The or of the previous three values. */ > +static bool splitting_some_shifts; > + > +/* The integer mode that is twice the width of word_mode. */ > +static enum machine_mode twice_word_mode; > + > /* Bit N in the bitmap in element M of this array is set if there is a > copy from reg M to reg N. */ > static VEC(bitmap,heap) *reg_copy_graph; > > -/* Return whether X is a simple object which we can take a word_mode > - subreg of. */ > +static bool something_to_do; > + > +/* Precomputed cost values used to determine if lowering shift > + operations is profitable. */ > +static int word_mode_move_cost; > +static int move_zero_cost; > + All this new data should be wrapped in a target structure. See expmed.[hc] for an example. > + return wide_cost > word_mode_move_cost + move_zero_cost; Should be >=. The purpose of this transformation isn't AIUI to improve shifts in themselves. It's just to stop shifts from being an artificial barrier to lower-subreg's main "optimisation" -- which in some ways is more of an IR simplification -- i.e., replacing subregs with regs. We want to do it if the costs of the two sequences are equal. More justification at the end. > + else > + { > + int narrow_cost; > + > + trg = gen_rtx_REG (word_mode, FIRST_PSEUDO_REGISTER); > + src = gen_rtx_REG (word_mode, FIRST_PSEUDO_REGISTER); Probably best to use FIRST_PSEUDO_REGISTER + 1 for the second (here and elsewhere) so that we get the cost of a general 3-operand shift rather than a 2-operand one. > + narrow_cost = insn_rtx_cost (gen_rtx_SET (VOIDmode, trg, > + gen_rtx_fmt_ee (code, > word_mode, > + src, > + GEN_INT > (shift_amt - word_mode_size))), Watch the long lines. Everything should be less than 80 chars. > + return wide_cost > narrow_cost + move_zero_cost; ">=" here too. > +/* Initialize pass once per execution. This envolves determining Once per target. And since it can be called more than once, the function needs to zero out the new data rather than rely on load-time zero-initialisation. > + t = compute_move_cost ((enum machine_mode) i); > + > +#ifdef LOG_COSTS > + fprintf (stderr, "mode %s, move cost %d, factor %d\n", mode_name[i], > t, factor); > +#endif > + if (t > word_mode_move_cost * factor) ">=" here too. > + /* The only case here to check to see if moving the upper part with a > + zero is cheaper than doing the zext itself. There is some theory > + that any well implemented port would just have the pattern. */ Don't really get this comment. These days, I don't think ports are really expected to define double-word patterns that simply decompose to the usual (target-independent) word-mode sequence. Ports used to do that because having a unified operation helped the earlier rtl optimisers, but those cases are now handled at the gimple level. These days it seems better to let optabs lower the thing immediately and allow the individual word ops (which weren't exposed at the gimple level) to get maximum optimisation. Maybe easiest to remove the second sentence rather than argue about it. > + if (insn_rtx_cost (pat, true) > word_mode_move_cost + move_zero_cost) ">=" :-) > @@ -173,7 +364,7 @@ find_pseudo_copy (rtx set) > if (HARD_REGISTER_NUM_P (rd) || HARD_REGISTER_NUM_P (rs)) > return false; > > - if (GET_MODE_SIZE (GET_MODE (dest)) <= UNITS_PER_WORD) > + if (!move_modes_to_split[(int)GET_MODE (dest)]) > return false; Space after "(int)". > @@ -335,6 +526,11 @@ find_decomposable_subregs (rtx *px, void > bitmap_set_bit (decomposable_context, regno); > break; > case SIMPLE_MOVE: > + /* If this is marked as a simple move with a mode this > + large, it is because find_pseudo_copy returned false > + because this was not a mode we wanted to split. */ > + if (!move_modes_to_split[GET_MODE (x)]) I think you need the (int) here too. (Should have been caught by boostrap if so.) > @@ -668,7 +864,7 @@ resolve_simple_move (rtx set, rtx insn) > orig_mode = GET_MODE (dest); > > words = (GET_MODE_SIZE (orig_mode) + UNITS_PER_WORD - 1) / UNITS_PER_WORD; > - if (words <= 1) > + if (!move_modes_to_split[(int)GET_MODE (dest)]) > return insn; > > start_sequence (); Space after "(int)". > op = SET_SRC (set); > - if (GET_CODE (op) != ASHIFT > - && GET_CODE (op) != LSHIFTRT > - && GET_CODE (op) != ZERO_EXTEND) > + mode = GET_MODE (op); > + > + if (mode != twice_word_mode) > + return 0; > + > + switch (GET_CODE (op)) { > + case ASHIFT: > + if (splitting_ashift) > + break; > + return 0; > + > + case LSHIFTRT: > + if (splitting_lshiftrt) > + break; > + return 0; > + > + case ZERO_EXTEND: > + if (splitting_zext) > + break; > + return 0; With the boolean array change suggested above, we'd not bother with this bit and check the cached cost information: > @@ -961,23 +1178,26 @@ find_decomposable_shift_zext (rtx insn) > > if (GET_CODE (op) == ZERO_EXTEND) > { > - if (GET_MODE (op_operand) != word_mode > - || GET_MODE_BITSIZE (GET_MODE (op)) != 2 * BITS_PER_WORD) > + if (GET_MODE (op_operand) != word_mode) > return 0; > } ...here and: > else /* left or right shift */ > { > + int shift_val; > + > if (!CONST_INT_P (XEXP (op, 1)) > - || INTVAL (XEXP (op, 1)) < BITS_PER_WORD > - || GET_MODE_BITSIZE (GET_MODE (op_operand)) != 2 * BITS_PER_WORD) > + || INTVAL (XEXP (op, 1)) < BITS_PER_WORD) > + return 0; > + > + shift_val = INTVAL (XEXP (op, 1)); > + if (!profitable_shift_p (GET_CODE (op), shift_val)) > return 0; > + > + bitmap_set_bit (decomposable_context, REGNO (op_operand)); > } ...here (using the boolean array instead of profitable_shift_p). > @@ -1094,24 +1342,41 @@ decompose_multiword_subregs (void) > all the insns. */ > { > unsigned int i; > + bool useful_modes_seen = false; > > for (i = FIRST_PSEUDO_REGISTER; i < max; ++i) > { > - if (regno_reg_rtx[i] != NULL > - && GET_MODE_SIZE (GET_MODE (regno_reg_rtx[i])) > UNITS_PER_WORD) > - break; > + if (regno_reg_rtx[i] != NULL) > + { > + enum machine_mode mode = GET_MODE (regno_reg_rtx[i]); > + if (regno_reg_rtx[i] != NULL > + && (move_modes_to_split [mode] > + || (splitting_some_shifts && mode == twice_word_mode))) Here I think we should just check move_modes_to_split. We should not split a mode for which moves get more expensive, regardless of whether shifts are cheaper. lower-subreg is designed to help later passes produce better code. Splitting a shift when the associated move is more expensive is likely to encourage those passes to generate more expensive move sequences instead of cheaper ones. (Includes RA and reload.) Besides, I claim that it makes no sense to define a double-word shift pattern that is inherently more expensive than what optabs would generate without the double-word shift pattern. So I think the ">="s in the (modified) shift cost calculations should act as "=="s in practice (but should still be coded as ">="s for consistency). That is, we'll allow the shift to be split if the cost of doing so is the same as the original shift, but not otherwise. Or the short version: if splitting double-word moves is expensive, skip all the zext and shift stuff too. Looks good to me otherwise, but it's Ian's pass. Richard