This patch takes a different approach to fixing PR52543 than does the patch in

http://gcc.gnu.org/ml/gcc-patches/2012-03/msg00641.html

This patch transforms the lower-subreg pass(es) from unconditionally splitting wide moves, zero extensions, and shifts, so that it now takes into account the target specific costs and only does the transformations if it is profitable.

Unconditional splitting is a problem that not only occurs on the AVR but is also a problem on the ARM NEON and my private port. Furthermore, it is a problem that is likely to occur on most modern larger machines since these machines are more likely to have fast instructions for moving things that are larger than word mode.

At compiler initialization time, each mode that is larger that a word mode is examined to determine if the cost of moving a value of that mode is less expensive that inserting the proper number of word sided moves. If it is cheaper to split it up, a bit is set to allow moves of that mode to be lowered.

A similar analysis is made for the zero extensions and shifts except that lower subreg had been (and is still limited to only breaking up these operations if the target size was twice the size of word mode.) Also, if the analysis determines that there are no profitable transformations, the pass exits quickly without doing any analysis.

It is quite likely that most ports will have to be adjusted after this patch is accepted. For instance, the analysis discovers that there are no profitable transformations to be performed on the x86-64. Since this is not my platform, I have no idea if these are the correct settings. But the pass uses the standard insn_rtx_cost interface and it is the port maintainers responsibility to not lie to the optimization passes so this extra work in stage one should be acceptable.

I do know from a private conversation with Richard Sandiford, that mips patches are likely forthcoming.

There is preprocessor controlled code that prints out the cost analysis. Only a summary of this can go in the subregs dump file because the analysis is called from backend_init_target and so the dump file is not available. But it is very useful to define LOG_COSTS when adjusting your port.

There is also preprocessor code that forces all of the lowering operations to marked as profitable. This is useful in debugging the new logic.

Both of these preprocessor symbols are documented at the top of the pass.

I have tested this on an x86_64 with both the force lowering on and off and neither cause any regressions as well as extensive testing on my port.

Ok to commit?

Kenny

2012-03-29 Kenneth Zadeck <zad...@naturalbridge.com>

* toplev.c (backend_init_target): Call initializer for lower-subreg pass.

* lower-subreg.c (move_modes_to_split, splitting_ashift, splitting_lshiftrt) splitting_zext, splitting_some_shifts, twice_word_mode, something_to_do,
    word_mode_move_cost, move_zero_cost): New static vars.
    (compute_move_cost, profitable_shift_p, init_lower_subreg): New
    functions.
(find_pseudo_copy, resolve_simple_move): Added code to only split based on costs.
    (find_decomposable_subregs): Added code to mark as decomposable
    moves that are not profitable.
    (find_decomposable_shift_zext): Added code to only decompose
    shifts and zext if profitable.
    (resolve_shift_zext): Added comment.
    (decompose_multiword_subregs): Dump list of profitable
    transformations.  Add code to skip non profitable transformations.

    *rtl.h(init_lower_subreg): Added declaration.


Index: toplev.c
===================================================================
--- toplev.c	(revision 185969)
+++ toplev.c	(working copy)
@@ -1660,6 +1660,7 @@ backend_init_target (void)
   /* rtx_cost is mode-dependent, so cached values need to be recomputed
      on a mode change.  */
   init_expmed ();
+  init_lower_subreg ();
 
   /* We may need to recompute regno_save_code[] and regno_restore_code[]
      after a mode change as well.  */
Index: lower-subreg.c
===================================================================
--- lower-subreg.c	(revision 185969)
+++ lower-subreg.c	(working copy)
@@ -52,10 +52,34 @@ DEF_VEC_P (bitmap);
 DEF_VEC_ALLOC_P (bitmap,heap);
 
 /* Decompose multi-word pseudo-registers into individual
-   pseudo-registers when possible.  This is possible when all the uses
-   of a multi-word register are via SUBREG, or are copies of the
-   register to another location.  Breaking apart the register permits
-   more CSE and permits better register allocation.  */
+   pseudo-registers when possible and profitable.  This is possible
+   when all the uses of a multi-word register are via SUBREG, or are
+   copies of the register to another location.  Breaking apart the
+   register permits more CSE and permits better register allocation.
+   This is profitable if the machine does not have move instructions
+   to do this.  
+
+   This pass only splits moves with modes are wider than word_mode and
+   ashifts, lshiftrt and zero_extensions with integer modes that are
+   twice the width of word_mode.  The latter could be generalized if
+   there was a need to do this, but the trend in architectures is to
+   not need this.  
+
+   There are two useful preprocessor defines for use by maintainers:  
+
+   #define LOG_COSTS
+
+   if you wish to see the actual cost estimates that are being used
+   for each mode wider than word mode and the cost estimates for zero
+   extension and the shifts.   This can be useful when port maintainers 
+   are tuning insn rtx costs.
+
+   #define FORCE_LOWERING
+
+   if you wish to test the pass with all the transformation forced on.
+   This can be useful for finding bugs in the transformations.
+
+   Both of these should not be enabled at the same time. */
 
 /* Bit N in this bitmap is set if regno N is used in a context in
    which we can decompose it.  */
@@ -71,12 +95,179 @@ static bitmap non_decomposable_context;
    avoid generating accesses to its subwords in integer modes.  */
 static bitmap subreg_context;
 
+/* This pass can transform 4 different operations: move, ashift,
+   lshiftrt, and zero_extend.  There is a boolean vector for move
+   splitting that is indexed by mode and is true for each mode that is
+   to have its copies split.  The other three operations are only done
+   for one mode so they are only controlled by a single boolean  .*/
+static bool move_modes_to_split[MAX_MACHINE_MODE];
+
+/* Other splitting operations to be done on the this platform.  */
+static bool splitting_ashift;
+static bool splitting_lshiftrt;
+static bool splitting_zext;
+/* The or of the previous three values.  */
+static bool splitting_some_shifts;
+
+/* The integer mode that is twice the width of word_mode.  */
+static enum machine_mode twice_word_mode;
+
 /* Bit N in the bitmap in element M of this array is set if there is a
    copy from reg M to reg N.  */
 static VEC(bitmap,heap) *reg_copy_graph;
 
-/* Return whether X is a simple object which we can take a word_mode
-   subreg of.  */
+static bool something_to_do;
+
+/* Precomputed cost values used to determine if lowering shift
+   operations is profitable.  */ 
+static int word_mode_move_cost;
+static int move_zero_cost;
+
+/* See what the move cost is.  */
+static int 
+compute_move_cost (enum machine_mode mode)
+{
+  rtx src = gen_rtx_REG (mode, FIRST_PSEUDO_REGISTER);
+  rtx trg = gen_rtx_REG (mode, FIRST_PSEUDO_REGISTER);
+  rtx pat = gen_rtx_SET (VOIDmode, trg, src);
+
+  return insn_rtx_cost (pat, true);
+}
+
+
+/* Return true if it is profitable to lower and shift by SHIFT_AMT.
+   CODE can be either ASHIFT or LSHIFTRT.  */
+static bool
+profitable_shift_p (enum rtx_code code, int shift_amt)
+{
+  rtx trg = gen_rtx_REG (twice_word_mode, FIRST_PSEUDO_REGISTER);
+  rtx src = gen_rtx_REG (twice_word_mode, FIRST_PSEUDO_REGISTER);
+  int word_mode_size = GET_MODE_BITSIZE (word_mode);
+  int wide_cost = insn_rtx_cost (gen_rtx_SET (VOIDmode, trg, 
+					      gen_rtx_fmt_ee (code, twice_word_mode, 
+							      src, GEN_INT (shift_amt))),
+				 true);
+#ifdef FORCE_LOWERING
+  return true;
+#endif
+
+#ifdef LOG_COSTS
+  fprintf (stderr, "shift code %d, shift amt %d, wide_cost %d\n", code, shift_amt, wide_cost);
+#endif
+  if (shift_amt == word_mode_size)
+    return wide_cost > word_mode_move_cost + move_zero_cost;
+  else
+    {
+      int narrow_cost;
+
+      trg = gen_rtx_REG (word_mode, FIRST_PSEUDO_REGISTER);
+      src = gen_rtx_REG (word_mode, FIRST_PSEUDO_REGISTER);
+      narrow_cost = insn_rtx_cost (gen_rtx_SET (VOIDmode, trg, 
+						gen_rtx_fmt_ee (code, word_mode, 
+								src, 
+								GEN_INT (shift_amt - word_mode_size))),
+				   true);
+
+#ifdef LOG_COSTS
+      fprintf (stderr, "narrow_cost %d\n", narrow_cost);
+#endif
+      return wide_cost > narrow_cost + move_zero_cost;
+    }
+}
+
+
+/* Initialize pass once per execution.  This envolves determining
+   which operations on the machine are profitable.  If none are found,
+   then the pass just returns when called.  */
+
+void
+init_lower_subreg (void)
+{
+  int word_mode_size = GET_MODE_BITSIZE (word_mode);
+  rtx trg = gen_rtx_REG (word_mode, FIRST_PSEUDO_REGISTER);
+  rtx pat;
+  unsigned int i;
+  int factor;
+
+  word_mode_move_cost = compute_move_cost (word_mode);
+  move_zero_cost = insn_rtx_cost (gen_rtx_SET (VOIDmode, trg, CONST0_RTX (word_mode)), true);
+
+#ifdef LOG_COSTS
+  fprintf (stderr, "word mode move cost %d\n", word_mode_move_cost);
+  fprintf (stderr, "move 0 cost %d\n", move_zero_cost);
+#endif
+  for (i = 0; i < MAX_MACHINE_MODE; i++) 
+    {
+      int t;
+      factor = GET_MODE_SIZE (i) / UNITS_PER_WORD;
+
+      if (factor <= 1) 
+	continue;
+
+      t = compute_move_cost ((enum machine_mode) i);
+
+#ifdef LOG_COSTS
+      fprintf (stderr, "mode %s, move cost %d, factor %d\n", mode_name[i], t, factor);
+#endif
+      if (t > word_mode_move_cost * factor)
+	{
+	  move_modes_to_split[i] = true;
+	  something_to_do = true;
+	}
+#ifdef FORCE_LOWERING
+      move_modes_to_split[i] = true;
+      something_to_do = true;
+#endif
+    }
+
+  /* For the moves and shifts, the only case that is checked is one
+     where the mode of the target is an integer mode twice the width
+     of the word_mode.  */
+
+  twice_word_mode = GET_MODE_WIDER_MODE (word_mode);
+
+  /* The only case here to check to see if moving the upper part with a
+     zero is cheaper than doing the zext itself.  There is some theory
+     that any well implemented port would just have the pattern.  */
+  pat = gen_rtx_SET (VOIDmode, trg, 
+		     gen_rtx_ZERO_EXTEND (twice_word_mode, gen_reg_rtx (word_mode)));
+  
+  if (insn_rtx_cost (pat, true) > word_mode_move_cost + move_zero_cost) 
+    {
+      splitting_zext = true;
+      splitting_some_shifts = true;
+      something_to_do = true;
+    }
+
+#ifdef FORCE_LOWERING
+  splitting_zext = true;
+  splitting_some_shifts = true;
+  something_to_do = true;
+#endif
+  /* For the shifts there are three shift amounts that need to be
+     checked: word_mode, word_mode + 1 and word_mode * 1.5.  The first
+     of these can be done without a shift.  The second and third case
+     are the same on large machines but may be different on small
+     machines that special case shift 1 and 2.  If any of these are
+     found to be useful, then we set the flags to consider those cases
+     and when the actual shift amount is known, redo the cost
+     calculation.  */
+  
+  splitting_ashift = profitable_shift_p (ASHIFT, word_mode_size)
+    || profitable_shift_p (ASHIFT, word_mode_size + 1)
+    || profitable_shift_p (ASHIFT, word_mode_size + (word_mode_size >> 1));
+
+  splitting_some_shifts |= splitting_ashift;
+  something_to_do |= splitting_ashift;
+
+  splitting_lshiftrt = profitable_shift_p (LSHIFTRT, word_mode_size)
+    || profitable_shift_p (LSHIFTRT, word_mode_size + 1)
+    || profitable_shift_p (LSHIFTRT, word_mode_size + (word_mode_size >> 1));
+  
+  splitting_some_shifts |= splitting_lshiftrt;
+  something_to_do |= splitting_lshiftrt;
+}
+
 
 static bool
 simple_move_operand (rtx x)
@@ -173,7 +364,7 @@ find_pseudo_copy (rtx set)
   if (HARD_REGISTER_NUM_P (rd) || HARD_REGISTER_NUM_P (rs))
     return false;
 
-  if (GET_MODE_SIZE (GET_MODE (dest)) <= UNITS_PER_WORD)
+  if (!move_modes_to_split[(int)GET_MODE (dest)])
     return false;
 
   b = VEC_index (bitmap, reg_copy_graph, rs);
@@ -335,6 +526,11 @@ find_decomposable_subregs (rtx *px, void
 		bitmap_set_bit (decomposable_context, regno);
 	      break;
 	    case SIMPLE_MOVE:
+	      /* If this is marked as a simple move with a mode this
+		 large, it is because find_pseudo_copy returned false
+		 because this was not a mode we wanted to split.  */
+	      if (!move_modes_to_split[GET_MODE (x)])
+		bitmap_set_bit (non_decomposable_context, regno);
 	      break;
 	    default:
 	      gcc_unreachable ();
@@ -668,7 +864,7 @@ resolve_simple_move (rtx set, rtx insn)
   orig_mode = GET_MODE (dest);
 
   words = (GET_MODE_SIZE (orig_mode) + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
-  if (words <= 1)
+  if (!move_modes_to_split[(int)GET_MODE (dest)])
     return insn;
 
   start_sequence ();
@@ -941,16 +1137,37 @@ find_decomposable_shift_zext (rtx insn)
   rtx set;
   rtx op;
   rtx op_operand;
+  enum machine_mode mode;
 
   set = single_set (insn);
   if (!set)
     return 0;
 
   op = SET_SRC (set);
-  if (GET_CODE (op) != ASHIFT
-      && GET_CODE (op) != LSHIFTRT
-      && GET_CODE (op) != ZERO_EXTEND)
+  mode = GET_MODE (op);
+
+  if (mode != twice_word_mode)
+    return 0;
+
+  switch (GET_CODE (op)) {
+  case ASHIFT:
+    if (splitting_ashift)
+      break;
+    return 0;
+    
+  case LSHIFTRT:
+    if (splitting_lshiftrt)
+      break;
+    return 0;
+    
+  case ZERO_EXTEND:
+    if (splitting_zext)
+      break;
+    return 0;
+
+  default:
     return 0;
+  }
 
   op_operand = XEXP (op, 0);
   if (!REG_P (SET_DEST (set)) || !REG_P (op_operand)
@@ -961,23 +1178,26 @@ find_decomposable_shift_zext (rtx insn)
 
   if (GET_CODE (op) == ZERO_EXTEND)
     {
-      if (GET_MODE (op_operand) != word_mode
-	  || GET_MODE_BITSIZE (GET_MODE (op)) != 2 * BITS_PER_WORD)
+      if (GET_MODE (op_operand) != word_mode)
 	return 0;
     }
   else /* left or right shift */
     {
+      int shift_val;
+
       if (!CONST_INT_P (XEXP (op, 1))
-	  || INTVAL (XEXP (op, 1)) < BITS_PER_WORD
-	  || GET_MODE_BITSIZE (GET_MODE (op_operand)) != 2 * BITS_PER_WORD)
+	  || INTVAL (XEXP (op, 1)) < BITS_PER_WORD)
+	return 0;
+
+      shift_val = INTVAL (XEXP (op, 1));
+      if (!profitable_shift_p (GET_CODE (op), shift_val))
 	return 0;
+
+      bitmap_set_bit (decomposable_context, REGNO (op_operand));
     }
 
   bitmap_set_bit (decomposable_context, REGNO (SET_DEST (set)));
 
-  if (GET_CODE (op) != ZERO_EXTEND)
-    bitmap_set_bit (decomposable_context, REGNO (op_operand));
-
   return 1;
 }
 
@@ -1008,6 +1228,8 @@ resolve_shift_zext (rtx insn)
 
   op_operand = XEXP (op, 0);
 
+  /* We can tear this operation apart only if the regs were already
+     torn apart.  */ 
   if (!resolve_reg_p (SET_DEST (set)) && !resolve_reg_p (op_operand))
     return NULL_RTX;
 
@@ -1083,8 +1305,34 @@ decompose_multiword_subregs (void)
   unsigned int max;
   basic_block bb;
 
-  if (df)
-    df_set_flags (DF_DEFER_INSN_RESCAN);
+  if (dump_file)
+    {
+      unsigned int i;
+
+      for (i = 0; i < MAX_MACHINE_MODE; i++) 
+	{
+	  if (GET_MODE_SIZE (i) > UNITS_PER_WORD)
+	    fprintf (dump_file, "%s mode %s for copy lowering.\n", 
+		     move_modes_to_split[i] ? "Splitting" : "Skipping", mode_name[i]);
+	}
+
+      fprintf (dump_file, "%s mode %s for zero_extend lowering.\n", 
+	       splitting_zext ? "Splitting" : "Skipping", mode_name[twice_word_mode]);
+
+      fprintf (dump_file, "%s mode %s for ashift lowering.\n", 
+	       splitting_ashift ? "Splitting" : "Skipping", mode_name[twice_word_mode]);
+
+      fprintf (dump_file, "%s mode %s for lshiftrt lowering.\n", 
+	       splitting_lshiftrt ? "Splitting" : "Skipping", mode_name[twice_word_mode]);
+
+      if (!something_to_do)
+	fprintf (dump_file, "\nNothing to lower for port.\n\n");
+    }
+
+
+  /* Check if this target even has any modes to consider lowering.   */
+  if (!something_to_do)
+    return;
 
   max = max_reg_num ();
 
@@ -1094,24 +1342,41 @@ decompose_multiword_subregs (void)
      all the insns.  */
   {
     unsigned int i;
+    bool useful_modes_seen = false;
 
     for (i = FIRST_PSEUDO_REGISTER; i < max; ++i)
       {
-	if (regno_reg_rtx[i] != NULL
-	    && GET_MODE_SIZE (GET_MODE (regno_reg_rtx[i])) > UNITS_PER_WORD)
-	  break;
+	if (regno_reg_rtx[i] != NULL)
+	  {
+	    enum machine_mode mode = GET_MODE (regno_reg_rtx[i]);
+	    if (regno_reg_rtx[i] != NULL 
+		&& (move_modes_to_split [mode] 
+		    || (splitting_some_shifts && mode == twice_word_mode)))
+	      {
+		useful_modes_seen = true;
+		break;
+	      }
+	  }
+      }
+
+    if (!useful_modes_seen)
+      {
+	if (dump_file)
+	  fprintf (dump_file, "Nothing to lower in this function.\n");
+	return;
       }
-    if (i == max)
-      return;
   }
 
-  if (df)
-    run_word_dce ();
+  if (df) 
+    {
+      df_set_flags (DF_DEFER_INSN_RESCAN);
+      run_word_dce ();
+    }
 
-  /* FIXME: When the dataflow branch is merged, we can change this
-     code to look for each multi-word pseudo-register and to find each
-     insn which sets or uses that register.  That should be faster
-     than scanning all the insns.  */
+  /* FIXME: It may be possible to change this code to look for each
+     multi-word pseudo-register and to find each insn which sets or
+     uses that register.  That should be faster than scanning all the
+     insns.  */
 
   decomposable_context = BITMAP_ALLOC (NULL);
   non_decomposable_context = BITMAP_ALLOC (NULL);
Index: rtl.h
===================================================================
--- rtl.h	(revision 185969)
+++ rtl.h	(working copy)
@@ -2523,6 +2523,9 @@ extern void init_expmed (void);
 extern void expand_inc (rtx, rtx);
 extern void expand_dec (rtx, rtx);
 
+/* In lower-subreg.c */
+extern void init_lower_subreg (void);
+
 /* In gcse.c */
 extern bool can_copy_p (enum machine_mode);
 extern bool can_assign_to_reg_without_clobbers_p (rtx);

Reply via email to