As I mentioned in: http://gcc.gnu.org/ml/gcc/2008-08/msg00476.html
I'd been working on a MIPS IRA port, but got side-tracked by a wrong-code regression. The regression was caused by incorrect EH liveness information. I tried to "fix" it by replacing the current note_stores-based forward scan with a DF-based backwards scan, which gave us the corrected information for free. However, even after Vlad fixed a pessimisation wrt calls and DF_REF_MAY_CLOBBER, he saw a significant decrease in SPECFP performance.

It seemed to me (and I think to Vlad) that the differences were caused by the "inverted" ALLOCNO_NUM ordering. Before the patch, allocno indexes typically increased as you went from the beginning of a region to the end, whereas it was the other way around after the patch. In other words, a lot of the difference seemed to be down to the last line of bucket_allocno_compare_func:

static int
bucket_allocno_compare_func (const void *v1p, const void *v2p)
{
  ira_allocno_t a1 = *(const ira_allocno_t *) v1p;
  ira_allocno_t a2 = *(const ira_allocno_t *) v2p;
  int diff, a1_freq, a2_freq, a1_num, a2_num;

  if ((diff = (int) ALLOCNO_COVER_CLASS (a2) - ALLOCNO_COVER_CLASS (a1)) != 0)
    return diff;
  get_coalesced_allocnos_attributes (a1, &a1_freq, &a1_num);
  get_coalesced_allocnos_attributes (a2, &a2_freq, &a2_num);
  if ((diff = a2_num - a1_num) != 0)
    return diff;
  else if ((diff = a1_freq - a2_freq) != 0)
    return diff;
  return ALLOCNO_NUM (a2) - ALLOCNO_NUM (a1);
}

This line is really just a tie-breaker, but changing:

  return ALLOCNO_NUM (a2) - ALLOCNO_NUM (a1);

to:

  return ALLOCNO_NUM (a1) - ALLOCNO_NUM (a2);

has a significant effect on the results. (There's a standalone illustration of the inversion below, after the testcase details.)

The question then (and now) is: is that Just One Of Those Things? I realise we're dealing with an NP-hard problem, and that we can't expect perfect results. So is the influence of the allocno index ordering acceptable?

I wasn't sure either way, and found a simple case where, with the reversed order, the copy costs were actually causing us to pick the one register that would induce a copy reload. Unfortunately, that particular testcase no longer exhibits the problem, so I redid the experiment on trunk and picked an arbitrary example of where the reversed order was causing a difference. (In this case, the reversed order is better, because it avoids a copy.) It turned out not to be quite the same problem that I'd seen before, but the underlying questions about the copy costs are the same. It also shows a potential problem with operand ties involving hard registers.

I've attached a patch that replaces ALLOCNO_NUM comparisons with a new macro, ALLOCNO_COMPARE. Apply the patch as-is to swap the order. Replace:

  #define ALLOCNO_COMPARE(A1, A2) (ALLOCNO_NUM (A2) - ALLOCNO_NUM (A1))

with:

  #define ALLOCNO_COMPARE(A1, A2) (ALLOCNO_NUM (A1) - ALLOCNO_NUM (A2))

to keep the current order. I've also attached the DF patch, although you don't need it for what follows.

The testcase I picked was gcc/testsuite/gcc.dg/tree-ssa/reassoc-13.c, compiled at -Os on x86_64-linux-gnu.
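First, the promised illustration. This is a throwaway sketch (plain C, not IRA code) of how flipping the tie-breaker reverses the processing order of otherwise-equal elements; the integers stand in for ALLOCNO_NUMs, and the cover class, coalesced size and frequency are all assumed equal, so that only the tie-breaker decides:

/* Throwaway illustration: when every other comparison key is equal,
   the tie-breaker alone decides the qsort order.  The numbers stand
   in for ALLOCNO_NUMs; nothing else from IRA is modelled.  */
#include <stdio.h>
#include <stdlib.h>

static int
compare_original (const void *p1, const void *p2)
{
  /* return ALLOCNO_NUM (a2) - ALLOCNO_NUM (a1); */
  return *(const int *) p2 - *(const int *) p1;
}

static int
compare_swapped (const void *p1, const void *p2)
{
  /* return ALLOCNO_NUM (a1) - ALLOCNO_NUM (a2); */
  return *(const int *) p1 - *(const int *) p2;
}

int
main (void)
{
  int i, nums[] = { 58, 60, 61, 63, 64 };

  qsort (nums, 5, sizeof (int), compare_original);
  for (i = 0; i < 5; i++)
    printf ("%d ", nums[i]);	/* 64 63 61 60 58 */
  printf ("\n");

  qsort (nums, 5, sizeof (int), compare_swapped);
  for (i = 0; i < 5; i++)
    printf ("%d ", nums[i]);	/* 58 60 61 63 64 */
  printf ("\n");
  return 0;
}

So every bucket of allocnos that would otherwise compare equal is processed in exactly the opposite order, which is how an innocuous-looking tie-breaker can end up steering the whole allocation.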
The rtl for the function is:

(insn:HI 6 3 7 2 ../../../../new/gcc/testsuite/gcc.dg/tree-ssa/reassoc-13.c:7
	(set (reg:DF 61)
	     (mem/u/c/i:DF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [2 S8 A64]))
	103 {*movdf_integer_rex64}
	(expr_list:REG_EQUAL (const_double:DF 5.0e+0 [0x0.ap+3])
	   (nil)))

(insn:HI 7 6 8 2 ../../../../new/gcc/testsuite/gcc.dg/tree-ssa/reassoc-13.c:7
	(set (reg/v:DF 58 [ tmp2 ])
	     (plus:DF (reg/v:DF 60 [ a ])
		      (reg:DF 61)))
	734 {*fop_df_comm_sse}
	(nil))

(insn:HI 8 7 9 2 ../../../../new/gcc/testsuite/gcc.dg/tree-ssa/reassoc-13.c:7
	(set (reg:DF 63)
	     (minus:DF (reg/v:DF 58 [ tmp2 ])
		       (reg/v:DF 60 [ a ])))
	745 {*fop_df_1_sse}
	(expr_list:REG_DEAD (reg/v:DF 58 [ tmp2 ])
	   (nil)))

(insn:HI 9 8 11 2 ../../../../new/gcc/testsuite/gcc.dg/tree-ssa/reassoc-13.c:7
	(set (reg:DF 64)
	     (plus:DF (reg/v:DF 60 [ a ])
		      (reg:DF 63)))
	734 {*fop_df_comm_sse}
	(expr_list:REG_DEAD (reg:DF 63)
	   (expr_list:REG_DEAD (reg/v:DF 60 [ a ])
	      (nil))))

(insn:HI 16 11 22 2 ../../../../new/gcc/testsuite/gcc.dg/tree-ssa/reassoc-13.c:10
	(set (reg/i:DF 21 xmm0)
	     (minus:DF (reg:DF 64)
		       (reg:DF 61)))
	745 {*fop_df_1_sse}
	(expr_list:REG_DEAD (reg:DF 64)
	   (expr_list:REG_DEAD (reg:DF 61)
	      (nil))))

So let's use "Rx" to refer to pseudo register x, "Ax" to refer to its allocno, and "Hx" to refer to its allocated hard register. Allocnos are popped and allocated in the following orders:

  Original   Reversed
  --------   --------
  A60        A60
  A61        A61
  A58        A64
  A63        A63
  A64        A58

Both orders agree on the following assignments:

  H58 = xmm2
  H60 = xmm0
  H61 = xmm1
  H63 = xmm2

The problem is with the allocation of A64. R64 is tied twice:

  - to R60 or R63 in insn 9
  - to xmm0 in insn 16

Both orderings set H60 to xmm0 before allocating A64, so setting H64 to xmm0 would satisfy both ties without reloads. The post-patch compiler does this, but the pre-patch one doesn't.

The original order allocates A64 last of all, and since xmm0 is still free at that point, the original order ought to have the best chance of picking the "right" answer. But the costs are weighted in favour of xmm2 instead. Costs when allocating A64 in the original order:

  xmm0 : -187
  xmm1 : 3000
  xmm2 : -750
  xmm3 : 3000
  ...  : 3000

So the original order picks xmm2 and generates a reload.

The main problem in this particular case is that insn 16 doesn't seem to affect the relative cost of assigning xmm0 to A64; the A64 costs are only influenced by insn 9. Even so, I was surprised that xmm2 was so heavily preferred over xmm0. The copies for this function are:

  C1: insn 8: 58 -> 63, freq 1000
  C2: insn 9: 60 -> 64, freq 1000
  C3: insn 9: 63 -> 64, freq 1000

(C3 exists -- with the same frequency as C2 -- because PLUS is commutative.) And the effect of these copies on the costs is as follows:

  - After assigning xmm0 to A60:
      C2: cost of xmm0 for A64 -= 3000
      --> C3: cost of xmm0 for A63 -= 750
      --> C3: cost of xmm0 for A64 -= 187 [*]
      --> C1: cost of xmm0 for A58 -= 187
      --> C1: cost of xmm0 for A63 -= 46 [*]

  - After assigning xmm2 to A58:
      C1: cost of xmm2 for A63 -= 3000
      --> C3: cost of xmm2 for A64 -= 750
      --> C3: cost of xmm2 for A63 -= 187 [*]

  - After assigning xmm2 to A63:
      C3: cost of xmm2 for A64 -= 3000

How standard are these copy-cost heuristics? Are they ours, or are they commonplace?

The adjustments marked [*] seem to be double-counting. E.g. in the first case, C2 first causes us to adjust the xmm0 cost of A64 by -3000. We then recursively process all other copies involving A64 and adjust the costs for the "other side" of those copies by a smaller amount. So in this case, C3 causes us to adjust A63's xmm0 cost by -750.
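In outline, the recursion behaves something like the following simplified sketch of what (I think) update_copy_costs in ira-color.c is doing. This is my reconstruction rather than the real code: HARD_REG_COST is a made-up stand-in for the updated-cost vectors, the copy-list walk follows my reading of ira-int.h, and details such as skipping already-assigned allocnos are omitted.

static void
update_copy_costs_sketch (ira_allocno_t allocno, int hard_regno, int cost)
{
  ira_copy_t cp, next_cp;
  ira_allocno_t other;

  /* Walk every copy involving ALLOCNO...  */
  for (cp = ALLOCNO_COPIES (allocno); cp != NULL; cp = next_cp)
    {
      if (cp->first == allocno)
	{
	  next_cp = cp->next_first_allocno_copy;
	  other = cp->second;
	}
      else
	{
	  next_cp = cp->next_second_allocno_copy;
	  other = cp->first;
	}

      /* ...and make HARD_REGNO look cheaper for the allocno on the
	 other side of the copy.  */
      HARD_REG_COST (other, hard_regno) -= cost;

      /* Propagate a damped adjustment onwards; the divisor of 4 per
	 hop is what turns 3000 into 750, 187 and 46 in the trace
	 above.  Nothing here remembers which copy we arrived on, so
	 the recursion is free to walk straight back across it --
	 hence the [*] adjustments.  */
      if (cost / 4 != 0)
	update_copy_costs_sketch (other, hard_regno, cost / 4);
    }
}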
But we then adjust the copies involving A63, and end up handling C3 twice. This might not matter much if it's done consistently, but I wasn't sure whether it was deliberate. (A comment might be nice if so.)

More fundamentally, C3 ends up having more influence than C2, even though the copies ostensibly have the same weight. In other words, we have the following copy graph:

  A58 <-> A63 <-> A64 <-> A60

And the allocation of A63 has more influence over A64 than A60 does, because of the transitive influence of A58. But shouldn't this transitive influence only kick in when A64 is allocated _between_ A58 and A63? Why should the allocation of A58 matter to A64 if A63 has already been allocated?

This is just a small example. Couldn't we have a graph like:

  ... <-> A4 <-> A3 <-> A2 <-> A1

where, if things are allocated left-to-right, and a long chain of registers including A4 is able to use the same hard register, the combined effect of that chain actually has more influence over A2 than the choice of either A3 or A1? I.e. couldn't we have cases where the costs make A2 prefer H4 over H3, in cases where H4 and H3 are different? (If I've followed the arithmetic, equal frequencies alone wouldn't get us there -- the chain's combined contribution of 750 + 187 + 46 + ... stays below a single direct 3000 -- but higher-frequency copies further down the chain presumably could.)

It took me quite a while to go through the decisions being made in this case, and it was only towards the end that I realised it wasn't quite the same situation I'd seen before. (The situation I'd seen before didn't involve a hard register tie like the one in insn 16.) I can try trawling through the other differences to find another example of the original problem, but it was all down to the effects of transitive copy costs, so I'd be interested to hear comments about whether the above is reasonable behaviour or not.

If using DF seems like the Right Thing, we could simply apply both patches, which would give a similar allocno order to the one we have now. But it seemed better to look a bit deeper first...

Richard
Index: gcc/ira-build.c
===================================================================
--- gcc/ira-build.c	2008-08-31 21:00:08.000000000 +0100
+++ gcc/ira-build.c	2008-08-31 21:09:01.000000000 +0100
@@ -1059,7 +1059,7 @@ ira_swap_allocno_copy_ends_if_necessary
   ira_allocno_t temp;
   ira_copy_t temp_cp;
 
-  if (ALLOCNO_NUM (cp->first) <= ALLOCNO_NUM (cp->second))
+  if (ALLOCNO_COMPARE (cp->first, cp->second) <= 0)
     return;
 
   temp = cp->first;
@@ -1867,7 +1867,7 @@ allocno_range_compare_func (const void *
     return diff;
   if ((diff = ALLOCNO_MAX (a1) - ALLOCNO_MAX (a2)) != 0)
     return diff;
-  return ALLOCNO_NUM (a1) - ALLOCNO_NUM (a2);
+  return ALLOCNO_COMPARE (a1, a2);
 }
 
 /* Sort ira_conflict_id_allocno_map and set up conflict id of
Index: gcc/ira-color.c
===================================================================
--- gcc/ira-color.c	2008-08-31 21:00:01.000000000 +0100
+++ gcc/ira-color.c	2008-08-31 21:09:01.000000000 +0100
@@ -209,7 +209,7 @@ allocno_cost_compare_func (const void *v
 
   /* If regs are equally good, sort by allocno numbers, so that the
      results of qsort leave nothing to chance.  */
-  return ALLOCNO_NUM (p1) - ALLOCNO_NUM (p2);
+  return ALLOCNO_COMPARE (p1, p2);
 }
 
 /* Print all allocnos coalesced with ALLOCNO.  */
@@ -549,7 +549,7 @@ bucket_allocno_compare_func (const void
     return diff;
   else if ((diff = a1_freq - a2_freq) != 0)
     return diff;
-  return ALLOCNO_NUM (a2) - ALLOCNO_NUM (a1);
+  return ALLOCNO_COMPARE (a2, a1);
 }
 
 /* Sort bucket *BUCKET_PTR and return the result through
@@ -884,7 +884,7 @@ allocno_spill_priority_compare (splay_tr
     return diff;
   if ((diff = IRA_ALLOCNO_TEMP (a1) - IRA_ALLOCNO_TEMP (a2)) != 0)
     return diff;
-  return ALLOCNO_NUM (a1) - ALLOCNO_NUM (a2);
+  return ALLOCNO_COMPARE (a1, a2);
 }
 
 /* Allocate data of SIZE for the splay trees.  We allocate only spay
@@ -1076,8 +1076,8 @@ push_allocnos_to_stack (void)
 		  || (allocno_pri == i_allocno_pri
 		      && (allocno_cost > i_allocno_cost
 			  || (allocno_cost == i_allocno_cost
-			      && (ALLOCNO_NUM (allocno)
-				  > ALLOCNO_NUM (i_allocno))))))
+			      && (ALLOCNO_COMPARE (allocno, i_allocno)
+				  > 0)))))
 		{
 		  allocno = i_allocno;
 		  allocno_cost = i_allocno_cost;
@@ -1966,7 +1966,7 @@ allocno_priority_compare_func (const voi
 
   /* If regs are equally good, sort by allocnos, so that the results of
      qsort leave nothing to chance.  */
-  return ALLOCNO_NUM (a1) - ALLOCNO_NUM (a2);
+  return ALLOCNO_COMPARE (a1, a2);
 }
 
 /* Try to assign hard registers to the unassigned allocnos and
Index: gcc/ira-int.h
===================================================================
--- gcc/ira-int.h	2008-08-31 21:00:01.000000000 +0100
+++ gcc/ira-int.h	2008-08-31 21:09:01.000000000 +0100
@@ -464,6 +464,8 @@
 #define ALLOCNO_MIN(A) ((A)->min)
 #define ALLOCNO_MAX(A) ((A)->max)
 #define ALLOCNO_CONFLICT_ID(A) ((A)->conflict_id)
+#define ALLOCNO_COMPARE(A1, A2) (ALLOCNO_NUM (A2) - ALLOCNO_NUM (A1))
+
 /* Map regno -> allocnos with given regno (see comments for
    allocno member `next_regno_allocno').  */
 extern ira_allocno_t *ira_regno_allocno_map;
Index: gcc/ira-build.c
===================================================================
--- gcc/ira-build.c	2008-08-31 21:00:01.000000000 +0100
+++ gcc/ira-build.c	2008-08-31 21:00:08.000000000 +0100
@@ -310,7 +310,7 @@ form_loop_tree (void)
       ira_loop_nodes[i].children = NULL;
       ira_loop_nodes[i].subloops = NULL;
     }
-  FOR_EACH_BB_REVERSE (bb)
+  FOR_EACH_BB (bb)
     {
       bb_node = &ira_bb_nodes[bb->index];
       bb_node->bb = bb;
@@ -1345,7 +1345,7 @@ create_bb_allocnos (ira_loop_tree_node_t
 
   curr_bb = bb = bb_node->bb;
   ira_assert (bb != NULL);
-  FOR_BB_INSNS (bb, insn)
+  FOR_BB_INSNS_REVERSE (bb, insn)
     if (INSN_P (insn))
      create_insn_allocnos (PATTERN (insn), false);
   /* It might be a allocno living through from one subloop to
Index: gcc/ira-lives.c
===================================================================
--- gcc/ira-lives.c	2008-08-31 17:28:05.000000000 +0100
+++ gcc/ira-lives.c	2008-08-31 21:07:48.000000000 +0100
@@ -149,15 +149,6 @@ make_regno_dead (int regno)
   update_allocno_pressure_excess_length (a);
 }
 
-/* Process the birth and, right after then, death of register
-   REGNO.  */
-static void
-make_regno_born_and_dead (int regno)
-{
-  make_regno_born (regno);
-  make_regno_dead (regno);
-}
-
 /* The current register pressures for each cover class for the
    current basic block.  */
 static int curr_reg_pressure[N_REG_CLASSES];
@@ -218,40 +209,21 @@ clear_allocno_live (ira_allocno_t a)
   sparseset_clear_bit (allocnos_live, ALLOCNO_NUM (a));
 }
 
-/* Record all regs that are set in any one insn.  Communication from
-   mark_reg_{store,clobber}.  */
-static VEC(rtx, heap) *regs_set;
-
-/* Handle the case where REG is set by the insn being scanned, during
-   the scan to build live ranges and calculate reg pressure info.
+/* Mark the register referenced by use or def REF as live
    Store a 1 in hard_regs_live or allocnos_live for this register or
    the corresponding allocno, record how many consecutive hardware
-   registers it actually needs.
+   registers it actually needs.  */
-
-   Note that even if REG does not remain alive after this insn, we
-   must mark it here as live, to ensure a conflict between REG and any
-   other reg allocnos set in this insn that really do live.  This is
-   because those other allocnos could be considered after this.
-
-   REG might actually be something other than a register; if so, we do
-   nothing.
-
-   SETTER is 0 if this register was modified by an auto-increment
-   (i.e., a REG_INC note was found for it).  */
 static void
-mark_reg_store (rtx reg, const_rtx setter ATTRIBUTE_UNUSED,
-		void *data ATTRIBUTE_UNUSED)
+mark_ref_live (struct df_ref *ref)
 {
+  rtx reg;
   int regno;
 
+  reg = DF_REF_REG (ref);
   if (GET_CODE (reg) == SUBREG)
     reg = SUBREG_REG (reg);
-
-  if (! REG_P (reg))
-    return;
-
-  VEC_safe_push (rtx, heap, regs_set, reg);
-
+  gcc_assert (REG_P (reg));
   regno = REGNO (reg);
 
   if (regno >= FIRST_PSEUDO_REGISTER)
@@ -297,55 +269,36 @@ mark_reg_store (rtx reg, const_rtx sette
     }
 }
 
-/* Like mark_reg_store except notice just CLOBBERs; ignore SETs.  */
-static void
-mark_reg_clobber (rtx reg, const_rtx setter, void *data)
+/* Return true if the definition described by DEF conflicts with the
+   instruction's inputs.  */
+static bool
+def_conflicts_with_inputs_p (struct df_ref *def)
 {
-  if (GET_CODE (setter) == CLOBBER)
-    mark_reg_store (reg, setter, data);
+  /* Conservatively assume that the condition is true for all clobbers.  */
+  return DF_REF_FLAGS_IS_SET (def, DF_REF_MUST_CLOBBER);
 }
 
-/* Record that hard register REG (if it is a hard register) has
-   conflicts with all the allocno currently live or the corresponding
-   allocno lives at just the current program point.  Do not mark REG
-   (or the allocno) itself as live.  */
+/* Mark the register referenced by definition DEF as dead, if the
+   definition is a total one.  Store a 0 in hard_regs_live or
+   allocnos_live for the register.  */
 static void
-mark_reg_conflicts (rtx reg)
+mark_ref_dead (struct df_ref *def)
 {
+  unsigned int i;
+  rtx reg;
   int regno;
 
-  if (GET_CODE (reg) == SUBREG)
-    reg = SUBREG_REG (reg);
-
-  if (! REG_P (reg))
+  if (DF_REF_FLAGS_IS_SET (def, DF_REF_PARTIAL)
+      || DF_REF_FLAGS_IS_SET (def, DF_REF_CONDITIONAL))
     return;
 
+  reg = DF_REF_REG (def);
+  if (GET_CODE (reg) == SUBREG)
+    reg = SUBREG_REG (reg);
+  gcc_assert (REG_P (reg));
   regno = REGNO (reg);
 
   if (regno >= FIRST_PSEUDO_REGISTER)
-    make_regno_born_and_dead (regno);
-  else if (! TEST_HARD_REG_BIT (ira_no_alloc_regs, regno))
-    {
-      int last = regno + hard_regno_nregs[regno][GET_MODE (reg)];
-
-      while (regno < last)
-	{
-	  make_regno_born_and_dead (regno);
-	  regno++;
-	}
-    }
-}
-
-/* Mark REG (or the corresponding allocno) as being dead (following
-   the insn being scanned now).  Store a 0 in hard_regs_live or
-   allocnos_live for the register.  */
-static void
-mark_reg_death (rtx reg)
-{
-  unsigned int i;
-  int regno = REGNO (reg);
-
-  if (regno >= FIRST_PSEUDO_REGISTER)
     {
       ira_allocno_t a = ira_curr_regno_allocno_map[regno];
@@ -618,14 +571,14 @@ process_single_reg_class_operands (bool
 static void
 process_bb_node_lives (ira_loop_tree_node_t loop_tree_node)
 {
-  int i;
+  int i, freq;
   unsigned int j;
   basic_block bb;
   rtx insn;
   edge e;
   edge_iterator ei;
   bitmap_iterator bi;
-  bitmap reg_live_in;
+  bitmap reg_live_out;
   unsigned int px;
 
   bb = loop_tree_node->bb;
@@ -637,9 +590,9 @@ process_bb_node_lives (ira_loop_tree_nod
	  high_pressure_start_point[ira_reg_class_cover[i]] = -1;
	}
       curr_bb_node = loop_tree_node;
-      reg_live_in = DF_LR_IN (bb);
+      reg_live_out = DF_LR_OUT (bb);
       sparseset_clear (allocnos_live);
-      REG_SET_TO_HARD_REG_SET (hard_regs_live, reg_live_in);
+      REG_SET_TO_HARD_REG_SET (hard_regs_live, reg_live_out);
       AND_COMPL_HARD_REG_SET (hard_regs_live, eliminable_regset);
       AND_COMPL_HARD_REG_SET (hard_regs_live, ira_no_alloc_regs);
       for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
@@ -659,7 +612,7 @@ process_bb_node_lives (ira_loop_tree_nod
	      ira_assert (curr_reg_pressure[cover_class]
			  <= ira_available_class_regs[cover_class]);
	    }
-      EXECUTE_IF_SET_IN_BITMAP (reg_live_in, FIRST_PSEUDO_REGISTER, j, bi)
+      EXECUTE_IF_SET_IN_BITMAP (reg_live_out, FIRST_PSEUDO_REGISTER, j, bi)
	{
	  ira_allocno_t a = ira_curr_regno_allocno_map[j];
@@ -670,87 +623,91 @@ process_bb_node_lives (ira_loop_tree_nod
	  make_regno_born (j);
	}
 
-#ifdef EH_RETURN_DATA_REGNO
-      if (bb_has_eh_pred (bb))
-	{
-	  for (j = 0; ; ++j)
-	    {
-	      unsigned int regno = EH_RETURN_DATA_REGNO (j);
-
-	      if (regno == INVALID_REGNUM)
-		break;
-	      make_regno_born_and_dead (regno);
-	    }
-	}
-#endif
-
-      /* Allocnos can't go in stack regs at the start of a basic block
-	 that is reached by an abnormal edge.  Likewise for call
-	 clobbered regs, because caller-save, fixup_abnormal_edges and
-	 possibly the table driven EH machinery are not quite ready to
-	 handle such allocnos live across such edges.  */
-      FOR_EACH_EDGE (e, ei, bb->preds)
-	if (e->flags & EDGE_ABNORMAL)
-	  break;
-
-      if (e != NULL)
-	{
-#ifdef STACK_REGS
-	  EXECUTE_IF_SET_IN_SPARSESET (allocnos_live, px)
-	    {
-	      ALLOCNO_NO_STACK_REG_P (ira_allocnos[px]) = true;
-	      ALLOCNO_TOTAL_NO_STACK_REG_P (ira_allocnos[px]) = true;
-	    }
-	  for (px = FIRST_STACK_REG; px <= LAST_STACK_REG; px++)
-	    make_regno_born_and_dead (px);
-#endif
-	  /* No need to record conflicts for call clobbered regs if we
-	     have nonlocal labels around, as we don't ever try to
-	     allocate such regs in this case.  */
-	  if (!cfun->has_nonlocal_label)
-	    for (px = 0; px < FIRST_PSEUDO_REGISTER; px++)
-	      if (call_used_regs[px])
-		make_regno_born_and_dead (px);
-	}
-
+      freq = REG_FREQ_FROM_BB (bb);
+      if (freq == 0)
+	freq = 1;
+
       /* Scan the code of this basic block, noting which allocnos and
-	 hard regs are born or die.  */
-      FOR_BB_INSNS (bb, insn)
+	 hard regs are born or die.
+
+	 Note that this loop treats uninitialized values as live until
+	 the beginning of the block.  For example, if an instruction
+	 uses (reg:DI foo), and only (subreg:SI (reg:DI foo) 0) is ever
+	 set, FOO will remain live until the beginning of the block.
+	 Likewise if FOO is not set at all.  This is unnecessarily
+	 pessimistic, but it probably doesn't matter much in practice.  */
+      FOR_BB_INSNS_REVERSE (bb, insn)
	{
-	  rtx link;
-	  int freq;
+	  struct df_ref **def_rec, **use_rec;
+	  bool call_p;
 
	  if (! INSN_P (insn))
	    continue;
 
-	  freq = REG_FREQ_FROM_BB (BLOCK_FOR_INSN (insn));
-	  if (freq == 0)
-	    freq = 1;
-
	  if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
	    fprintf (ira_dump_file, "   Insn %u(l%d): point = %d\n",
		     INSN_UID (insn), loop_tree_node->parent->loop->num,
		     curr_point);
 
-	  /* Check regs_set is an empty set.  */
-	  gcc_assert (VEC_empty (rtx, regs_set));
-
-	  /* Mark any allocnos clobbered by INSN as live, so they
-	     conflict with the inputs.  */
-	  note_stores (PATTERN (insn), mark_reg_clobber, NULL);
+	  /* Mark each defined value as live.  We need to do this for
+	     unused values because they still conflict with quantities
+	     that are live at the time of the definition.
+
+	     Ignore DF_REF_MAY_CLOBBERs on a call instruction.  Such
+	     references represent the effect of the called function
+	     on a call-clobbered register.  Marking the register as
+	     live would stop us from allocating it to a call-crossing
+	     allocno.  */
+	  call_p = CALL_P (insn);
+	  for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++)
+	    if (!call_p || !DF_REF_FLAGS_IS_SET (*def_rec, DF_REF_MAY_CLOBBER))
+	      mark_ref_live (*def_rec);
+
+	  /* If INSN has multiple outputs, then any value used in one
+	     of the outputs conflicts with the other outputs.  Model this
+	     by making the used value live during the output phase.
+
+	     It is unsafe to use !single_set here since it will ignore
+	     an unused output.  Just because an output is unused does
+	     not mean the compiler can assume the side effect will not
+	     occur.  Consider if ALLOCNO appears in the address of an
+	     output and we reload the output.  If we allocate ALLOCNO
+	     to the same hard register as an unused output we could
+	     set the hard register before the output reload insn.  */
+	  if (GET_CODE (PATTERN (insn)) == PARALLEL && multiple_sets (insn))
+	    for (use_rec = DF_INSN_USES (insn); *use_rec; use_rec++)
+	      {
+		int i;
+		rtx reg;
+
+		reg = DF_REF_REG (*use_rec);
+		for (i = XVECLEN (PATTERN (insn), 0) - 1; i >= 0; i--)
+		  {
+		    rtx set;
+
+		    set = XVECEXP (PATTERN (insn), 0, i);
+		    if (GET_CODE (set) == SET
+			&& reg_overlap_mentioned_p (reg, SET_DEST (set)))
+		      {
+			/* After the previous loop, this is a no-op if
			   REG is contained within SET_DEST (SET).  */
+			mark_ref_live (*use_rec);
+			break;
+		      }
+		  }
+	      }
 
	  extract_insn (insn);
-	  process_single_reg_class_operands (true, freq);
-
-	  /* Mark any allocnos dead after INSN as dead now.  */
-	  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
-	    if (REG_NOTE_KIND (link) == REG_DEAD)
-	      mark_reg_death (XEXP (link, 0));
+	  process_single_reg_class_operands (false, freq);
 
-	  curr_point++;
+	  /* See which defined values die here.  */
+	  for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++)
+	    if (!call_p || !DF_REF_FLAGS_IS_SET (*def_rec, DF_REF_MAY_CLOBBER))
+	      mark_ref_dead (*def_rec);
 
-	  if (CALL_P (insn))
+	  if (call_p)
	    {
+	      /* The current set of live allocnos are live across the call.  */
	      EXECUTE_IF_SET_IN_SPARSESET (allocnos_live, i)
		{
		  ira_allocno_t a = ira_allocnos[i];
@@ -767,67 +724,62 @@ process_bb_node_lives (ira_loop_tree_nod
		}
	    }
 
-	  /* Mark any allocnos set in INSN as live.  Clobbers are
-	     processed again, so they will conflict with the reg
-	     allocnos that are set.  */
-	  note_stores (PATTERN (insn), mark_reg_store, NULL);
-
-#ifdef AUTO_INC_DEC
-	  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
-	    if (REG_NOTE_KIND (link) == REG_INC)
-	      mark_reg_store (XEXP (link, 0), NULL_RTX, NULL);
-#endif
-
-	  /* If INSN has multiple outputs, then any allocno that dies
-	     here and is used inside of an output must conflict with
-	     the other outputs.
-
-	     It is unsafe to use !single_set here since it will ignore
-	     an unused output.  Just because an output is unused does
-	     not mean the compiler can assume the side effect will not
-	     occur.  Consider if ALLOCNO appears in the address of an
-	     output and we reload the output.  If we allocate ALLOCNO
-	     to the same hard register as an unused output we could
-	     set the hard register before the output reload insn.  */
-	  if (GET_CODE (PATTERN (insn)) == PARALLEL && multiple_sets (insn))
-	    for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
-	      if (REG_NOTE_KIND (link) == REG_DEAD)
-		{
-		  int i;
-		  int used_in_output = 0;
-		  rtx reg = XEXP (link, 0);
-
-		  for (i = XVECLEN (PATTERN (insn), 0) - 1; i >= 0; i--)
-		    {
-		      rtx set = XVECEXP (PATTERN (insn), 0, i);
-
-		      if (GET_CODE (set) == SET
-			  && ! REG_P (SET_DEST (set))
-			  && ! rtx_equal_p (reg, SET_DEST (set))
-			  && reg_overlap_mentioned_p (reg, SET_DEST (set)))
-			used_in_output = 1;
-		    }
-		  if (used_in_output)
-		    mark_reg_conflicts (reg);
-		}
-
-	  process_single_reg_class_operands (false, freq);
+	  curr_point++;
+
+	  /* Mark each used value as live.  */
+	  for (use_rec = DF_INSN_USES (insn); *use_rec; use_rec++)
+	    mark_ref_live (*use_rec);
+
+	  /* If any defined values conflict with the inputs, mark those
+	     defined values as live.  */
+	  for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++)
+	    if (def_conflicts_with_inputs_p (*def_rec))
+	      mark_ref_live (*def_rec);
+
+	  process_single_reg_class_operands (true, freq);
 
-	  /* Mark any allocnos set in INSN and then never used.  */
-	  while (! VEC_empty (rtx, regs_set))
-	    {
-	      rtx reg = VEC_pop (rtx, regs_set);
-	      rtx note = find_regno_note (insn, REG_UNUSED, REGNO (reg));
+	  /* See which of the defined values we marked as live are dead
+	     before the instruction.  */
+	  for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++)
+	    if (def_conflicts_with_inputs_p (*def_rec))
+	      mark_ref_dead (*def_rec);
 
-	      if (note)
-		mark_reg_death (XEXP (note, 0));
-	    }
	  curr_point++;
	}
+
+      /* Allocnos can't go in stack regs at the start of a basic block
+	 that is reached by an abnormal edge.  Likewise for call
+	 clobbered regs, because caller-save, fixup_abnormal_edges and
+	 possibly the table driven EH machinery are not quite ready to
+	 handle such allocnos live across such edges.  */
+      FOR_EACH_EDGE (e, ei, bb->preds)
+	if (e->flags & EDGE_ABNORMAL)
+	  break;
+
+      if (e != NULL)
+	{
+#ifdef STACK_REGS
+	  EXECUTE_IF_SET_IN_SPARSESET (allocnos_live, px)
+	    {
+	      ALLOCNO_NO_STACK_REG_P (ira_allocnos[px]) = true;
+	      ALLOCNO_TOTAL_NO_STACK_REG_P (ira_allocnos[px]) = true;
+	    }
+	  for (px = FIRST_STACK_REG; px <= LAST_STACK_REG; px++)
+	    make_regno_born (px);
+#endif
+	  /* No need to record conflicts for call clobbered regs if we
+	     have nonlocal labels around, as we don't ever try to
+	     allocate such regs in this case.  */
+	  if (!cfun->has_nonlocal_label)
+	    for (px = 0; px < FIRST_PSEUDO_REGISTER; px++)
+	      if (call_used_regs[px])
+		make_regno_born (px);
+	}
+
       EXECUTE_IF_SET_IN_SPARSESET (allocnos_live, i)
-	{
-	  make_regno_dead (ALLOCNO_REGNO (ira_allocnos[i]));
-	}
+	{
+	  make_regno_dead (ALLOCNO_REGNO (ira_allocnos[i]));
+	}
 
       curr_point++;
@@ -944,9 +896,6 @@ ira_debug_live_ranges (void)
 ira_create_allocno_live_ranges (void)
 {
   allocnos_live = sparseset_alloc (ira_allocnos_num);
-  /* Make a vector that mark_reg_{store,clobber} will store in.  */
-  if (!regs_set)
-    regs_set = VEC_alloc (rtx, heap, 10);
   curr_point = 0;
   ira_traverse_loop_tree (true, ira_loop_tree_root, NULL,
			   process_bb_node_lives);