On 2019-06-21 9:42 a.m., Richard Sandiford wrote:
SVE has a prefix instruction (MOVPRFX) that acts as a move but is
designed to be easily fusible with the following instruction.  The SVE
port therefore has lots of patterns with constraints of the form:

   A: operand 0: =w,?w
      ...
      operand n:  0, w

where the first alternative is a single instruction and the second
alternative uses MOVPRFX.

Ideally we want operand n to be allocated to the same register as
operand 0 in this case.

add_insn_allocno_copies is the main IRA routine that deals with tied
operands.  It is (rightly) very conservative, and only handles cases in
which we're confident about saving a full move.  So for a pattern like:

   B: operand 0: =w,w
      ...
      operand n:  0,w

we don't (and shouldn't) assume that tying operands 0 and n would
save the cost of a move.

But in A, the second alternative has a ? marker, which makes it more
expensive than the first alternative by a full reload.  So I think for
copy elision we should ignore the untied operand n in the second
alternative of A.

One approach would be to add '*' markers to each pattern and make
ira_get_dup_out_num honour them.  But I think the rule applies on
first principles, so marking with '*' shouldn't be necessary.
I think direct approach is better.  The modifiers were designed for the old algorithms.  Now I think their treatment prevent and will prevent significant RA algorithm modifications.  I found that when tried to significantly change algorithm for calculation of register class costs and register class preferences.
This patch instead makes ira_get_dup_out_num ignore expensive
alternatives if there are other alternatives that match exactly.
The cheapest way of doing that seemed to be to take expensive
alternatives out of consideration in ira_setup_alts, which provides
a bitmask of alternatives and has all the information available.
add_insn_allocno_copies is the only current user of ira_setup_alts,
so no other code should be affected.


2019-06-21  Richard Sandiford  <richard.sandif...@arm.com>

gcc/
        * ira.c (ira_setup_alts): If any valid alternatives have zero cost,
        exclude any others that are disparaged or that are bound to need
        a reload or spill.
        (ira_get_dup_out_num): Expand comment.
The patch is ok for me.
Index: gcc/ira.c
===================================================================
--- gcc/ira.c   2019-06-21 14:34:09.455685354 +0100
+++ gcc/ira.c   2019-06-21 14:34:13.239653892 +0100
@@ -1787,7 +1787,9 @@ setup_prohibited_mode_move_regs (void)
  /* Extract INSN and return the set of alternatives that we should consider.
     This excludes any alternatives whose constraints are obviously impossible
     to meet (e.g. because the constraint requires a constant and the operand
-   is nonconstant).  */
+   is nonconstant).  It also excludes alternatives that are bound to need
+   a spill or reload, as long as we have other alternatives that match
+   exactly.  */
  alternative_mask
  ira_setup_alts (rtx_insn *insn)
  {
@@ -1800,6 +1802,7 @@ ira_setup_alts (rtx_insn *insn)
    preprocess_constraints (insn);
    alternative_mask preferred = get_preferred_alternatives (insn);
    alternative_mask alts = 0;
+  alternative_mask exact_alts = 0;
    /* Check that the hard reg set is enough for holding all
       alternatives.  It is hard to imagine the situation when the
       assertion is wrong.  */
@@ -1816,20 +1819,24 @@ ira_setup_alts (rtx_insn *insn)
      {
        for (nalt = 0; nalt < recog_data.n_alternatives; nalt++)
        {
-         if (!TEST_BIT (preferred, nalt) || TEST_BIT (alts, nalt))
+         if (!TEST_BIT (preferred, nalt) || TEST_BIT (exact_alts, nalt))
            continue;
const operand_alternative *op_alt
            = &recog_op_alt[nalt * recog_data.n_operands];
+         int this_reject = 0;
          for (nop = 0; nop < recog_data.n_operands; nop++)
            {
              int c, len;
+ this_reject += op_alt[nop].reject;
+
              rtx op = recog_data.operand[nop];
              p = op_alt[nop].constraint;
              if (*p == 0 || *p == ',')
                continue;
-       
+
+             bool win_p = false;
              do
                switch (c = *p, len = CONSTRAINT_LEN (c, p), c)
                  {
@@ -1847,7 +1854,14 @@ ira_setup_alts (rtx_insn *insn)
case '0': case '1': case '2': case '3': case '4':
                  case '5':  case '6':  case '7':  case '8':  case '9':
-                   goto op_success;
+                   {
+                     rtx other = recog_data.operand[c - '0'];
+                     if (MEM_P (other)
+                         ? rtx_equal_p (other, op)
+                         : REG_P (op) || SUBREG_P (op))
+                       goto op_success;
+                     win_p = true;
+                   }
                    break;
                
                  case 'g':
@@ -1861,7 +1875,11 @@ ira_setup_alts (rtx_insn *insn)
                        {
                        case CT_REGISTER:
                          if (reg_class_for_constraint (cn) != NO_REGS)
-                           goto op_success;
+                           {
+                             if (REG_P (op) || SUBREG_P (op))
+                               goto op_success;
+                             win_p = true;
+                           }
                          break;
case CT_CONST_INT:
@@ -1872,9 +1890,14 @@ ira_setup_alts (rtx_insn *insn)
                          break;
case CT_ADDRESS:
+                         goto op_success;
+
                        case CT_MEMORY:
                        case CT_SPECIAL_MEMORY:
-                         goto op_success;
+                         if (MEM_P (op))
+                           goto op_success;
+                         win_p = true;
+                         break;
case CT_FIXED_FORM:
                          if (constraint_satisfied_p (op, cn))
@@ -1885,12 +1908,22 @@ ira_setup_alts (rtx_insn *insn)
                    }
                  }
              while (p += len, c);
-             break;
+             if (!win_p)
+               break;
+             /* We can make the alternative match by spilling a register
+                to memory or loading something into a register.  Count a
+                cost of one reload (the equivalent of the '?' constraint).  */
+             this_reject += 6;
            op_success:
              ;
            }
+
          if (nop >= recog_data.n_operands)
-           alts |= ALTERNATIVE_BIT (nalt);
+           {
+             alts |= ALTERNATIVE_BIT (nalt);
+             if (this_reject == 0)
+               exact_alts |= ALTERNATIVE_BIT (nalt);
+           }
        }
        if (commutative < 0)
        break;
@@ -1900,13 +1933,13 @@ ira_setup_alts (rtx_insn *insn)
        if (curr_swapped)
        break;
      }
-  return alts;
+  return exact_alts ? exact_alts : alts;
  }
/* Return the number of the output non-early clobber operand which
     should be the same in any case as operand with number OP_NUM (or
-   negative value if there is no such operand).  The function takes
-   only really possible alternatives into consideration.  */
+   negative value if there is no such operand).  ALTS is the mask
+   of alternatives that we should consider.  */
  int
  ira_get_dup_out_num (int op_num, alternative_mask alts)
  {

Reply via email to