On 2019-06-21 9:42 a.m., Richard Sandiford wrote:
SVE has a prefix instruction (MOVPRFX) that acts as a move but is
designed to be easily fusible with the following instruction. The SVE
port therefore has lots of patterns with constraints of the form:
A: operand 0: =w,?w
...
operand n: 0, w
where the first alternative is a single instruction and the second
alternative uses MOVPRFX.
Ideally we want operand n to be allocated to the same register as
operand 0 in this case.
add_insn_allocno_copies is the main IRA routine that deals with tied
operands. It is (rightly) very conservative, and only handles cases in
which we're confident about saving a full move. So for a pattern like:
B: operand 0: =w,w
...
operand n: 0,w
we don't (and shouldn't) assume that tying operands 0 and n would
save the cost of a move.
But in A, the second alternative has a ? marker, which makes it more
expensive than the first alternative by a full reload. So I think for
copy elision we should ignore the untied operand n in the second
alternative of A.
One approach would be to add '*' markers to each pattern and make
ira_get_dup_out_num honour them. But I think the rule applies on
first principles, so marking with '*' shouldn't be necessary.
I think direct approach is better. The modifiers were designed for the
old algorithms. Now I think their treatment prevent and will prevent
significant RA algorithm modifications. I found that when tried to
significantly change algorithm for calculation of register class costs
and register class preferences.
This patch instead makes ira_get_dup_out_num ignore expensive
alternatives if there are other alternatives that match exactly.
The cheapest way of doing that seemed to be to take expensive
alternatives out of consideration in ira_setup_alts, which provides
a bitmask of alternatives and has all the information available.
add_insn_allocno_copies is the only current user of ira_setup_alts,
so no other code should be affected.
2019-06-21 Richard Sandiford <richard.sandif...@arm.com>
gcc/
* ira.c (ira_setup_alts): If any valid alternatives have zero cost,
exclude any others that are disparaged or that are bound to need
a reload or spill.
(ira_get_dup_out_num): Expand comment.
The patch is ok for me.
Index: gcc/ira.c
===================================================================
--- gcc/ira.c 2019-06-21 14:34:09.455685354 +0100
+++ gcc/ira.c 2019-06-21 14:34:13.239653892 +0100
@@ -1787,7 +1787,9 @@ setup_prohibited_mode_move_regs (void)
/* Extract INSN and return the set of alternatives that we should consider.
This excludes any alternatives whose constraints are obviously impossible
to meet (e.g. because the constraint requires a constant and the operand
- is nonconstant). */
+ is nonconstant). It also excludes alternatives that are bound to need
+ a spill or reload, as long as we have other alternatives that match
+ exactly. */
alternative_mask
ira_setup_alts (rtx_insn *insn)
{
@@ -1800,6 +1802,7 @@ ira_setup_alts (rtx_insn *insn)
preprocess_constraints (insn);
alternative_mask preferred = get_preferred_alternatives (insn);
alternative_mask alts = 0;
+ alternative_mask exact_alts = 0;
/* Check that the hard reg set is enough for holding all
alternatives. It is hard to imagine the situation when the
assertion is wrong. */
@@ -1816,20 +1819,24 @@ ira_setup_alts (rtx_insn *insn)
{
for (nalt = 0; nalt < recog_data.n_alternatives; nalt++)
{
- if (!TEST_BIT (preferred, nalt) || TEST_BIT (alts, nalt))
+ if (!TEST_BIT (preferred, nalt) || TEST_BIT (exact_alts, nalt))
continue;
const operand_alternative *op_alt
= &recog_op_alt[nalt * recog_data.n_operands];
+ int this_reject = 0;
for (nop = 0; nop < recog_data.n_operands; nop++)
{
int c, len;
+ this_reject += op_alt[nop].reject;
+
rtx op = recog_data.operand[nop];
p = op_alt[nop].constraint;
if (*p == 0 || *p == ',')
continue;
-
+
+ bool win_p = false;
do
switch (c = *p, len = CONSTRAINT_LEN (c, p), c)
{
@@ -1847,7 +1854,14 @@ ira_setup_alts (rtx_insn *insn)
case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9':
- goto op_success;
+ {
+ rtx other = recog_data.operand[c - '0'];
+ if (MEM_P (other)
+ ? rtx_equal_p (other, op)
+ : REG_P (op) || SUBREG_P (op))
+ goto op_success;
+ win_p = true;
+ }
break;
case 'g':
@@ -1861,7 +1875,11 @@ ira_setup_alts (rtx_insn *insn)
{
case CT_REGISTER:
if (reg_class_for_constraint (cn) != NO_REGS)
- goto op_success;
+ {
+ if (REG_P (op) || SUBREG_P (op))
+ goto op_success;
+ win_p = true;
+ }
break;
case CT_CONST_INT:
@@ -1872,9 +1890,14 @@ ira_setup_alts (rtx_insn *insn)
break;
case CT_ADDRESS:
+ goto op_success;
+
case CT_MEMORY:
case CT_SPECIAL_MEMORY:
- goto op_success;
+ if (MEM_P (op))
+ goto op_success;
+ win_p = true;
+ break;
case CT_FIXED_FORM:
if (constraint_satisfied_p (op, cn))
@@ -1885,12 +1908,22 @@ ira_setup_alts (rtx_insn *insn)
}
}
while (p += len, c);
- break;
+ if (!win_p)
+ break;
+ /* We can make the alternative match by spilling a register
+ to memory or loading something into a register. Count a
+ cost of one reload (the equivalent of the '?' constraint). */
+ this_reject += 6;
op_success:
;
}
+
if (nop >= recog_data.n_operands)
- alts |= ALTERNATIVE_BIT (nalt);
+ {
+ alts |= ALTERNATIVE_BIT (nalt);
+ if (this_reject == 0)
+ exact_alts |= ALTERNATIVE_BIT (nalt);
+ }
}
if (commutative < 0)
break;
@@ -1900,13 +1933,13 @@ ira_setup_alts (rtx_insn *insn)
if (curr_swapped)
break;
}
- return alts;
+ return exact_alts ? exact_alts : alts;
}
/* Return the number of the output non-early clobber operand which
should be the same in any case as operand with number OP_NUM (or
- negative value if there is no such operand). The function takes
- only really possible alternatives into consideration. */
+ negative value if there is no such operand). ALTS is the mask
+ of alternatives that we should consider. */
int
ira_get_dup_out_num (int op_num, alternative_mask alts)
{