Jeff Law wrote:
I was looking at a regression caused by having ira-reload utilize the
existing copy detection code in IRA rather than my own and stumbled
upon this...
Consider this insn prior to IRA:
(insn 72 56 126 8 j.c:744 (parallel [
            (set (reg:SI 110)
                (minus:SI (reg:SI 69 [ ew_u$parts$lsw ])
                    (reg:SI 68 [ ew_u$parts$lsw ])))
            (clobber (reg:CC 17 flags))
        ]) 290 {*subsi_1} (expr_list:REG_DEAD (reg:SI 69 [ ew_u$parts$lsw ])
        (expr_list:REG_DEAD (reg:SI 68 [ ew_u$parts$lsw ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))
Which matches this pattern in the x86 backend:
(define_insn "*sub<mode>_1"
  [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>")
        (minus:SWI
          (match_operand:SWI 1 "nonimmediate_operand" "0,0")
          (match_operand:SWI 2 "<general_operand>" "<r><i>,<r>m")))
   (clobber (reg:CC FLAGS_REG))]
  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)"
  "sub{<imodesuffix>}\t{%2, %0|%0, %2}"
  [(set_attr "type" "alu")
   (set_attr "mode" "<MODE>")])
Note carefully that the constraints require operands 0 and 1 to
match. Operand 2 is not tied to any other operand. In fact, if
operand 2 is tied to operand 0, then we are guaranteed to generate a
reload and muck up the code pretty badly.
Now looking at the copies recorded by IRA we have:
cp0:a0(r95)<->a1(r70)@248:move
cp1:a4(r69)<->a6(r110)@178:constraint
cp2:a3(r68)<->a6(r110)@22:shuffle
cp3:a9(r66)<->a10(r92)@11:shuffle
cp4:a11(r79)<->a12(r93)@89:move
cp5:a1(r70)<->a14(r96)@114:constraint
cp6:a1(r70)<->a13(r97)@114:constraint
Note carefully cp2 which claims a shuffle-copy between r68 and r110.
ISTM that when trying to assign a hard reg to pseudo r110, if r68
has a hard reg but r69 does not, then pseudo r110 will show a cost
savings if it is allocated into the same hard reg as pseudo r68.
The problematic code is add_insn_allocno_copies:
  {
    extract_insn (insn);
    for (i = 0; i < recog_data.n_operands; i++)
      {
        operand = recog_data.operand[i];
        if (REG_SUBREG_P (operand)
            && find_reg_note (insn, REG_DEAD,
                              REG_P (operand)
                              ? operand : SUBREG_REG (operand)) != NULL_RTX)
          {
            str = recog_data.constraints[i];
            while (*str == ' ' || *str == '\t')
              str++;
            bound_p = false;
            for (j = 0, commut_p = false; j < 2; j++, commut_p = true)
              if ((dup = get_dup (i, commut_p)) != NULL_RTX
                  && REG_SUBREG_P (dup)
                  && process_regs_for_copy (operand, dup, true,
                                            NULL_RTX, freq))
                bound_p = true;
            if (bound_p)
              continue;
            /* If an operand dies, prefer its hard register for the
               output operands by decreasing the hard register cost
               or creating the corresponding allocno copies.  The
               cost will not correspond to a real move insn cost, so
               make the frequency smaller.  */
            process_reg_shuffles (operand, i, freq < 8 ? 1 : freq / 8);
          }
      }
With r68 dying and not bound to another operand, we create a
reg-shuffle copy to encourage tying r68 to the output operand. Not good.
ISTM that if an output is already bound to some input, we should
not record a copy between an unbound dying input and the bound
output.
You can play with the attached (meaningless) testcase with -O2 -m32 -fPIC.
You won't see code quality regressions due to this problem on this
testcase with the mainline sources, but it should be enough to trigger
the bogus copy.
Thoughts?
Yes, Jeff, you are probably right that we should not make the shuffle
copy when there is already a constraint copy to the operand. There is
no regression on the test because the shuffle copy always has the
smaller cost. I introduced the shuffle copies because they improved
SPEC2000 rates for x86/x86_64, but I did not try to write code to
avoid the situation you found.
I think we should try removing the shuffle copy (or even discourage
such shuffles through a negative copy cost) and see what happens to
the SPEC rates.