On 01/14/15 12:20, Robert Suchanek wrote:
Hi Vladimir,
An issue has been identified with LRA when running CPU2006 h264ref benchmark.
I'll try to describe what the issue is and a fix applied as it is very
difficult to reproduce it and it is next to impossible to create a narrowed
testcase on top of the source code restrictions.
The concerned LRA code in lra-constraints.c is the following:
if (GET_CODE (*loc) == SUBREG)
{
reg = SUBREG_REG (*loc);
byte = SUBREG_BYTE (*loc);
if (REG_P (reg)
/* Strict_low_part requires reload the register not
the sub-register. */
&& (curr_static_id->operand[i].strict_low
|| (GET_MODE_SIZE (mode)
<= GET_MODE_SIZE (GET_MODE (reg))
&& (hard_regno
= get_try_hard_regno (REGNO (reg))) >= 0
&& (simplify_subreg_regno
(hard_regno,
GET_MODE (reg), byte, mode) < 0)
&& (goal_alt[i] == NO_REGS
|| (simplify_subreg_regno
(ira_class_hard_regs[goal_alt[i]][0],
GET_MODE (reg), byte, mode) >= 0))))
{
loc = &SUBREG_REG (*loc);
mode = GET_MODE (*loc);
}
}
The above works just fine when we deal with strict_low_part or a subreg
smaller than a word.
However, multi-word operations that were emitted as a sequence of operations
on word sized parts of the DImode register appears to expose a problem with
LRA e.g. '(set (subreg: SI (reg: DI)) ...)'.
LRA does not realize that it actually uses the other halve of the DI-mode
register leading to a situation where it modifies one halve of the result and
spills the whole register with the other halve undefined.
In the dump I can see the following:
Creating newreg=1552 from oldreg=521, assigning class GR_REGS to r1552
1487: r1552:DI#4=r1404:SI+r1509:SI
REG_DEAD r1509:SI
REG_DEAD r1404:SI
Inserting insn reload after:
1735: r521:DI=r1552:DI
There is nothing in the dump that sets r1552:DI#0 nor a reload is inserted
to load the value before modifying it but it is spilled.
As it is a multi-word register, the split pass emits an additional instruction
to load the whole 64-bit value but since one halve was modified, only
register $20 appears in the live-in set. In contrast to $20, $21 is being used
but not added to the live-in set.
...
;; live in 4 [$4] 6 [$6] 7 [$7] 10 [$10] 11 [$11] 12 [$12] 13 [$13]
[$14] 15 [$15] 16 [$16] 17 [$17] 20 [$20] 22 [$22] 23 [$23] 24 [$24] 25 [$25]
29 [$sp] 30 [$fp] 31 [$31] 52 [$f20] 79 [$fakec]
...
(insn 1788 1077 1789 80 (set (reg:SI 20 $20 [orig:521 distortion ] [521])
(mem/c:SI (plus:SI (reg/f:SI 29 $sp)
(const_int 40 [0x28])) [16 %sfp+40 S4 A64])) rdopt.c:257 288
{*movsi_internal}
(nil))
(insn 1789 1788 1743 80 (set (reg:SI 21 $21 [ distortion+4 ])
(mem/c:SI (plus:SI (reg/f:SI 29 $sp)
(const_int 44 [0x2c])) [16 %sfp+44 S4 A32])) rdopt.c:257 288
{*movsi_internal}
(nil))
...
The potential fix for this is to promote the type of a subreg OP_OUT to OP_INOUT
to treat the pseudo register (r1552 in this case) as input and LRA will be
forced
to insert a reload before modifying its contents.
Handling of strict_low_part case is fine as the operand is described in the MD
pattern
as IN_OUT through modifiers.
With the above change in place, we get a reload before assignment:
Creating newreg=1552 from oldreg=521, assigning class GR_REGS to r1552
1487: r1552:DI#4=r1404:SI+r1509:SI
REG_DEAD r1509:SI
REG_DEAD r1404:SI
Inserting insn reload before:
1735: r1552:DI=r521:DI
Inserting insn reload after:
1736: r521:DI=r1552:DI
and the benchmark happily passes the runtime check.
The question is whether changing the type to OP_INOUT is the correct and
valid fix?
Regards,
Robert
2015-01-14 Robert Suchanek <robert.sucha...@imgtec.com>
gcc/
* lra-constraints.c (curr_insn_transform): Change the type of a reload
pseudo to OP_INOUT.
Robert, can you look at reload.c::reload_inner_reg_of_subreg and verify
that the comment just before its return statement is effectively the
situation you're in.
There are certainly cases where a SUBREG needs to be treated as an
in-out operand. We walked through them eons ago when we were poking at
SSA for RTL. But the details have long since faded from memory.
jeff