Hi Vladimir,

An issue has been identified with LRA when running CPU2006 h264ref benchmark.

I'll try to describe what the issue is and a fix applied as it is very
difficult to reproduce it and it is next to impossible to create a narrowed
testcase on top of the source code restrictions.

The concerned LRA code in lra-constraints.c is the following:

if (GET_CODE (*loc) == SUBREG)
  {
    reg = SUBREG_REG (*loc);
    byte = SUBREG_BYTE (*loc);
    if (REG_P (reg)
        /* Strict_low_part requires reload the register not
           the sub-register.  */
        && (curr_static_id->operand[i].strict_low
            || (GET_MODE_SIZE (mode)
                <= GET_MODE_SIZE (GET_MODE (reg))
                && (hard_regno
                    = get_try_hard_regno (REGNO (reg))) >= 0
                && (simplify_subreg_regno
                    (hard_regno,
                     GET_MODE (reg), byte, mode) < 0)
                && (goal_alt[i] == NO_REGS
                    || (simplify_subreg_regno
                        (ira_class_hard_regs[goal_alt[i]][0],
                         GET_MODE (reg), byte, mode) >= 0))))
      {
        loc = &SUBREG_REG (*loc);
        mode = GET_MODE (*loc);
      }
  }

The above works just fine when we deal with strict_low_part or a subreg
smaller than a word.

However, multi-word operations that were emitted as a sequence of operations
on word sized parts of the DImode register appears to expose a problem with
LRA e.g. '(set (subreg: SI (reg: DI)) ...)'.
LRA does not realize that it actually uses the other halve of the DI-mode
register leading to a situation where it modifies one halve of the result and
spills the whole register with the other halve undefined.

In the dump I can see the following:

      Creating newreg=1552 from oldreg=521, assigning class GR_REGS to r1552
 1487: r1552:DI#4=r1404:SI+r1509:SI
      REG_DEAD r1509:SI
      REG_DEAD r1404:SI
    Inserting insn reload after:
 1735: r521:DI=r1552:DI

There is nothing in the dump that sets r1552:DI#0 nor a reload is inserted
to load the value before modifying it but it is spilled.

As it is a multi-word register, the split pass emits an additional instruction
to load the whole 64-bit value but since one halve was modified, only
register $20 appears in the live-in set. In contrast to $20, $21 is being used
but not added to the live-in set.

...
;; live  in      4 [$4] 6 [$6] 7 [$7] 10 [$10] 11 [$11] 12 [$12] 13 [$13]
[$14] 15 [$15] 16 [$16] 17 [$17] 20 [$20] 22 [$22] 23 [$23] 24 [$24] 25 [$25]
29 [$sp] 30 [$fp] 31 [$31] 52 [$f20] 79 [$fakec]
...
(insn 1788 1077 1789 80 (set (reg:SI 20 $20 [orig:521 distortion ] [521])
    (mem/c:SI (plus:SI (reg/f:SI 29 $sp)
              (const_int 40 [0x28])) [16 %sfp+40 S4 A64])) rdopt.c:257 288
{*movsi_internal}
     (nil))
(insn 1789 1788 1743 80 (set (reg:SI 21 $21 [ distortion+4 ])
    (mem/c:SI (plus:SI (reg/f:SI 29 $sp)
              (const_int 44 [0x2c])) [16 %sfp+44 S4 A32])) rdopt.c:257 288
{*movsi_internal}
     (nil))
...

The potential fix for this is to promote the type of a subreg OP_OUT to OP_INOUT
to treat the pseudo register (r1552 in this case) as input and LRA will be 
forced
to insert a reload before modifying its contents. 

Handling of strict_low_part case is fine as the operand is described in the MD 
pattern
as IN_OUT through modifiers.

With the above change in place, we get a reload before assignment:

      Creating newreg=1552 from oldreg=521, assigning class GR_REGS to r1552
 1487: r1552:DI#4=r1404:SI+r1509:SI
      REG_DEAD r1509:SI
      REG_DEAD r1404:SI
    Inserting insn reload before:
 1735: r1552:DI=r521:DI
    Inserting insn reload after:
 1736: r521:DI=r1552:DI

and the benchmark happily passes the runtime check.

The question is whether changing the type to OP_INOUT is the correct and
valid fix?

Regards,
Robert

2015-01-14  Robert Suchanek  <robert.sucha...@imgtec.com>

gcc/
        * lra-constraints.c (curr_insn_transform): Change the type of a reload
        pseudo to OP_INOUT.
---
 gcc/lra-constraints.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index ec28b7f..018968b 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -3798,6 +3798,7 @@ curr_insn_transform (void)
                                  (ira_class_hard_regs[goal_alt[i]][0],
                                   GET_MODE (reg), byte, mode) >= 0)))))
                {
+                 type = OP_INOUT;
                  loc = &SUBREG_REG (*loc);
                  mode = GET_MODE (*loc);
                }
--
1.7.9.5

Reply via email to