On 01/14/15 12:20, Robert Suchanek wrote:
Hi Vladimir,

An issue has been identified with LRA when running CPU2006 h264ref benchmark.

I'll try to describe what the issue is and a fix applied as it is very
difficult to reproduce it and it is next to impossible to create a narrowed
testcase on top of the source code restrictions.

The concerned LRA code in lra-constraints.c is the following:

if (GET_CODE (*loc) == SUBREG)
   {
     reg = SUBREG_REG (*loc);
     byte = SUBREG_BYTE (*loc);
     if (REG_P (reg)
         /* Strict_low_part requires reload the register not
            the sub-register.  */
         && (curr_static_id->operand[i].strict_low
             || (GET_MODE_SIZE (mode)
                 <= GET_MODE_SIZE (GET_MODE (reg))
                 && (hard_regno
                     = get_try_hard_regno (REGNO (reg))) >= 0
                 && (simplify_subreg_regno
                     (hard_regno,
                      GET_MODE (reg), byte, mode) < 0)
                 && (goal_alt[i] == NO_REGS
                     || (simplify_subreg_regno
                         (ira_class_hard_regs[goal_alt[i]][0],
                          GET_MODE (reg), byte, mode) >= 0))))
       {
         loc = &SUBREG_REG (*loc);
         mode = GET_MODE (*loc);
       }
   }

The above works just fine when we deal with strict_low_part or a subreg
smaller than a word.

However, multi-word operations that were emitted as a sequence of operations
on word sized parts of the DImode register appears to expose a problem with
LRA e.g. '(set (subreg: SI (reg: DI)) ...)'.
LRA does not realize that it actually uses the other halve of the DI-mode
register leading to a situation where it modifies one halve of the result and
spills the whole register with the other halve undefined.

In the dump I can see the following:

       Creating newreg=1552 from oldreg=521, assigning class GR_REGS to r1552
  1487: r1552:DI#4=r1404:SI+r1509:SI
       REG_DEAD r1509:SI
       REG_DEAD r1404:SI
     Inserting insn reload after:
  1735: r521:DI=r1552:DI

There is nothing in the dump that sets r1552:DI#0 nor a reload is inserted
to load the value before modifying it but it is spilled.

As it is a multi-word register, the split pass emits an additional instruction
to load the whole 64-bit value but since one halve was modified, only
register $20 appears in the live-in set. In contrast to $20, $21 is being used
but not added to the live-in set.

...
;; live  in      4 [$4] 6 [$6] 7 [$7] 10 [$10] 11 [$11] 12 [$12] 13 [$13]
[$14] 15 [$15] 16 [$16] 17 [$17] 20 [$20] 22 [$22] 23 [$23] 24 [$24] 25 [$25]
29 [$sp] 30 [$fp] 31 [$31] 52 [$f20] 79 [$fakec]
...
(insn 1788 1077 1789 80 (set (reg:SI 20 $20 [orig:521 distortion ] [521])
     (mem/c:SI (plus:SI (reg/f:SI 29 $sp)
               (const_int 40 [0x28])) [16 %sfp+40 S4 A64])) rdopt.c:257 288
{*movsi_internal}
      (nil))
(insn 1789 1788 1743 80 (set (reg:SI 21 $21 [ distortion+4 ])
     (mem/c:SI (plus:SI (reg/f:SI 29 $sp)
               (const_int 44 [0x2c])) [16 %sfp+44 S4 A32])) rdopt.c:257 288
{*movsi_internal}
      (nil))
...

The potential fix for this is to promote the type of a subreg OP_OUT to OP_INOUT
to treat the pseudo register (r1552 in this case) as input and LRA will be 
forced
to insert a reload before modifying its contents.

Handling of strict_low_part case is fine as the operand is described in the MD 
pattern
as IN_OUT through modifiers.

With the above change in place, we get a reload before assignment:

       Creating newreg=1552 from oldreg=521, assigning class GR_REGS to r1552
  1487: r1552:DI#4=r1404:SI+r1509:SI
       REG_DEAD r1509:SI
       REG_DEAD r1404:SI
     Inserting insn reload before:
  1735: r1552:DI=r521:DI
     Inserting insn reload after:
  1736: r521:DI=r1552:DI

and the benchmark happily passes the runtime check.

The question is whether changing the type to OP_INOUT is the correct and
valid fix?

Regards,
Robert

2015-01-14  Robert Suchanek  <robert.sucha...@imgtec.com>

gcc/
         * lra-constraints.c (curr_insn_transform): Change the type of a reload
         pseudo to OP_INOUT.
Robert, can you look at reload.c::reload_inner_reg_of_subreg and verify that the comment just before its return statement is effectively the situation you're in.

There are certainly cases where a SUBREG needs to be treated as an in-out operand. We walked through them eons ago when we were poking at SSA for RTL. But the details have long since faded from memory.

jeff


Reply via email to