Vladimir Makarov wrote:
On 12/4/2013, 6:15 AM, Tejas Belagod wrote:
Hi,

I'm trying to relax CANNOT_CHANGE_MODE_CLASS for aarch64 to allow all
mode changes on FP_REGS as aarch64 does not have register-packing, but
I'm running into an LRA ICE. A test case generates an RTL subreg of the
following form

        (set (reg:DF 97) (subreg:DF (reg:V2DF 95) 8))

LRA has to reload the subreg because the subreg is not representable as
a full register. When LRA reloads this in
lra-constraints.c:simplify_operand_subreg (), it seems to reload
SUBREG_REG() and leave the byte offset alone.

i.e.

  (set (reg:V2DF 100) (reg:V2DF 95))
  (set (reg:DF 97) (subreg:DF (reg:V2DF 100) 8))
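
To make the shape of that reload concrete, here is a minimal sketch in plain C. It is a toy model, not GCC code: the v2df_reg type and the v2df_make, subreg_df and reload_inner helpers are hypothetical names invented for illustration, and it assumes an 8-byte double and memory-order (little-endian style) byte offsets. It shows only that copying the whole V2DF value into a new pseudo leaves the DF access at byte offset 8 exactly as it was.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a 128-bit vector register holding two doubles (V2DF).
   Hypothetical type, not a GCC data structure.  */
typedef struct { unsigned char bytes[16]; } v2df_reg;

/* Pack two doubles into the toy register, lane 0 at byte 0.  */
static v2df_reg v2df_make(double lane0, double lane1)
{
    v2df_reg r;
    memcpy(r.bytes, &lane0, sizeof lane0);
    memcpy(r.bytes + sizeof lane0, &lane1, sizeof lane1);
    return r;
}

/* Toy model of (subreg:DF (reg:V2DF r) byte_off):
   read one double starting at byte_off.  */
static double subreg_df(const v2df_reg *r, size_t byte_off)
{
    double d;
    assert(byte_off + sizeof d <= sizeof r->bytes);
    memcpy(&d, r->bytes + byte_off, sizeof d);
    return d;
}

/* Toy model of the reload shown above: copy the whole inner V2DF
   value into a new pseudo.  The byte offset is not involved at all.  */
static v2df_reg reload_inner(const v2df_reg *r)
{
    return *r;
}
```

In this model, reading subreg_df at offset 8 from the reloaded copy is the same operation as before the copy: the access still targets the upper half of a 16-byte value, so whatever made it unrepresentable is still there.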

The code in lra-constraints.c is this conditional:

   /* Force a reload of the SUBREG_REG if this is a constant or PLUS or
      if there may be a problem accessing OPERAND in the outer
      mode.  */
   if ((REG_P (reg)
       ....
       insert_move_for_subreg (insert_before ? &before : NULL,
                   insert_after ? &after : NULL,
                   reg, new_reg);
     }
       ....

What happens subsequently is that LRA keeps looping over this RTL and
keeps reloading the SUBREG_REG() till the limit of constraint passes is
reached.

  (set (reg:V2DF 100) (reg:V2DF 95))
  (set (reg:DF 97) (subreg:DF (reg:V2DF 100) 8))
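
The non-termination described above can be sketched with a small toy loop in C. This is not the LRA implementation: toy_subreg, toy_needs_reload, toy_reload_inner and toy_constraint_passes are hypothetical names, the "needs reload" test is deliberately reduced to "offset is non-zero" (the real condition depends on register class and mode), and TOY_MAX_PASSES is simply set to the 30 that appears in the ICE message for this test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Pass limit, taken from the ICE message; the name is hypothetical.  */
#define TOY_MAX_PASSES 30

/* Toy stand-in for (subreg:DF (reg:V2DF inner_regno) byte_off).  */
typedef struct { int inner_regno; int byte_off; } toy_subreg;

/* Simplifying assumption: the subreg is a problem exactly when its
   byte offset is non-zero, whichever pseudo it refers to.  */
static bool toy_needs_reload(const toy_subreg *s)
{
    return s->byte_off != 0;
}

/* Model of the current behaviour: move the inner register to a fresh
   pseudo and leave the byte offset untouched.  */
static void toy_reload_inner(toy_subreg *s, int *next_pseudo)
{
    assert(next_pseudo != NULL);
    s->inner_regno = (*next_pseudo)++;
}

/* Return the number of passes needed to settle, or -1 if the pass
   limit is reached first.  */
static int toy_constraint_passes(toy_subreg s, int next_pseudo)
{
    for (int pass = 0; pass < TOY_MAX_PASSES; pass++) {
        if (!toy_needs_reload(&s))
            return pass;
        toy_reload_inner(&s, &next_pseudo);
    }
    return -1;
}
```

Because the reload step only changes which pseudo is referenced, the predicate gives the same answer on every pass and the loop can only stop at the limit. Any fix has to change something the predicate looks at, for example the offset, the register class, or the operand form (memory instead of register).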

I can't see any place where this subreg is resolved (e.g. into an equiv
memref) before the next iteration comes around for reloading the inputs
and outputs of curr_insn. Or am I missing some part of the code that
tries reloading the subreg with different alternatives or reg classes?


I guess this behaviour is wrong. We could spill the V2DF pseudo or put it into a register of another class, but that is not implemented. This code is actually a modified version of the corresponding code in the reload pass. We could implement alternative strategies and a check for a potential loop (such code exists in process_alt_operands).

Could you send me the macro change and the test? I'll look at it and figure out what we can do.

Hi,

Thanks for looking at this.

The macro change is in this patch: http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03638.html. The test is gcc.c-torture/compile/simd-3.c; when compiled with -O1 for aarch64, it ICEs:

gcc/testsuite/gcc.c-torture/compile/simd-3.c:22:1: internal compiler error: Maximum number of LRA constraint passes is achieved (30)

Also, I'm curious to know: would it be possible to use vec_extract for vector-mode subregs and zero/sign extract for scalars, with spilling as the last resort if neither of these is possible? As you say, a non-zero SUBREG_BYTE offset could also be resolved using a different regclass in which the sub-mode could just be a full register.

Thanks,
Tejas.
