Vladimir Makarov wrote:
On 12/5/2013, 9:35 AM, Tejas Belagod wrote:
Vladimir Makarov wrote:
On 12/4/2013, 6:15 AM, Tejas Belagod wrote:
Hi,

I'm trying to relax CANNOT_CHANGE_MODE_CLASS for aarch64 to allow all
mode changes on FP_REGS as aarch64 does not have register-packing, but
I'm running into an LRA ICE. A test case generates an RTL subreg of the
following form

        (set (reg:DF 97) (subreg:DF (reg:V2DF 95) 8))

LRA has to reload the subreg because the subreg is not representable as
a full register. When LRA reloads this in
lra-constraints.c:simplify_operand_subreg (), it seems to reload
SUBREG_REG() and leave the byte offset alone.

i.e.

  (set (reg:V2DF 100) (reg:V2DF 95))
  (set (reg:DF 97) (subreg:DF (reg:V2DF 100) 8))
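
To make "not representable as a full register" concrete, here is a minimal sketch in C. It is not GCC code: the helper name subreg_starts_on_reg_boundary is hypothetical, and the model assumes a little-endian target where a subreg can be named as a hard register only if it starts on a register boundary. The 16-byte register size used in the usage note is also an assumption (one FP/SIMD register holding the whole V2DF value).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not a GCC function.  Simplified model: a subreg of
   OUTER_SIZE bytes at BYTE_OFFSET inside a value of INNER_SIZE bytes, held
   in hard registers of REG_SIZE bytes each, can be named as a register of
   its own only when it starts on a register boundary.  */
static bool
subreg_starts_on_reg_boundary (size_t byte_offset, size_t outer_size,
                               size_t inner_size, size_t reg_size)
{
  assert (reg_size > 0);
  assert (byte_offset + outer_size <= inner_size);
  return byte_offset % reg_size == 0;
}
```

With the values from the RTL above, subreg_starts_on_reg_boundary (8, 8, 16, 16) is false: byte 8 falls in the middle of the single 16-byte register, so no register name refers to that half alone. If the same 16 bytes were spread over two 8-byte registers, the same call with reg_size 8 would return true.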

The code in lra-constraints.c is this conditional:

   /* Force a reload of the SUBREG_REG if this is a constant or PLUS or
      if there may be a problem accessing OPERAND in the outer
      mode.  */
   if ((REG_P (reg)
       ....
       insert_move_for_subreg (insert_before ? &before : NULL,
                   insert_after ? &after : NULL,
                   reg, new_reg);
     }
       ....

What happens subsequently is that LRA keeps looping over this RTL and
keeps reloading the SUBREG_REG() till the limit of constraint passes is
reached.

  (set (reg:V2DF 100) (reg:V2DF 95))
  (set (reg:DF 97) (subreg:DF (reg:V2DF 100) 8))
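
The looping described above can be imitated with a toy model. This is only a sketch under stated assumptions, not LRA code: every name in it (toy_subreg, toy_reload_inner, toy_needs_reload, toy_constraint_passes, MAX_PASSES) is hypothetical, the limit of 30 is taken from the ICE message, and "needs a reload" is reduced to "has a non-zero byte offset".

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PASSES 30   /* value from the ICE message; the name is made up */

/* Toy stand-in for (subreg:DF (reg:V2DF N) OFFSET).  */
struct toy_subreg
{
  int inner_pseudo;
  unsigned byte_offset;
};

/* Toy reload step: copy the inner pseudo into a fresh pseudo and leave
   the byte offset alone, as described in the message.  */
static void
toy_reload_inner (struct toy_subreg *s, int *next_pseudo)
{
  unsigned offset_before = s->byte_offset;

  s->inner_pseudo = (*next_pseudo)++;
  assert (s->byte_offset == offset_before);   /* offset is never changed */
}

/* Toy condition: the subreg keeps needing a reload while its offset is
   non-zero.  */
static bool
toy_needs_reload (const struct toy_subreg *s)
{
  return s->byte_offset != 0;
}

/* Run passes until nothing needs a reload or the limit is hit.
   Returns the number of passes performed.  */
static int
toy_constraint_passes (struct toy_subreg *s)
{
  int next_pseudo = s->inner_pseudo + 1;
  int passes = 0;

  while (passes < MAX_PASSES && toy_needs_reload (s))
    {
      toy_reload_inner (s, &next_pseudo);
      passes++;
    }
  return passes;
}
```

Starting from a subreg with byte offset 8, toy_constraint_passes always returns MAX_PASSES, because each pass replaces only the inner pseudo and the condition that triggered the reload still holds afterwards. A subreg with offset 0 needs no pass at all.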

I can't see any place where this subreg is resolved (e.g. into an equiv
memref) before the next iteration comes around for reloading the inputs
and outputs of curr_insn. Or am I missing some part of the code
that tries reloading the subreg with different alternatives or reg
classes?

I guess this behaviour is wrong.  We could spill the V2DF pseudo or
put it into a register of another class, but that is not implemented.
This code is actually a modified version of the one in the reload pass.
We could implement alternative strategies and a check for a potential
loop (such code exists in process_alt_operands).
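
As an illustration of why spilling resolves the subreg, here is a minimal sketch in plain C. It is not LRA code: the function name extract_df_via_spill is hypothetical, and a local byte array stands in for the stack slot. Once the V2DF value lives in memory, the DF value at byte offset 8 is an ordinary load from slot + 8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustration only.  Models "spill the V2DF pseudo": store the whole
   vector value to a stack slot, then load the DF value from BYTE_OFFSET
   inside that slot.  */
static double
extract_df_via_spill (const double v2df[2], size_t byte_offset)
{
  unsigned char slot[2 * sizeof (double)];
  double result;

  assert (byte_offset + sizeof result <= sizeof slot);
  memcpy (slot, v2df, sizeof slot);                      /* spill V2DF */
  memcpy (&result, slot + byte_offset, sizeof result);   /* load DF */
  return result;
}
```

For a vector holding { 1.5, 2.5 } and 8-byte doubles, extract_df_via_spill (v, 8) yields the second element, 2.5.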

Could you send me the macro change and the test?  I'll look at it and
figure out what we can do.
Hi,

Thanks for looking at this.

The macro change is in this patch
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03638.html. The test is
gcc.c-torture/compile/simd-3.c and when compiled with -O1 for aarch64,
ICEs:

gcc/testsuite/gcc.c-torture/compile/simd-3.c:22:1: internal compiler
error: Maximum number of LRA constraint passes is achieved (30)

Also, I'm curious to know: would it be possible to use vec_extract for
vector-mode subregs and zero/sign extract for scalars, with spilling as
the last resort if neither of these is possible? As you say, a non-zero
SUBREG_BYTE offset could also be resolved using a different regclass
where the sub-mode could just be a full register.
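
The vec_extract idea relies on mapping a SUBREG_BYTE offset to a vector lane. A minimal sketch of that mapping follows. The helper subreg_byte_to_lane is hypothetical, not a GCC function, and it assumes lanes are numbered in memory order, so endianness adjustments are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper.  Map a byte offset into a vector of NELEMS
   elements of ELEM_SIZE bytes each to a lane index.  Returns -1 when the
   subreg does not select exactly one whole element.  */
static int
subreg_byte_to_lane (size_t byte_offset, size_t outer_size,
                     size_t elem_size, size_t nelems)
{
  size_t lane;

  assert (elem_size > 0);
  if (outer_size != elem_size || byte_offset % elem_size != 0)
    return -1;
  lane = byte_offset / elem_size;
  return lane < nelems ? (int) lane : -1;
}
```

For (subreg:DF (reg:V2DF 95) 8), the call subreg_byte_to_lane (8, 8, 8, 2) gives lane 1, which is the element an extract would have to select. An offset of 4 gives -1, since it straddles no single DF element.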


Here is the patch which solves the problem. Right now it only spills, but that is the best that can be done for this case. I'll submit the patch next week, after better testing on different platforms.


Hi Vladimir,

Have you had a chance to get this patch tested? This can fix a regression I'm seeing on AArch64, and I'd like to get it in if you think this patch is good to go.

Thanks,
Tejas.


Vec_extract is interesting, but it is a rare case that needs a lot of code to implement. I think we need a more general approach called bitwidth-aware RA (putting several pseudo values into regs, e.g. vec regs), although I don't know whether it will help for arm64 CPUs. The last time I manually checked bitwidth-aware RA for Intel CPUs, it made the code bigger and slower.

If there is a mainstream processor for which it can improve performance, I'd put it higher on my priority list.





