Vladimir Makarov wrote:
On 12/4/2013, 6:15 AM, Tejas Belagod wrote:
Hi,
I'm trying to relax CANNOT_CHANGE_MODE_CLASS for aarch64 to allow all
mode changes on FP_REGS as aarch64 does not have register-packing, but
I'm running into an LRA ICE. A test case generates an RTL subreg of the
following form
(set (reg:DF 97) (subreg:DF (reg:V2DF 95) 8))
LRA has to reload the subreg because the subreg is not representable as
a full register. When LRA reloads this in
lra-constraints.c:simplify_operand_subreg (), it seems to reload
SUBREG_REG() and leave the byte offset alone.
i.e.
(set (reg:V2DF 100) (reg:V2DF 95))
(set (reg:DF 97) (subreg:DF (reg:V2DF 100) 8))
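To make the byte offset concrete: in memory-layout terms, offset 8 into a 16-byte V2DF selects the second 8-byte double. The following is only a toy C model of that reading, not GCC code; the type v2df and the helper subreg_df are hypothetical names introduced for illustration, and it assumes an 8-byte double.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a V2DF value: two doubles, 16 bytes in total
   (assumes sizeof (double) == 8).  Hypothetical type, not from GCC.  */
typedef struct { double lane[2]; } v2df;

/* Toy model of (subreg:DF (reg:V2DF x) byte_offset): read one double
   starting byte_offset bytes into the vector's memory image.  */
static double subreg_df(const v2df *x, size_t byte_offset)
{
    double out;
    /* The selected piece must lie entirely inside the vector.  */
    assert(byte_offset + sizeof out <= sizeof *x);
    memcpy(&out, (const unsigned char *)x + byte_offset, sizeof out);
    return out;
}
```

With this model, subreg_df (&v, 8) returns v.lane[1]. Copying v into another v2df first, as the reload above does, leaves the offset 8 in place, so the extraction problem is unchanged.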
The code in lra-constraints.c is this conditional:
/* Force a reload of the SUBREG_REG if this is a constant or PLUS or
if there may be a problem accessing OPERAND in the outer
mode. */
if ((REG_P (reg)
....
insert_move_for_subreg (insert_before ? &before : NULL,
insert_after ? &after : NULL,
reg, new_reg);
}
....
What happens subsequently is that LRA keeps looping over this RTL and
keeps reloading the SUBREG_REG() till the limit of constraint passes is
reached.
(set (reg:V2DF 100) (reg:V2DF 95))
(set (reg:DF 97) (subreg:DF (reg:V2DF 100) 8))
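The looping can be pictured with a small toy model in C. This is a sketch under stated assumptions, not LRA's actual code: struct toy_insn, needs_reload and run_constraint_passes are hypothetical names, the "nonzero offset needs a reload" rule is a simplification, and 30 is the limit reported in the ICE message quoted later in this thread.

```c
#include <assert.h>

/* Pass limit, as reported in the ICE message.  */
enum { MAX_CONSTRAINT_PASSES = 30 };

/* Toy insn: (set (reg:DF dst) (subreg:DF (reg:V2DF inner_reg) byte_offset)).  */
struct toy_insn { int inner_reg; int byte_offset; };

/* Assumption for this sketch: the subreg is rejected whenever its
   byte offset is nonzero.  */
static int needs_reload(const struct toy_insn *insn)
{
    return insn->byte_offset != 0;
}

/* Each pass reloads only the inner register into a fresh pseudo and
   leaves the byte offset alone, so the insn never becomes acceptable.
   Returns the number of passes used.  */
static int run_constraint_passes(struct toy_insn *insn, int first_free_pseudo)
{
    int passes = 0;
    assert(first_free_pseudo > insn->inner_reg);
    while (needs_reload(insn) && passes < MAX_CONSTRAINT_PASSES) {
        insn->inner_reg = first_free_pseudo++;  /* e.g. 95 -> 100 */
        passes++;
    }
    return passes;
}
```

Starting from inner register 95 with offset 8, the model uses all 30 passes and only the pseudo number changes, which matches the symptom described above.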
I can't see any place where this subreg is resolved (e.g. into an equiv
memref) before the next iteration comes around to reload the inputs and
outputs of curr_insn. Or am I missing some part of the code that tries
reloading the subreg with different alternatives or reg classes?
I guess this behaviour is wrong. We could spill the V2DF pseudo or put
it into a register of another class, but that is not implemented. This
code is actually a modified version of the corresponding reload pass
code. We could implement alternative strategies and a check for a
potential loop (such code exists in process_alt_operands).
Could you send me the macro change and the test? I'll look at it and
figure out what we can do.
Hi,
Thanks for looking at this.
The macro change is in this patch
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03638.html. The test is
gcc.c-torture/compile/simd-3.c; compiled with -O1 for aarch64, it ICEs:
gcc/testsuite/gcc.c-torture/compile/simd-3.c:22:1: internal compiler error:
Maximum number of LRA constraint passes is achieved (30)
Also, I'm curious to know: is it possible to use vec_extract for vector-mode
subregs and zero/sign extract for scalars, with spilling as the last resort if
neither of these is possible? As you say, a non-zero SUBREG_BYTE offset could
also be resolved using a different regclass in which the sub-mode is just a
full register.
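To spell out the ordering I have in mind, here is a rough C sketch. It is purely illustrative and nothing like this exists in LRA as far as I know: the enum, the function choose_strategy and its parameters are hypothetical, sizes are in bytes, and elem_size == 0 is used to mean "the inner mode is scalar".

```c
#include <assert.h>

/* Hypothetical strategies, in order of preference.  */
enum subreg_strategy { USE_DIRECTLY, LANE_EXTRACT, BIT_EXTRACT, SPILL };

/* elem_size:   element size of the inner vector mode, 0 if scalar.
   inner_size:  size of the inner mode.
   outer_size:  size of the subreg's mode.
   byte_offset: the SUBREG_BYTE value.  */
static enum subreg_strategy
choose_strategy(int elem_size, int inner_size, int outer_size, int byte_offset)
{
    assert(elem_size >= 0 && inner_size > 0 && outer_size > 0 && byte_offset >= 0);
    if (byte_offset + outer_size > inner_size)
        return SPILL;         /* piece does not fit inside the inner value */
    if (byte_offset == 0)
        return USE_DIRECTLY;  /* assumption: the low part needs no help */
    if (elem_size > 0 && outer_size == elem_size && byte_offset % elem_size == 0)
        return LANE_EXTRACT;  /* lane index would be byte_offset / elem_size */
    if (elem_size == 0)
        return BIT_EXTRACT;   /* bit position would be byte_offset * 8 */
    return SPILL;             /* last resort */
}
```

For the failing case, choose_strategy (8, 16, 8, 8) picks LANE_EXTRACT (lane 1), and spilling is only reached when neither extract applies.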
Thanks,
Tejas.