On 7/13/23 05:27, SenthilKumar.Selvaraj--- via Gcc wrote:
Hi,
I've been spending some (spare) time checking what it would take to
make LRA work for the avr target.
Right after I removed the TARGET_LRA_P hook disabling LRA, building
libgcc failed with a weird ICE.
On the avr, the stack pointer (SP)
is not used to access stack slots
It is very uncommon target then.
- TARGET_CAN_ELIMINATE returns false
if frame_pointer_needed, and TARGET_FRAME_POINTER_REQUIRED returns true
if get_frame_size() > 0.
With LRA, however, reload generates
(insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
(const_int 1 [0x1])) [2 %sfp+1 S1 A8])
(reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 {movqi_insn_split}
(nil))
and the backend code errors out when it finds SP is being used as a
pointer register.
Digging through the RTL dumps, I found the following. For the
following insn sequence in *.ira
(insn 189 128 159 7 (set (reg:HI 58 [ b ])
(const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
(nil))
(insn 159 189 160 7 (set (subreg:QI (reg:HI 58 [ b ]) 0)
(reg:QI 86 [ a ])) "case.c":7:7 86 {movqi_insn_split}
(nil))
(insn 160 159 32 7 (set (subreg:QI (reg:HI 58 [ b ]) 1)
(reg:QI 87 [ a+1 ])) "case.c":7:7 86 {movqi_insn_split}
(nil))
1. For r58, IRA picks R28:R29, which is the frame pointer for avr.
Popping a13(r58,l0) -- assign reg 28
2. LRA sees the subreg in insn 159 and generates a reload reg
(r125). simplify_subreg_regno (lra-constraints.cc:1810) however
bails (returns -1) if the reg involved is FRAME_POINTER_REGNUM and
reload isn't completed yet. LRA therefore decides rclass for the
pseudo reg is NO_REGS.
<snip>
Creating newreg=125 from oldreg=58, assigning class NO_REGS to subreg reg r125
159: r125:HI#0=r86:QI
4. As rclass is NO_REGS, LRA picks an insn alternative that involves memory.
That is my understanding, please correct me if I'm wrong.
<snip>
0 Small class reload: reject+=3
0 Non input pseudo reload: reject++
Cycle danger: overall += LRA_MAX_REJECT
alt=0,overall=610,losers=1,rld_nregs=1
0 Small class reload: reject+=3
0 Non input pseudo reload: reject++
alt=1: Bad operand -- refuse
0 Non pseudo reload: reject++
alt=2,overall=1,losers=0,rld_nregs=0
Choosing alt 2 in insn 159: (0) Qm (1) rY00 {movqi_insn_split}
5. LRA creates stack slots, and then uses the FP register to access
the slots. This is despite r58 already being assigned R28:R29.
6. TARGET_FRAME_POINTER_REQUIRED is never called, and therefore
frame_pointer_needed is not set, despite the creation of stack
slots. TARGET_CAN_ELIMINATE therefore okays elimination of FP to SP,
and this eventually causes the ICE when the avr backend sees SP being
used as a pointer register.
This is the relevant sequence after reload
<snip>
(insn 189 128 239 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
(const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
(nil))
(insn 239 189 159 7 (set (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
(const_int 1 [0x1])) [2 %sfp+1 S2 A8])
(reg:HI 28 r28 [orig:58 b ] [58])) "case.c":7:7 101 {*movhi_split}
(nil))
(insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
(const_int 1 [0x1])) [2 %sfp+1 S1 A8])
(reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 {movqi_insn_split}
(nil))
(insn 240 159 241 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
(mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
(const_int 1 [0x1])) [2 %sfp+1 S2 A8])) "case.c":7:7 101
{*movhi_split}
(nil))
(insn 241 240 160 7 (set (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
(const_int 1 [0x1])) [2 %sfp+1 S2 A8])
(reg:HI 28 r28 [orig:58 b ] [58])) "case.c":7:7 101 {*movhi_split}
(nil))
(insn 160 241 242 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
(const_int 2 [0x2])) [2 %sfp+2 S1 A8])
(reg:QI 18 r18 [orig:87 a+1 ] [87])) "case.c":7:7 86 {movqi_insn_split}
(nil))
(insn 242 160 33 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
(mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
(const_int 1 [0x1])) [2 %sfp+1 S2 A8])) "case.c":7:7 101
{*movhi_split}
(nil))
For choices other than FP, simplify_subreg_regno returns the correct part
of the wider HImode reg, so rclass is not NO_REGS, and things workout fine.
I checked what classic reload does in the same situation - it picks a
different register (R25) instead of spilling to a stack slot.
<snip>
(insn 189 128 159 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
(const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
(nil))
(insn 159 189 226 7 (set (reg:QI 25 r25)
(reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 {movqi_insn_split}
(nil))
(insn 226 159 160 7 (set (reg:QI 28 r28)
(reg:QI 25 r25)) "case.c":7:7 86 {movqi_insn_split}
(nil))
(insn 160 226 227 7 (set (reg:QI 25 r25)
(reg:QI 18 r18 [orig:87 a+1 ] [87])) "case.c":7:7 86 {movqi_insn_split}
(nil))
(insn 227 160 33 7 (set (reg:QI 29 r29)
(reg:QI 25 r25)) "case.c":7:7 86 {movqi_insn_split}
(nil))
My questions:
1. Is there something obvious the avr backend is doing wrong that is
causing this?
No. In my opinion if it worked with reload, it should work with LRA
because LRA is a substitute for reload.
2. Shouldn't LRA ask the backend for frame_pointer_required_p and
update frame_pointer_needed if it creates stack slots?
I think so.
3. Even if (2) works, I see that lra-eliminates.cc:update_reg_eliminate
asserts that if the backend said elimination to SP is
ok first up, it
cannot reject that elimination later (line 1165). If the only reason
FP is required is because LRA created stack
slots, what should the backend do?
I think it can be relaxed for avr on which sp can not be used access
stack memory
4. When simplify_subreg_regno bails for FP, lra-constraints.cc:1815
sets rclass = NO_REGS and forces a spill to memory. The comment says
it is to prevent infinite looping, but for this case, doesn't it
make sense to look for other regs?
I guess it can be relaxed for avr and frame-pointer too as avr does not
use sp for accessing memory
5. I can work around the problem by disabling elimination from FP to SP
when lra_in_progress, but I think it pevents IRA/LRA from using
R28:R29 even when FP is not required at all?
Yes, it is better to avoid disabling elimination
6. Basic question, but does FP to SP elimination mean any operation
possible with FP should be doable with SP as well?
Hard to say. There are a lot of undocumented RA assumptions.
If you send me the preprocessed test, I could start to work on it to fix
the problems. I think it is hard to fix them right for a person having
a little experience with LRA.