Re: Tighten LRA test for reloading the inner reg of a paradoxical subreg

Jeff Law Mon, 11 Jun 2018 15:56:01 -0700
On 05/30/2018 06:31 AM, Richard Sandiford wrote:
> Kyrill  Tkachov <kyrylo.tkac...@foss.arm.com> writes:
>> Hi Richard,
>>
>> On 29/05/18 15:26, Richard Sandiford wrote:
>>> Kyrill  Tkachov <kyrylo.tkac...@foss.arm.com> writes:
>>>> Hi all,
>>>>
>>>> The recent changes to aarch64_expand_vector_init cause an ICE in the
>>>> attached testcase.  The register allocator "ICEs with Max. number of
>>>> generated reload insns per insn is achieved (90)"
>>>>
>>>> That is because aarch64_expand_vector_init creates a paradoxical
>>>> subreg to move a DImode value
>>>> into a V2DI vector:
>>>> (insn 74 72 76 8 (set (reg:V2DI 287 [ _166 ])
>>>>           (subreg:V2DI (reg/v/f:DI 112 [ d ]) 0)) 1050
>>>> {*aarch64_simd_movv2di}
>>>>
>>>> This is done because we want to express that the whole of the V2DI
>>>> vector will be written so that init-regs doesn't try to
>>>> zero-initialise it before we overwrite each lane individually anyway.
>>>>
>>>> This can go bad for because if the DImode value is allocated in, say,
>>>> x30: the last register in that register class, the V2DI subreg of that
>>>> isn't valid or meaningful and that seems to cause the trouble.
>>>>
>>>> It's kinda hard to know what the right solution for this is.
>>>> We could emit a duplicate of the value into all the lanes of the
>>>> vector, but we have tests that test against that
>>>> (we're trying to avoid unnecessary duplicates)
>>>>
>>>> What this patch does is it defines a pattern for moving a scalar into
>>>> lane 0 of a vector using a simple FMOV or LDR and represents that as a
>>>> merging with a vector of zeroes.  That way, the instruction represents
>>>> a full write of the destination vector but doesn't "read" more bits
>>>> from the source than necessary. The zeroing effect is also a more
>>>> accurate representation of the semantics of FMOV.
>>> This feels like a hack.  Either the paradoxical subreg of the pseudo
>>> is invalid for some reason (in which case we should ICE when it's formed,
>>> not just in the case of x30 being allocated) or the subreg is valid,
>>> in which case the RA should handle it correctly (and the backend should
>>> give it the information it needs to do that).
>>>
>>> I could see the argument for ignoring the problem for expediency if the
>>> patch was a clean-up in its own right, but I think it's wrong to add so
>>> much code to paper over a bug.
>>
>> I see what you mean. Do you have any thoughts on where in RA we'd go
>> about fixing this?  Since I don't know my way around RA I tried in the
>> place I was most comfortable changing :)
> 
> Ah, but everyone who's patched the RA had to patch it for the
> first time. :-)  And it's not that scary.  But anyway...
> 
> The original insn was:
> 
> (insn 74 72 76 8 (set (reg:V2DI 287 [ _166 ])
>         (subreg:V2DI (reg/v/f:DI 112 [ d ]) 0)) 1060 {*aarch64_simd_movv2di}
>      (nil))
> 
> which IRA converted to:
> 
> (insn 74 72 580 8 (set (reg:V2DI 287 [ _166 ])
>         (subreg:V2DI (reg/v/f:DI 517 [orig:112 d ] [112]) 0)) 1060 
> {*aarch64_simd_movv2di}
>      (nil))
> 
> after creating loop allocnos.  It happens that the ALLOCNO_WMODEs for
> both 112 and 517 were not set to V2DI due to another bug that I'll post
> a separate patch for, but we nevertheless got a valid allocation of
> register 1.
> 
> LRA's first try at constraining the instruction gave:
> 
>          Choosing alt 5 in insn 74:  (0) ?w  (1) r {*aarch64_simd_movv2di}
> 
> at which point all was good.  But LRA later decided it needed
> to spill r517:
> 
>     Spill r517 after risky transformations
> 
> so the next constraint attempt gave:
> 
>          Choosing alt 0 in insn 74:  (0) =w  (1) m {*aarch64_simd_movv2di}
> 
> which was still good.  Then during inheritance we had:
> 
>       Creating newreg=672 from oldreg=517, assigning class GENERAL_REGS to 
> inheritance r672
>     Original reg change 517->672 (bb8):
>    74: r287:V2DI=r672:DI#0
>     Add inheritance<-original before:
>   939: r672:DI=r517:DI
> 
>     Inheritance reuse change 517->672 (bb8):
>   620: r572:DI=r672:DI
>       REG_DEAD r672:DI
> 
>     Use smallest class of POINTER_REGS and GENERAL_REGS
>       Creating newreg=673 from oldreg=517, assigning class POINTER_REGS to 
> inheritance r673
>     Original reg change 517->673 (bb8):
>   936: r669:DI=r673:DI
>     Add inheritance<-original before:
>   940: r673:DI=r517:DI
> 
> ("Use smallest class of POINTER_REGS and GENERAL_REGS" ought to
> give GENERAL_REGS.  That might be a missed optimisation, and probably
> due to both classes having the same number of allocatable registers.
> I'll look at that as a follow-on.)
> 
> Thus LRA created two inheritance registers for r517, one (r673)
> that included the unallocatable x31 and another (r672) that didn't.
> The r672 references included the paradoxical subreg in insn 74 but the
> r673 ones didn't.  LRA then allocated x30 to r673, which was a valid
> choice.
> 
> Later LRA decided to "undo" the inheritance for insn 620, but because
> of the double inheritance, it got confused as to what the original
> situation was, and made insn 74 use the other inheritance register
> instead of r517:
> 
> ********** Undoing inheritance #2: **********
> 
> Inherit 11 out of 12 (91.67%)
>    Insn after restoring regs:
>   620: r572:DI=r517:DI
>       REG_DEAD r517:DI
>     Change reload insn:
>    74: r287:V2DI=r673:DI#0       <-------------------
>    Insn after restoring regs:
>   939: r517:DI=r673:DI
>       REG_DEAD r673:DI
> 
> This might be a bug in itself: we should probably look through sets
> of other inheritance pseudos to find the "real" origin.
> 
> Either way, at this point we had a situation in which r673 was used in an
> insn whose subreg was larger than the biggest_mode that r673 had when it
> was allocated.  While x30 was valid for the original biggest_mode, it
> wasn't valid for this subreg use.
> 
> The next attempt to constrain insn 74 was:
> 
>         Choosing alt 5 in insn 74:  (0) ?w  (1) r {*aarch64_simd_movv2di}
>       Creating newreg=684, assigning class GENERAL_REGS to r684
>    74: r287:V2DI=r684:V2DI
>     Inserting insn reload before:
>   951: r684:V2DI=r673:DI#0
> 
> where LRA reloaded the SUBREG rather than the SUBREG_REG.  And it
> then cycled trying the same thing when reloading the reload (and the
> reload of the reload, etc.).
> 
> What it should be doing here is reloading the SUBREG_REG instead.
> There's already code to cope with this case when the paradoxical
> subreg falls outside the class (which isn't true here, since r673
> is POINTER_REGS and POINTER_REGS includes x31).  But I think we
> should also test whether LRA is entitled to allocate the spanned
> registers.  Not doing that seems like a bug regardless of the above
> missed optimisation and the mix-up undoing inheritance.
> 
> Tested so far on aarch64-linux-gnu.  I'll try aarch64_be-elf
> and x86_64-linux-gnu too.
> 
> Thanks,
> Richard
> 
> 
> 2018-05-30  Richard Sandiford  <richard.sandif...@linaro.org>
> 
> gcc/
>       * lra-constraints.c (simplify_operand_subreg): In the paradoxical
>       case, check whether the outer register overlaps an unallocatable
>       register, not just whether it fits the required class.
> 
> gcc/testsuite/
>       * g++.dg/torture/aarch64-vect-init-1.C: New test.
OK.
jeff
Re: Tighten LRA test for reloading the inner reg of a paradoxical subreg

Reply via email to