On 05/30/2018 06:31 AM, Richard Sandiford wrote:
> Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> writes:
>> Hi Richard,
>>
>> On 29/05/18 15:26, Richard Sandiford wrote:
>>> Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> writes:
>>>> Hi all,
>>>>
>>>> The recent changes to aarch64_expand_vector_init cause an ICE in the
>>>> attached testcase. The register allocator "ICEs with Max. number of
>>>> generated reload insns per insn is achieved (90)"
>>>>
>>>> That is because aarch64_expand_vector_init creates a paradoxical
>>>> subreg to move a DImode value
>>>> into a V2DI vector:
>>>> (insn 74 72 76 8 (set (reg:V2DI 287 [ _166 ])
>>>> (subreg:V2DI (reg/v/f:DI 112 [ d ]) 0)) 1050
>>>> {*aarch64_simd_movv2di}
>>>>
>>>> This is done because we want to express that the whole of the V2DI
>>>> vector will be written so that init-regs doesn't try to
>>>> zero-initialise it before we overwrite each lane individually anyway.
>>>>
>>>> This can go bad for because if the DImode value is allocated in, say,
>>>> x30: the last register in that register class, the V2DI subreg of that
>>>> isn't valid or meaningful and that seems to cause the trouble.
>>>>
>>>> It's kinda hard to know what the right solution for this is.
>>>> We could emit a duplicate of the value into all the lanes of the
>>>> vector, but we have tests that test against that
>>>> (we're trying to avoid unnecessary duplicates)
>>>>
>>>> What this patch does is it defines a pattern for moving a scalar into
>>>> lane 0 of a vector using a simple FMOV or LDR and represents that as a
>>>> merging with a vector of zeroes. That way, the instruction represents
>>>> a full write of the destination vector but doesn't "read" more bits
>>>> from the source than necessary. The zeroing effect is also a more
>>>> accurate representation of the semantics of FMOV.
>>> This feels like a hack. Either the paradoxical subreg of the pseudo
>>> is invalid for some reason (in which case we should ICE when it's formed,
>>> not just in the case of x30 being allocated) or the subreg is valid,
>>> in which case the RA should handle it correctly (and the backend should
>>> give it the information it needs to do that).
>>>
>>> I could see the argument for ignoring the problem for expediency if the
>>> patch was a clean-up in its own right, but I think it's wrong to add so
>>> much code to paper over a bug.
>>
>> I see what you mean. Do you have any thoughts on where in RA we'd go
>> about fixing this? Since I don't know my way around RA I tried in the
>> place I was most comfortable changing :)
>
> Ah, but everyone who's patched the RA had to patch it for the
> first time. :-) And it's not that scary. But anyway...
>
> The original insn was:
>
> (insn 74 72 76 8 (set (reg:V2DI 287 [ _166 ])
> (subreg:V2DI (reg/v/f:DI 112 [ d ]) 0)) 1060 {*aarch64_simd_movv2di}
> (nil))
>
> which IRA converted to:
>
> (insn 74 72 580 8 (set (reg:V2DI 287 [ _166 ])
> (subreg:V2DI (reg/v/f:DI 517 [orig:112 d ] [112]) 0)) 1060
> {*aarch64_simd_movv2di}
> (nil))
>
> after creating loop allocnos. It happens that the ALLOCNO_WMODEs for
> both 112 and 517 were not set to V2DI due to another bug that I'll post
> a separate patch for, but we nevertheless got a valid allocation of
> register 1.
>
> LRA's first try at constraining the instruction gave:
>
> Choosing alt 5 in insn 74: (0) ?w (1) r {*aarch64_simd_movv2di}
>
> at which point all was good. But LRA later decided it needed
> to spill r517:
>
> Spill r517 after risky transformations
>
> so the next constraint attempt gave:
>
> Choosing alt 0 in insn 74: (0) =w (1) m {*aarch64_simd_movv2di}
>
> which was still good. Then during inheritance we had:
>
> Creating newreg=672 from oldreg=517, assigning class GENERAL_REGS to
> inheritance r672
> Original reg change 517->672 (bb8):
> 74: r287:V2DI=r672:DI#0
> Add inheritance<-original before:
> 939: r672:DI=r517:DI
>
> Inheritance reuse change 517->672 (bb8):
> 620: r572:DI=r672:DI
> REG_DEAD r672:DI
>
> Use smallest class of POINTER_REGS and GENERAL_REGS
> Creating newreg=673 from oldreg=517, assigning class POINTER_REGS to
> inheritance r673
> Original reg change 517->673 (bb8):
> 936: r669:DI=r673:DI
> Add inheritance<-original before:
> 940: r673:DI=r517:DI
>
> ("Use smallest class of POINTER_REGS and GENERAL_REGS" ought to
> give GENERAL_REGS. That might be a missed optimisation, and probably
> due to both classes having the same number of allocatable registers.
> I'll look at that as a follow-on.)
>
> Thus LRA created two inheritance registers for r517, one (r673)
> that included the unallocatable x31 and another (r672) that didn't.
> The r672 references included the paradoxical subreg in insn 74 but the
> r673 ones didn't. LRA then allocated x30 to r673, which was a valid
> choice.
>
> Later LRA decided to "undo" the inheritance for insn 620, but because
> of the double inheritance, it got confused as to what the original
> situation was, and made insn 74 use the other inheritance register
> instead of r517:
>
> ********** Undoing inheritance #2: **********
>
> Inherit 11 out of 12 (91.67%)
> Insn after restoring regs:
> 620: r572:DI=r517:DI
> REG_DEAD r517:DI
> Change reload insn:
> 74: r287:V2DI=r673:DI#0 <-------------------
> Insn after restoring regs:
> 939: r517:DI=r673:DI
> REG_DEAD r673:DI
>
> This might be a bug in itself: we should probably look through sets
> of other inheritance pseudos to find the "real" origin.
>
> Either way, at this point we had a situation in which r673 was used in an
> insn whose subreg was larger than the biggest_mode that r673 had when it
> was allocated. While x30 was valid for the original biggest_mode, it
> wasn't valid for this subreg use.
>
> The next attempt to constrain insn 74 was:
>
> Choosing alt 5 in insn 74: (0) ?w (1) r {*aarch64_simd_movv2di}
> Creating newreg=684, assigning class GENERAL_REGS to r684
> 74: r287:V2DI=r684:V2DI
> Inserting insn reload before:
> 951: r684:V2DI=r673:DI#0
>
> where LRA reloaded the SUBREG rather than the SUBREG_REG. And it
> then cycled trying the same thing when reloading the reload (and the
> reload of the reload, etc.).
>
> What it should be doing here is reloading the SUBREG_REG instead.
> There's already code to cope with this case when the paradoxical
> subreg falls outside the class (which isn't true here, since r673
> is POINTER_REGS and POINTER_REGS includes x31). But I think we
> should also test whether LRA is entitled to allocate the spanned
> registers. Not doing that seems like a bug regardless of the above
> missed optimisation and the mix-up undoing inheritance.
>
> Tested so far on aarch64-linux-gnu. I'll try aarch64_be-elf
> and x86_64-linux-gnu too.
>
> Thanks,
> Richard
>
>
> 2018-05-30 Richard Sandiford <richard.sandif...@linaro.org>
>
> gcc/
> * lra-constraints.c (simplify_operand_subreg): In the paradoxical
> case, check whether the outer register overlaps an unallocatable
> register, not just whether it fits the required class.
>
> gcc/testsuite/
> * g++.dg/torture/aarch64-vect-init-1.C: New test.
OK.
jeff