Martin Sebor <mse...@gmail.com> writes:
> On 4/18/19 6:07 AM, Richard Sandiford wrote:
>> "Richard Earnshaw (lists)" <richard.earns...@arm.com> writes:
>>> On 18/04/2019 03:38, Martin Sebor wrote:
>>>> The fix for pr89797 committed in r270326 was limited to targets
>>>> with NUM_POLY_INT_COEFFS == 1, which I think is all but aarch64.
>>>> The tests for the fix have been failing with an ICE on aarch64
>>>> because it suffers from more or less the same problem, but in
>>>> its own target-specific code.  Attached is the patch I posted
>>>> yesterday that fixes the ICE, successfully bootstrapped and
>>>> regtested on x86_64-linux.  I also ran the dg.exp=*attr* and
>>>> aarch64.exp tests with an aarch64-linux-elf cross-compiler.
>>>> There are no ICEs, but there are tons of errors in the latter
>>>> tests because many (most?) either expect to be able to find
>>>> libc headers or to link executables (I have not built libc for
>>>> aarch64).
>>>>
>>>> I'm around tomorrow but then traveling the next two weeks (with
>>>> no connectivity the first week), so I unfortunately won't be able
>>>> to fix whatever this change might break until the week of May 6.
>>>>
>>>> Jeff, if you have an aarch64 tester that could verify this patch
>>>> tomorrow, that would help give us some confidence.  Otherwise,
>>>> another option to consider for the time being is to xfail
>>>> the tests on aarch64.
>>>>
>>>> Thanks
>>>> Martin
>>>>
>>>> gcc-89797.diff
>>>>
>>>> PR middle-end/89797 - ICE on a vector_size (1LU << 33) int variable
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 	PR middle-end/89797
>>>> 	* tree.h (TYPE_VECTOR_SUBPARTS): Correct computation when
>>>> 	NUM_POLY_INT_COEFFS == 2.  Use HOST_WIDE_INT_1U.
>>>> 	* config/aarch64/aarch64.c (aarch64_simd_vector_alignment): Avoid
>>>> 	assuming type size fits in SHWI.
>>>>
>>>> Index: gcc/tree.h
>>>> ===================================================================
>>>> --- gcc/tree.h	(revision 270418)
>>>> +++ gcc/tree.h	(working copy)
>>>> @@ -3735,13 +3735,13 @@ TYPE_VECTOR_SUBPARTS (const_tree node)
>>>>    if (NUM_POLY_INT_COEFFS == 2)
>>>>      {
>>>>        poly_uint64 res = 0;
>>>> -      res.coeffs[0] = 1 << (precision & 0xff);
>>>> +      res.coeffs[0] = HOST_WIDE_INT_1U << (precision & 0xff);
>>>
>>> I'm not sure I understand this either before or after.  (precision &
>>> 0xff) gives a value in the range 0-255, which is way more than can be
>>> accommodated in a left shift, even for a HWI.
>>
>> For the record: the problem is that we have two coefficients that are
>> logically both in the range 1<<[0,63], needing 6 bits per coefficient
>> and 12 in total.  But precision is only 10 bits. :-)  Fortunately,
>> coeffs[1] is in practice either 0 or equal to coeffs[0], so we can encode
>> it in a single bit.
>>
>> A 0x100/0xff split seemed easier to optimise than a 0x40/0x3f split.
>> E.g. 0xff can be loaded directly from memory as a byte.  There's not
>> going to be much in it either way though.
>>
>>>>        if (precision & 0x100)
>>>> -        res.coeffs[1] = 1 << (precision & 0xff);
>>>> +        res.coeffs[1] = HOST_WIDE_INT_1U << ((precision & 0x100) >> 16);
>>>
>>> And this equally doesn't make sense.  >>16 will shift the masked value
>>> out of the bottom of the result.
>>
>> Yeah, we should keep the original shift amount and just do
>> s/1/HOST_WIDE_INT_1U/.
>
> Sorry, I confused things with the 16-bit shift thinko (I meant
> to shift by 8 bits).  But I don't understand how poly_int works
> at all, so the shift was just a wild guess, thinking the purpose
> of the split was to allow for precisions in excess of (1 << 63).
>
> But I'm afraid I still don't understand when (precision & 0x100)
> could ever hold.  I put in an abort in that branch and ran
> the test suite but didn't manage to trigger an ICE.  Can you
> give me an example when it might trigger?  (I'd also like to
> add a comment to the function to explain it.)
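As an aside on the quoted diff, and purely as an illustration (this is
not GCC source, and the variable names below are made up): the reason
"1 <<" has to become "HOST_WIDE_INT_1U <<" is that the literal 1 has
type int.  For the PR's vector_size (1LU << 33) int variable the
subpart count is 2^31 (assuming 4-byte int), so the shift count reaches
31; the shift then overflows a 32-bit int and the negative result wraps
to a bogus value when stored in the 64-bit coefficient.  A minimal
sketch, assuming a host where unsigned HOST_WIDE_INT is 64 bits:

#include <cstdint>
#include <cstdio>

int
main ()
{
  unsigned shift = 31;  /* log2 of the subpart count for the PR's
			   vector, assuming 4-byte int.  */

  /* Buggy form: "1" is an int, so the shift overflows and the negative
     result wraps when widened to 64 bits.  */
  uint64_t bad = 1 << shift;

  /* Fixed form: shift a 64-bit unsigned 1, which is what
     HOST_WIDE_INT_1U expands to on the usual 64-bit-HWI hosts.  */
  uint64_t good = uint64_t (1) << shift;

  printf ("bad  = %#llx\n", (unsigned long long) bad);
  printf ("good = %#llx\n", (unsigned long long) good);
  return 0;
}

With GCC on a typical LP64 host this prints 0xffffffff80000000 for the
buggy form and the intended 0x80000000 for the fixed one.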
The function is decoding the precision installed by:

inline void
SET_TYPE_VECTOR_SUBPARTS (tree node, poly_uint64 subparts)
{
  STATIC_ASSERT (NUM_POLY_INT_COEFFS <= 2);
  unsigned HOST_WIDE_INT coeff0 = subparts.coeffs[0];
  int index = exact_log2 (coeff0);
  gcc_assert (index >= 0);
  if (NUM_POLY_INT_COEFFS == 2)
    {
      unsigned HOST_WIDE_INT coeff1 = subparts.coeffs[1];
      gcc_assert (coeff1 == 0 || coeff1 == coeff0);
      VECTOR_TYPE_CHECK (node)->type_common.precision
	= index + (coeff1 != 0 ? 0x100 : 0);
    }
  else
    VECTOR_TYPE_CHECK (node)->type_common.precision = index;
}

Code protected by NUM_POLY_INT_COEFFS == 2 will only trigger on AArch64
targets, so an x86 test run won't cover it.  The assert you added would
blow up on pretty much all of aarch64-sve.exp in an AArch64 test run
though; see Christophe's results for what happened with the original
patch.

I certainly don't mind adding more commentary, but please keep that
separate from the PR fix, which is simply about changing "1 <<" to
"HOST_WIDE_INT_1U <<".  I think we went down a bit of a rabbit hole
with this encoding question. :-)

FWIW, the layout of "precision" is fundamental to the handling of
length-agnostic vector types and is covered by all SVE autovectorisation
tests, so I'm pretty confident that we're reading "precision" correctly.
(But of course, both arms have the bug you're fixing, in which we were
shifting an int instead of a UHWI.)

Thanks,
Richard
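For completeness, here is a standalone model of the precision encoding
described above for the NUM_POLY_INT_COEFFS == 2 case.  It is not GCC
source: poly2, encode_subparts and decode_subparts are invented
stand-ins for poly_uint64 and the accessors quoted in this thread, but
the bit layout (low 8 bits for log2 of coeffs[0], bit 0x100 for whether
coeffs[1] is nonzero) follows the description above.

#include <cassert>
#include <cstdint>

struct poly2 { uint64_t coeffs[2]; };

/* Encode: low 8 bits hold log2 (coeffs[0]); bit 0x100 records whether
   coeffs[1] is nonzero, in which case it must equal coeffs[0].  */
static unsigned
encode_subparts (poly2 p)
{
  assert (p.coeffs[0] != 0 && (p.coeffs[0] & (p.coeffs[0] - 1)) == 0);
  unsigned index = 0;
  while ((uint64_t (1) << index) != p.coeffs[0])
    ++index;
  assert (p.coeffs[1] == 0 || p.coeffs[1] == p.coeffs[0]);
  return index + (p.coeffs[1] != 0 ? 0x100 : 0);
}

/* Decode: the inverse, shifting a 64-bit unsigned 1 rather than an int,
   which is the substance of the fix.  */
static poly2
decode_subparts (unsigned precision)
{
  poly2 res = { { 0, 0 } };
  res.coeffs[0] = uint64_t (1) << (precision & 0xff);
  if (precision & 0x100)
    res.coeffs[1] = uint64_t (1) << (precision & 0xff);
  return res;
}

int
main ()
{
  poly2 fixed_len = { { 4, 0 } };  /* ordinary 4-element vector  */
  poly2 scalable = { { 4, 4 } };   /* SVE-style "4 + 4x" elements  */

  assert (encode_subparts (fixed_len) == 2);           /* just log2 (4)  */
  assert (encode_subparts (scalable) == (0x100 | 2));  /* 0x100 bit set  */

  poly2 back = decode_subparts (encode_subparts (scalable));
  assert (back.coeffs[0] == 4 && back.coeffs[1] == 4);
  return 0;
}

The 0x100 bit is set exactly when coeffs[1] is nonzero, i.e. for the
length-agnostic vector types that only SVE-enabled AArch64 targets
create, which is why the abort Martin added never fired in an x86 run
but would trip all over aarch64-sve.exp.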