"Richard Earnshaw (lists)" <richard.earns...@arm.com> writes:
> On 18/04/2019 03:38, Martin Sebor wrote:
>> The fix for pr89797 committed in r270326 was limited to targets
>> with NUM_POLY_INT_COEFFS == 1 which I think is all but aarch64.
>> The tests for the fix have been failing with an ICE on aarch64
>> because it suffers from more or less the same problem but in
>> its own target-specific code.  Attached is the patch I posted
>> yesterday that fixes the ICE, successfully bootstrapped and
>> regtested on x86_64-linux.  I also ran the dg.exp=*attr* and
>> aarch64.exp tests with an aarch64-linux-elf cross-compiler.
>> There are no ICEs but there are tons of errors in the latter
>> tests because many (most?) either expect to be able to find
>> libc headers or link executables (I have not built libc for
>> aarch64).
>> 
>> I'm around tomorrow but then traveling the next two weeks (with
>> no connectivity the first week) so I unfortunately won't be able
>> to fix whatever this change might break until the week of May 6.
>> 
>> Jeff, if you have an aarch64 tester that could verify this patch
>> tomorrow that would help give us some confidence.  Otherwise,
>> another option to consider for the time being is to xfail
>> the tests on aarch64.
>> 
>> Thanks
>> Martin
>> 
>> gcc-89797.diff
>> 
>> PR middle-end/89797 - ICE on a vector_size (1LU << 33) int variable
>> 
>> gcc/ChangeLog:
>> 
>>      PR middle-end/89797
>>      * tree.h (TYPE_VECTOR_SUBPARTS): Correct computation when
>>      NUM_POLY_INT_COEFFS == 2.  Use HOST_WIDE_INT_1U.
>>      * config/aarch64/aarch64.c (aarch64_simd_vector_alignment): Avoid
>>      assuming type size fits in SHWI.
>> 
>> Index: gcc/tree.h
>> ===================================================================
>> --- gcc/tree.h       (revision 270418)
>> +++ gcc/tree.h       (working copy)
>> @@ -3735,13 +3735,13 @@ TYPE_VECTOR_SUBPARTS (const_tree node)
>>    if (NUM_POLY_INT_COEFFS == 2)
>>      {
>>        poly_uint64 res = 0;
>> -      res.coeffs[0] = 1 << (precision & 0xff);
>> +      res.coeffs[0] = HOST_WIDE_INT_1U << (precision & 0xff);
>
> I'm not sure I understand this either before or after.  (precision &
> 0xff) gives a value in the range 0-255, which is way more than can be
> accommodated in a left shift, even for a HWI.

For the record: the problem is that we have two coefficients that are
logically both in the range 1<<[0,63], needing 6 bits per coefficient
and 12 in total.  But precision is only 10 bits. :-)  Fortunately,
coeffs[1] is in practice either 0 or equal to coeffs[0], so we can encode
it in a single bit.

A 0x100/0xff split seemed easier to optimise than a 0x40/0x3f split.
E.g. 0xff can be loaded directly from memory as a byte.  There's not
going to be much in it either way though.

>>        if (precision & 0x100)
>> -    res.coeffs[1] = 1 << (precision & 0xff);
>> +    res.coeffs[1] = HOST_WIDE_INT_1U << ((precision & 0x100) >> 16);
>
> And this equally doesn't make sense >>16 will shift the masked value out
> of the bottom of the result.

Yeah, we should keep the original shift amount and just do
s/1/HOST_WIDE_INT_1U/.

> Also, if we set res.coeffs[1], don't we want refs.coefs[0] to be set to 0?
>
> R.
>
>>        return res;
>>      }
>>    else
>> -    return (unsigned HOST_WIDE_INT)1 << precision;
>> +    return HOST_WIDE_INT_1U << precision;
>>  }
>>  
>>  /* Set the number of elements in VECTOR_TYPE NODE to SUBPARTS, which must
>> Index: gcc/config/aarch64/aarch64.c
>> ===================================================================
>> --- gcc/config/aarch64/aarch64.c     (revision 270418)
>> +++ gcc/config/aarch64/aarch64.c     (working copy)
>> @@ -14924,7 +14924,10 @@ aarch64_simd_vector_alignment (const_tree type)
>>         be set for non-predicate vectors of booleans.  Modes are the most
>>         direct way we have of identifying real SVE predicate types.  */
>>      return GET_MODE_CLASS (TYPE_MODE (type)) == MODE_VECTOR_BOOL ? 16 : 128;
>> -  HOST_WIDE_INT align = tree_to_shwi (TYPE_SIZE (type));
>> +  tree size = TYPE_SIZE (type);
>> +  unsigned HOST_WIDE_INT align = 128;
>> +  if (tree_fits_uhwi_p (size))
>> +    align = tree_to_uhwi (TYPE_SIZE (type));
>>    return MIN (align, 128);

I guess this is personal preference, sorry, but:

  return wi::umin (wi::to_wide (TYPE_SIZE (type)), 128).to_uhwi ();

seems more direct and obvious than:

  tree size = TYPE_SIZE (type);
  unsigned HOST_WIDE_INT align = 128;
  if (tree_fits_uhwi_p (size))
    align = tree_to_uhwi (TYPE_SIZE (type));
  return MIN (align, 128);

OK with those changes, thanks.  I've bootrapped & regression-tested
it in that form on aarch64-linux-gnu.

Richard

Reply via email to