On 4/18/19 6:07 AM, Richard Sandiford wrote:
"Richard Earnshaw (lists)" <richard.earns...@arm.com> writes:
On 18/04/2019 03:38, Martin Sebor wrote:
The fix for pr89797 committed in r270326 was limited to targets
with NUM_POLY_INT_COEFFS == 1 which I think is all but aarch64.
The tests for the fix have been failing with an ICE on aarch64
because it suffers from more or less the same problem but in
its own target-specific code. Attached is the patch I posted
yesterday that fixes the ICE, successfully bootstrapped and
regtested on x86_64-linux. I also ran the dg.exp=*attr* and
aarch64.exp tests with an aarch64-linux-elf cross-compiler.
There are no ICEs but there are tons of errors in the latter
tests because many (most?) either expect to be able to find
libc headers or link executables (I have not built libc for
aarch64).
I'm around tomorrow but then traveling the next two weeks (with
no connectivity the first week) so I unfortunately won't be able
to fix whatever this change might break until the week of May 6.
Jeff, if you have an aarch64 tester that could verify this patch
tomorrow that would help give us some confidence. Otherwise,
another option to consider for the time being is to xfail
the tests on aarch64.
Thanks
Martin
gcc-89797.diff
PR middle-end/89797 - ICE on a vector_size (1LU << 33) int variable
gcc/ChangeLog:
PR middle-end/89797
* tree.h (TYPE_VECTOR_SUBPARTS): Correct computation when
NUM_POLY_INT_COEFFS == 2. Use HOST_WIDE_INT_1U.
* config/aarch64/aarch64.c (aarch64_simd_vector_alignment): Avoid
assuming type size fits in SHWI.
Index: gcc/tree.h
===================================================================
--- gcc/tree.h (revision 270418)
+++ gcc/tree.h (working copy)
@@ -3735,13 +3735,13 @@ TYPE_VECTOR_SUBPARTS (const_tree node)
if (NUM_POLY_INT_COEFFS == 2)
{
poly_uint64 res = 0;
- res.coeffs[0] = 1 << (precision & 0xff);
+ res.coeffs[0] = HOST_WIDE_INT_1U << (precision & 0xff);
I'm not sure I understand this either before or after. (precision &
0xff) gives a value in the range 0-255, which is way more than can be
accommodated in a left shift, even for a HWI.
For the record: the problem is that we have two coefficients that are
logically both in the range 1<<[0,63], needing 6 bits per coefficient
and 12 in total. But precision is only 10 bits. :-) Fortunately,
coeffs[1] is in practice either 0 or equal to coeffs[0], so we can encode
it in a single bit.
A 0x100/0xff split seemed easier to optimise than a 0x40/0x3f split.
E.g. 0xff can be loaded directly from memory as a byte. There's not
going to be much in it either way though.
if (precision & 0x100)
- res.coeffs[1] = 1 << (precision & 0xff);
+ res.coeffs[1] = HOST_WIDE_INT_1U << ((precision & 0x100) >> 16);
And this equally doesn't make sense: >> 16 will shift the masked value
out of the bottom of the result.
Yeah, we should keep the original shift amount and just do
s/1/HOST_WIDE_INT_1U/.
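(I.e., presumably the line becomes:

  res.coeffs[1] = HOST_WIDE_INT_1U << (precision & 0xff);

keeping the shift amount from the original code.)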
Sorry, I confused things with the 16-bit shift thinko (I meant
to shift by 8 bits). But I don't understand how poly_int works
at all, so the shift was just a wild guess; I thought the purpose
of the split was to allow for precisions in excess of (1 << 63).
But I'm afraid I still don't understand when (precision & 0x100)
could ever hold. I put in an abort in that branch and ran
the test suite but didn't manage to trigger an ICE. Can you
give me an example when it might trigger? (I'd also like to
add a comment to the function to explain it.)
The test case I'm fixing is:
__attribute__ ((vector_size (1LLU << 62))) char v;
Greater values are rejected with:
error: ‘vector_size’ attribute argument value ‘9223372036854775808’
exceeds 9223372036854775807
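For reference, here is a self-contained sketch of the encoding as
described above: the low eight bits of the precision hold log2 of
coeffs[0], and the 0x100 bit records that coeffs[1] equals coeffs[0]
(the scalable-vector case). uint64_t stands in for unsigned
HOST_WIDE_INT; this is only an illustration, not the actual tree.h
code.

#include <assert.h>
#include <stdint.h>

/* Encode: coeffs[0] is a power of two, coeffs[1] is either 0 or
   equal to coeffs[0], so one bit (0x100) is enough for it.  */
static unsigned
encode_subparts (uint64_t coeff0, uint64_t coeff1)
{
  unsigned log2_c0 = 0;
  while ((UINT64_C (1) << log2_c0) != coeff0)
    log2_c0++;
  return log2_c0 | (coeff1 != 0 ? 0x100 : 0);
}

/* Decode, mirroring the corrected TYPE_VECTOR_SUBPARTS code: the
   shift has to be done in a 64-bit type (hence HOST_WIDE_INT_1U in
   the real code), since 1 << 62 on a plain int is undefined.  */
static void
decode_subparts (unsigned precision, uint64_t *coeff0, uint64_t *coeff1)
{
  *coeff0 = UINT64_C (1) << (precision & 0xff);
  *coeff1 = (precision & 0x100) ? *coeff0 : 0;
}

int
main (void)
{
  uint64_t c0, c1;

  /* The test case above: a vector of 1 << 62 chars.  */
  decode_subparts (encode_subparts (UINT64_C (1) << 62, 0), &c0, &c1);
  assert (c0 == UINT64_C (1) << 62 && c1 == 0);

  /* A scalable vector with coeffs[1] == coeffs[0].  */
  decode_subparts (encode_subparts (16, 16), &c0, &c1);
  assert (c0 == 16 && c1 == 16);
  return 0;
}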
Also, if we set res.coeffs[1], don't we want res.coeffs[0] to be set to 0?
R.
return res;
}
else
- return (unsigned HOST_WIDE_INT)1 << precision;
+ return HOST_WIDE_INT_1U << precision;
}
/* Set the number of elements in VECTOR_TYPE NODE to SUBPARTS, which must
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c (revision 270418)
+++ gcc/config/aarch64/aarch64.c (working copy)
@@ -14924,7 +14924,10 @@ aarch64_simd_vector_alignment (const_tree type)
be set for non-predicate vectors of booleans. Modes are the most
direct way we have of identifying real SVE predicate types. */
return GET_MODE_CLASS (TYPE_MODE (type)) == MODE_VECTOR_BOOL ? 16 : 128;
- HOST_WIDE_INT align = tree_to_shwi (TYPE_SIZE (type));
+ tree size = TYPE_SIZE (type);
+ unsigned HOST_WIDE_INT align = 128;
+ if (tree_fits_uhwi_p (size))
+ align = tree_to_uhwi (TYPE_SIZE (type));
return MIN (align, 128);
I guess this is personal preference, sorry, but:
return wi::umin (wi::to_wide (TYPE_SIZE (type)), 128).to_uhwi ();
seems more direct and obvious than:
tree size = TYPE_SIZE (type);
unsigned HOST_WIDE_INT align = 128;
if (tree_fits_uhwi_p (size))
align = tree_to_uhwi (TYPE_SIZE (type));
return MIN (align, 128);
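Either way, both forms compute min (TYPE_SIZE in bits, 128) without
ever narrowing the size to a signed HWI first, which is presumably
what tripped tree_to_shwi: for vector_size (1LLU << 62) on char the
size is 2^65 bits, too big even for an unsigned HWI. A standalone
sketch of just that computation, with unsigned __int128 standing in
for the wide_int value (illustration only, not GCC code):

#include <assert.h>
#include <stdint.h>

/* unsigned __int128 stands in for the wide_int value of TYPE_SIZE,
   which is measured in bits and can exceed 64 bits here:
   2^62 bytes == 2^65 bits.  */
static uint64_t
vector_alignment_sketch (unsigned __int128 size_in_bits)
{
  /* Clamp to 128 bits: MIN (align, 128) in the patch, or
     wi::umin (..., 128) in the suggested form.  */
  return size_in_bits > 128 ? 128 : (uint64_t) size_in_bits;
}

int
main (void)
{
  unsigned __int128 huge = (unsigned __int128) 1 << 65;
  assert (vector_alignment_sketch (huge) == 128);
  assert (vector_alignment_sketch (64) == 64);
  return 0;
}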
OK with those changes, thanks. I've bootstrapped & regression-tested
it in that form on aarch64-linux-gnu.
Okay. I will hold off on committing until I hear back from you
on the question above.
Thanks
Martin