Re: [PATCH] avoid aarch64 ICE on large vectors (PR 89797)

Martin Sebor Thu, 18 Apr 2019 12:31:21 -0700

On 4/18/19 9:59 AM, Richard Sandiford wrote:

Martin Sebor <mse...@gmail.com> writes:

On 4/18/19 6:07 AM, Richard Sandiford wrote:

"Richard Earnshaw (lists)" <richard.earns...@arm.com> writes:

On 18/04/2019 03:38, Martin Sebor wrote:

The fix for pr89797 committed in r270326 was limited to targets
with NUM_POLY_INT_COEFFS == 1 which I think is all but aarch64.
The tests for the fix have been failing with an ICE on aarch64
because it suffers from more or less the same problem but in
its own target-specific code.  Attached is the patch I posted
yesterday that fixes the ICE, successfully bootstrapped and
regtested on x86_64-linux.  I also ran the dg.exp=*attr* and
aarch64.exp tests with an aarch64-linux-elf cross-compiler.
There are no ICEs but there are tons of errors in the latter
tests because many (most?) either expect to be able to find
libc headers or link executables (I have not built libc for
aarch64).


I'm around tomorrow but then traveling the next two weeks (with
no connectivity the first week) so I unfortunately won't be able
to fix whatever this change might break until the week of May 6.

Jeff, if you have an aarch64 tester that could verify this patch
tomorrow that would help give us some confidence.  Otherwise,
another option to consider for the time being is to xfail
the tests on aarch64.

Thanks
Martin

gcc-89797.diff

PR middle-end/89797 - ICE on a vector_size (1LU << 33) int variable

gcc/ChangeLog:

        PR middle-end/89797
        * tree.h (TYPE_VECTOR_SUBPARTS): Correct computation when
        NUM_POLY_INT_COEFFS == 2.  Use HOST_WIDE_INT_1U.
        * config/aarch64/aarch64.c (aarch64_simd_vector_alignment): Avoid
        assuming type size fits in SHWI.

Index: gcc/tree.h
===================================================================
--- gcc/tree.h  (revision 270418)
+++ gcc/tree.h  (working copy)
@@ -3735,13 +3735,13 @@ TYPE_VECTOR_SUBPARTS (const_tree node)
     if (NUM_POLY_INT_COEFFS == 2)
       {
         poly_uint64 res = 0;
-      res.coeffs[0] = 1 << (precision & 0xff);
+      res.coeffs[0] = HOST_WIDE_INT_1U << (precision & 0xff);


I'm not sure I understand this either before or after.  (precision &
0xff) gives a value in the range 0-255, which is way more than can be
accommodated in a left shift, even for a HWI.


For the record: the problem is that we have two coefficients that are
logically both in the range 1<<[0,63], needing 6 bits per coefficient
and 12 in total.  But precision is only 10 bits. :-)  Fortunately,
coeffs[1] is in practice either 0 or equal to coeffs[0], so we can encode
it in a single bit.

A 0x100/0xff split seemed easier to optimise than a 0x40/0x3f split.
E.g. 0xff can be loaded directly from memory as a byte.  There's not
going to be much in it either way though.

         if (precision & 0x100)
-       res.coeffs[1] = 1 << (precision & 0xff);
+       res.coeffs[1] = HOST_WIDE_INT_1U << ((precision & 0x100) >> 16);


And this equally doesn't make sense >>16 will shift the masked value out
of the bottom of the result.


Yeah, we should keep the original shift amount and just do
s/1/HOST_WIDE_INT_1U/.


Sorry, I confused things with the 16 bit shift thinko (I meant
to shift by 8 bits).  But I don't understand how poly_int works
at all so the shift was just a wild guess thinking the purpose
of the split was to allow for precisions in excess of (1 << 63).

But I'm afraid I still don't understand when (precision & 0x100)
could ever hold.  I put in an abort in that branch and ran
the test suite but didn't manage to trigger an ICE.  Can you
give me an example when it might trigger?  (I'd also like to
add a comment to the function to explain it.)


The function is decoding the precision installed by:

inline void
SET_TYPE_VECTOR_SUBPARTS (tree node, poly_uint64 subparts)
{
   STATIC_ASSERT (NUM_POLY_INT_COEFFS <= 2);
   unsigned HOST_WIDE_INT coeff0 = subparts.coeffs[0];
   int index = exact_log2 (coeff0);
   gcc_assert (index >= 0);
   if (NUM_POLY_INT_COEFFS == 2)
     {
       unsigned HOST_WIDE_INT coeff1 = subparts.coeffs[1];
       gcc_assert (coeff1 == 0 || coeff1 == coeff0);
       VECTOR_TYPE_CHECK (node)->type_common.precision
        = index + (coeff1 != 0 ? 0x100 : 0);


Ugh.  It never occured to me that precision was being used like
this.  Even though the functions are one right after the other,
a brief comment would have helped avoid this misunderstanding.
Let me propose one separately.

     }
   else
     VECTOR_TYPE_CHECK (node)->type_common.precision = index;
}

Code protected by NUM_POLY_INT_COEFFS == 2 will only trigger on
AArch64 targets, so an x86 test run won't cover it.  The assert you
added would blow up on pretty much all of aarch64-sve.exp in an AArch64
test run though; see Christophe's results for what happened with the
original patch.


Ah!  aarch64-sve.exp explains it.  I only ran aarch64.exp.  There,
there's just one test that triggers an ICE (I missed it at first):
  aarch64/stack-check-prologue-16.c
But many of the aarch64-sve.exp do fail with the abort there.


I certainly don't mind adding more commentary, but please keep that
separate from the PR fix, which is simply about changing "1 <<" to
"HOST_WIDE_INT_1U <<".  I think we went down a bit of a rabbit hole
with this encoding question. :-)


Sorry, I was just baffled by the code and trying to understand
how it works.

I will commit the attached revision shortly.

Thanks for your help!

Martin


FWIW, the layout of "precision" is fundamental to the handling of length-
agnostic vector types and is covered by all SVE autovectorisation tests,
so I'm pretty confident that we're reading "precision" correctly.  (But of
course, both arms have the bug you're fixing, in which we were shifting
an int instead of a UWHI.)

Thanks,
Richard

PR middle-end/89797 - ICE on a vector_size (1LU << 33) int variable

gcc/ChangeLog:

	PR middle-end/89797
	* tree.h (TYPE_VECTOR_SUBPARTS): Use HOST_WIDE_INT_1U.
	* config/aarch64/aarch64.c (aarch64_simd_vector_alignment): Avoid
	assuming type size fits in SHWI.

Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 270418)
+++ gcc/tree.h	(working copy)
@@ -3735,13 +3735,13 @@ TYPE_VECTOR_SUBPARTS (const_tree node)
   if (NUM_POLY_INT_COEFFS == 2)
     {
       poly_uint64 res = 0;
-      res.coeffs[0] = 1 << (precision & 0xff);
+      res.coeffs[0] = HOST_WIDE_INT_1U << (precision & 0xff);
       if (precision & 0x100)
-	res.coeffs[1] = 1 << (precision & 0xff);
+	res.coeffs[1] = HOST_WIDE_INT_1U << (precision & 0xff);
       return res;
     }
   else
-    return (unsigned HOST_WIDE_INT)1 << precision;
+    return HOST_WIDE_INT_1U << precision;
 }
 
 /* Set the number of elements in VECTOR_TYPE NODE to SUBPARTS, which must
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	(revision 270418)
+++ gcc/config/aarch64/aarch64.c	(working copy)
@@ -14924,8 +14924,7 @@ aarch64_simd_vector_alignment (const_tree type)
        be set for non-predicate vectors of booleans.  Modes are the most
        direct way we have of identifying real SVE predicate types.  */
     return GET_MODE_CLASS (TYPE_MODE (type)) == MODE_VECTOR_BOOL ? 16 : 128;
-  HOST_WIDE_INT align = tree_to_shwi (TYPE_SIZE (type));
-  return MIN (align, 128);
+  return wi::umin (wi::to_wide (TYPE_SIZE (type)), 128).to_uhwi ();
 }
 
 /* Implement target hook TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT.  */

Re: [PATCH] avoid aarch64 ICE on large vectors (PR 89797)

Reply via email to