On Thu, 18 Apr 2019 at 14:08, Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> "Richard Earnshaw (lists)" <richard.earns...@arm.com> writes:
> > On 18/04/2019 03:38, Martin Sebor wrote:
> >> The fix for pr89797 committed in r270326 was limited to targets
> >> with NUM_POLY_INT_COEFFS == 1 which I think is all but aarch64.
> >> The tests for the fix have been failing with an ICE on aarch64
> >> because it suffers from more or less the same problem but in
> >> its own target-specific code.  Attached is the patch I posted
> >> yesterday that fixes the ICE, successfully bootstrapped and
> >> regtested on x86_64-linux.  I also ran the dg.exp=*attr* and
> >> aarch64.exp tests with an aarch64-linux-elf cross-compiler.
> >> There are no ICEs but there are tons of errors in the latter
> >> tests because many (most?) either expect to be able to find
> >> libc headers or link executables (I have not built libc for
> >> aarch64).
> >>
> >> I'm around tomorrow but then traveling the next two weeks (with
> >> no connectivity the first week) so I unfortunately won't be able
> >> to fix whatever this change might break until the week of May 6.
> >>
> >> Jeff, if you have an aarch64 tester that could verify this patch
> >> tomorrow that would help give us some confidence.  Otherwise,
> >> another option to consider for the time being is to xfail
> >> the tests on aarch64.
> >>
> >> Thanks
> >> Martin
> >>
> >> gcc-89797.diff
> >>
> >> PR middle-end/89797 - ICE on a vector_size (1LU << 33) int variable
> >>
> >> gcc/ChangeLog:
> >>
> >>      PR middle-end/89797
> >>      * tree.h (TYPE_VECTOR_SUBPARTS): Correct computation when
> >>      NUM_POLY_INT_COEFFS == 2.  Use HOST_WIDE_INT_1U.
> >>      * config/aarch64/aarch64.c (aarch64_simd_vector_alignment): Avoid
> >>      assuming type size fits in SHWI.
> >>
> >> Index: gcc/tree.h
> >> ===================================================================
> >> --- gcc/tree.h       (revision 270418)
> >> +++ gcc/tree.h       (working copy)
> >> @@ -3735,13 +3735,13 @@ TYPE_VECTOR_SUBPARTS (const_tree node)
> >>    if (NUM_POLY_INT_COEFFS == 2)
> >>      {
> >>        poly_uint64 res = 0;
> >> -      res.coeffs[0] = 1 << (precision & 0xff);
> >> +      res.coeffs[0] = HOST_WIDE_INT_1U << (precision & 0xff);
> >
> > I'm not sure I understand this either before or after.  (precision &
> > 0xff) gives a value in the range 0-255, which is way more than can be
> > accommodated in a left shift, even for a HWI.
>
> For the record: the problem is that we have two coefficients that are
> logically both in the range 1<<[0,63], needing 6 bits per coefficient
> and 12 in total.  But precision is only 10 bits. :-)  Fortunately,
> coeffs[1] is in practice either 0 or equal to coeffs[0], so we can encode
> it in a single bit.
>
> A 0x100/0xff split seemed easier to optimise than a 0x40/0x3f split.
> E.g. 0xff can be loaded directly from memory as a byte.  There's not
> going to be much in it either way though.
>
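For reference, here is a minimal sketch of the encode/decode pair implied
by the description above, for the NUM_POLY_INT_COEFFS == 2 case and with
the corrected shifts suggested below.  The helper names are hypothetical,
and the encoding half only approximates the real setter in tree.h:

  /* Pack SUBPARTS into the 10-bit precision field: the low 8 bits hold
     log2 of coeffs[0] (which must be a power of two), and bit 0x100
     records whether coeffs[1] is nonzero (it is either 0 or equal to
     coeffs[0]).  */
  static unsigned int
  encode_subparts (poly_uint64 subparts)
  {
    unsigned int precision = exact_log2 (subparts.coeffs[0]);
    if (subparts.coeffs[1] != 0)
      precision |= 0x100;
    return precision;
  }

  /* Recover the poly_uint64 from the packed precision field.  */
  static poly_uint64
  decode_subparts (unsigned int precision)
  {
    poly_uint64 res = 0;
    res.coeffs[0] = HOST_WIDE_INT_1U << (precision & 0xff);
    if (precision & 0x100)
      /* coeffs[1] equals coeffs[0], so reuse the same shift amount.  */
      res.coeffs[1] = HOST_WIDE_INT_1U << (precision & 0xff);
    return res;
  }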
> >>        if (precision & 0x100)
> >> -    res.coeffs[1] = 1 << (precision & 0xff);
> >> +    res.coeffs[1] = HOST_WIDE_INT_1U << ((precision & 0x100) >> 16);
> >
> > And this equally doesn't make sense: >>16 will shift the masked value out
> > of the bottom of the result.
>
> Yeah, we should keep the original shift amount and just do
> s/1/HOST_WIDE_INT_1U/.
>
> > Also, if we set res.coeffs[1], don't we want res.coeffs[0] to be set to 0?
> >
> > R.
> >
> >>        return res;
> >>      }
> >>    else
> >> -    return (unsigned HOST_WIDE_INT)1 << precision;
> >> +    return HOST_WIDE_INT_1U << precision;
> >>  }
> >>
> >>  /* Set the number of elements in VECTOR_TYPE NODE to SUBPARTS, which must
> >> Index: gcc/config/aarch64/aarch64.c
> >> ===================================================================
> >> --- gcc/config/aarch64/aarch64.c     (revision 270418)
> >> +++ gcc/config/aarch64/aarch64.c     (working copy)
> >> @@ -14924,7 +14924,10 @@ aarch64_simd_vector_alignment (const_tree type)
> >>         be set for non-predicate vectors of booleans.  Modes are the most
> >>         direct way we have of identifying real SVE predicate types.  */
> >>      return GET_MODE_CLASS (TYPE_MODE (type)) == MODE_VECTOR_BOOL ? 16 : 128;
> >> -  HOST_WIDE_INT align = tree_to_shwi (TYPE_SIZE (type));
> >> +  tree size = TYPE_SIZE (type);
> >> +  unsigned HOST_WIDE_INT align = 128;
> >> +  if (tree_fits_uhwi_p (size))
> >> +    align = tree_to_uhwi (TYPE_SIZE (type));
> >>    return MIN (align, 128);
>
> I guess this is personal preference, sorry, but:
>
>   return wi::umin (wi::to_wide (TYPE_SIZE (type)), 128).to_uhwi ();
>
> seems more direct and obvious than:
>
>   tree size = TYPE_SIZE (type);
>   unsigned HOST_WIDE_INT align = 128;
>   if (tree_fits_uhwi_p (size))
>     align = tree_to_uhwi (TYPE_SIZE (type));
>   return MIN (align, 128);
>
> OK with those changes, thanks.  I've bootstrapped & regression-tested
> it in that form on aarch64-linux-gnu.
>
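For reference, a sketch of how the function might read with that change
folded in.  The INTEGER_CST guard and the surrounding details are
reconstructed from the hunk quoted above, so treat them as approximate
rather than as the committed code:

  static HOST_WIDE_INT
  aarch64_simd_vector_alignment (const_tree type)
  {
    if (TREE_CODE (TYPE_SIZE (type)) != INTEGER_CST)
      /* Variable-length SVE vectors; see the comment in the hunk above.  */
      return GET_MODE_CLASS (TYPE_MODE (type)) == MODE_VECTOR_BOOL ? 16 : 128;
    /* TYPE_SIZE need not fit in a signed HWI for huge vector_size
       types, so clamp in wide-int arithmetic before narrowing.  */
    return wi::umin (wi::to_wide (TYPE_SIZE (type)), 128).to_uhwi ();
  }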

I'm late to the party, and I can confirm that Martin's original patch
introduced regressions:
  Executed from: gcc.target/aarch64/aarch64.exp
    gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times \\.cfi_def_cfa_register 11 1
    gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times \\.cfi_escape 0xf,0xc,0x8f,0,0x92,0x2e,0,.* 1
    gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times mov\\tx11, sp 1
    gcc.target/aarch64/stack-check-prologue-16.c scan-assembler-times cmp\\s+x[0-9]+, 61440 1
    gcc.target/aarch64/stack-check-prologue-16.c scan-assembler-times str\\s+xzr, \\[sp, 0\\] 1
    gcc.target/aarch64/stack-check-prologue-16.c scan-assembler-times sub\\s+x[0-9]+, x[0-9]+, 61440 1
and many in gcc.target/aarch64/sve/aarch64-sve.exp

Christophe


> Richard
