Tamar Christina <tamar.christ...@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandif...@arm.com>
>> Sent: Tuesday, December 14, 2021 12:38 PM
>> To: Tamar Christina <tamar.christ...@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
>> <richard.earns...@arm.com>; Marcus Shawcroft
>> <marcus.shawcr...@arm.com>; Kyrylo Tkachov <kyrylo.tkac...@arm.com>
>> Subject: Re: [PATCH]AArch64 Fix the AAPCs for new partial and full SIMD
>> structure types [PR103094]
>> 
>> Tamar Christina <tamar.christ...@arm.com> writes:
>> > Hi All,
>> >
>> > The new partial and full vector types added to AArch64, e.g.
>> >
>> > int8x8x2_t with mode V2x8QI are incorrectly being defined as being
>> > short vectors and not being composite types.
>> >
>> > This causes the layout code to incorrectly conclude that the registers
>> > are packed. i.e. for V2x8QI it thinks those 16-bytes are in the same 
>> > registers.
>> >
>> > Because of this the code under !aarch64_composite_type_p is
>> > unreachable but also lacked any extra checks to see that nregs is what we
>> expected it to be.
>> >
>> > I have also updated aarch64_advsimd_full_struct_mode_p and
>> > aarch64_advsimd_partial_struct_mode_p to only consider vector types as
>> > struct modes.  Otherwise types such as OImode and friends would
>> > qualify leading to incorrect results.
>> 
>> How easy would it be to fix the bug without doing this last bit?
>> The idea was that OI, CI and XI should continue to be structure modes until
>> we remove them.  aarch64_advsimd_partial_struct_mode_p
>> and aarch64_advsimd_full_struct_mode_p are meant to be convenience
>> wrappers and so they shouldn't make different decisions from the
>> underlying aarch64_classify_vector_mode.
>
> It can be done by moving the check higher in callers of these functions, but 
> the problem is that
> With an e.g. an OImode there's no real indication of how many registers are 
> used to create the
> IOmode. It could be 4, 6, 8 as it's just a bag of bits.

OImode is always 2 Q registers, etc.

Which bit of code are you concerned about?  Is it the parts where
we generate gen_rtx_REG?  If so, it was the case even before the
new modes that an OImode structure could have safely been classified
as (reg:OI V0) (say) rather than as a less efficient parallel.

Thanks,
Richard


>
> My concern is that these functions are misleading without this, with any of 
> these opaque
> types returning true for both of these functions it becomes harder to make 
> decisions between
> the two, in particular because we still expand to these modes for certain 
> structures.
>
>> 
>> >
>> > This patch fixes up the issues and we now generate correct code.
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >
>> > Ok for master?
>> >
>> > Thanks,
>> > Tamar
>> >
>> >
>> >
>> > gcc/ChangeLog:
>> >
>> >    PR target/103094
>> >    * config/aarch64/aarch64.c (aarch64_function_value,
>> aarch64_layout_arg):
>> >    Fix unreachable code for partial vectors and re-order switch to
>> perform
>> >    the simplest test first.
>> >    (aarch64_short_vector_p): Mark as not short vectors.
>> >    (aarch64_composite_type_p): Mark as composite types.
>> >    (aarch64_advsimd_partial_struct_mode_p,
>> >    aarch64_advsimd_full_struct_mode_p): Restrict to actual SIMD types.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> >    PR target/103094
>> >    * gcc.target/aarch64/pr103094.c: New test.
>> >
>> > --- inline copy of patch --
>> > diff --git a/gcc/config/aarch64/aarch64.c
>> > b/gcc/config/aarch64/aarch64.c index
>> >
>> fdf05505846721b02059df494d6395ae9423a8ef..d9104ddac3cdd44f7c2290b872
>> 5d
>> > 05be4fd6468f 100644
>> > --- a/gcc/config/aarch64/aarch64.c
>> > +++ b/gcc/config/aarch64/aarch64.c
>> > @@ -3055,15 +3055,17 @@ aarch64_advsimd_struct_mode_p
>> (machine_mode
>> > mode)  static bool  aarch64_advsimd_partial_struct_mode_p
>> > (machine_mode mode)  {
>> > -  return (aarch64_classify_vector_mode (mode)
>> > -    == (VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL));
>> > +  return VECTOR_MODE_P (mode)
>> > +   && (aarch64_classify_vector_mode (mode)
>> > +          == (VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL));
>> >  }
>> >
>> >  /* Return true if MODE is an Advanced SIMD Q-register structure mode.
>> > */  static bool  aarch64_advsimd_full_struct_mode_p (machine_mode
>> > mode)  {
>> > -  return (aarch64_classify_vector_mode (mode) == (VEC_ADVSIMD |
>> > VEC_STRUCT));
>> > +  return VECTOR_MODE_P (mode)
>> > +   && (aarch64_classify_vector_mode (mode) == (VEC_ADVSIMD |
>> > +VEC_STRUCT));
>> >  }
>> >
>> >  /* Return true if MODE is any of the data vector modes, including @@
>> > -6468,17 +6470,21 @@ aarch64_function_value (const_tree type,
>> const_tree func,
>> >                                           NULL, false))
>> >      {
>> >        gcc_assert (!sve_p);
>> > -      if (!aarch64_composite_type_p (type, mode))
>> > +      if (aarch64_advsimd_full_struct_mode_p (mode))
>> > +  {
>> > +    gcc_assert (known_eq (exact_div (GET_MODE_SIZE (mode), 16),
>> count));
>> > +    return gen_rtx_REG (mode, V0_REGNUM);
>> > +  }
>> > +      else if (aarch64_advsimd_partial_struct_mode_p (mode))
>> > +  {
>> > +    gcc_assert (known_eq (exact_div (GET_MODE_SIZE (mode), 8),
>> count));
>> > +    return gen_rtx_REG (mode, V0_REGNUM);
>> > +  }
>> > +      else if (!aarch64_composite_type_p (type, mode))
>> >    {
>> >      gcc_assert (count == 1 && mode == ag_mode);
>> >      return gen_rtx_REG (mode, V0_REGNUM);
>> >    }
>> > -      else if (aarch64_advsimd_full_struct_mode_p (mode)
>> > -         && known_eq (GET_MODE_SIZE (ag_mode), 16))
>> > -  return gen_rtx_REG (mode, V0_REGNUM);
>> > -      else if (aarch64_advsimd_partial_struct_mode_p (mode)
>> > -         && known_eq (GET_MODE_SIZE (ag_mode), 8))
>> > -  return gen_rtx_REG (mode, V0_REGNUM);
>> >        else
>> >    {
>> >      int i;
>> > @@ -6745,6 +6751,7 @@ aarch64_layout_arg (cumulative_args_t pcum_v,
>> const function_arg_info &arg)
>> >      /* No frontends can create types with variable-sized modes, so we
>> >         shouldn't be asked to pass or return them.  */
>> >      size = GET_MODE_SIZE (mode).to_constant ();
>> > +
>> >    size = ROUND_UP (size, UNITS_PER_WORD);
>> >
>> >    allocate_ncrn = (type) ? !(FLOAT_TYPE_P (type)) : !FLOAT_MODE_P
>> > (mode); @@ -6769,17 +6776,21 @@ aarch64_layout_arg
>> (cumulative_args_t pcum_v, const function_arg_info &arg)
>> >        if (nvrn + nregs <= NUM_FP_ARG_REGS)
>> >    {
>> >      pcum->aapcs_nextnvrn = nvrn + nregs;
>> > -    if (!aarch64_composite_type_p (type, mode))
>> > +    if (aarch64_advsimd_full_struct_mode_p (mode))
>> > +      {
>> > +        gcc_assert (nregs == size / 16);
>> > +        pcum->aapcs_reg = gen_rtx_REG (mode, V0_REGNUM + nvrn);
>> > +      }
>> > +    else if (aarch64_advsimd_partial_struct_mode_p (mode))
>> > +      {
>> > +        gcc_assert (nregs == size / 8);
>> > +        pcum->aapcs_reg = gen_rtx_REG (mode, V0_REGNUM + nvrn);
>> > +      }
>> > +    else if (!aarch64_composite_type_p (type, mode))
>> >        {
>> >          gcc_assert (nregs == 1);
>> >          pcum->aapcs_reg = gen_rtx_REG (mode, V0_REGNUM + nvrn);
>> >        }
>> > -    else if (aarch64_advsimd_full_struct_mode_p (mode)
>> > -             && known_eq (GET_MODE_SIZE (pcum-
>> >aapcs_vfp_rmode), 16))
>> > -      pcum->aapcs_reg = gen_rtx_REG (mode, V0_REGNUM + nvrn);
>> > -    else if (aarch64_advsimd_partial_struct_mode_p (mode)
>> > -             && known_eq (GET_MODE_SIZE (pcum-
>> >aapcs_vfp_rmode), 8))
>> > -      pcum->aapcs_reg = gen_rtx_REG (mode, V0_REGNUM + nvrn);
>> >      else
>> >        {
>> >          rtx par;
>> > @@ -19285,6 +19296,13 @@ aarch64_short_vector_p (const_tree type,
>> >        else
>> >    size = GET_MODE_SIZE (mode);
>> >      }
>> > +
>> > +  /* If a Advanced SIMD partial or full aggregate vector type we aren't a
>> short
>> > +     type.  */
>> > +  if (aarch64_advsimd_partial_struct_mode_p (mode)
>> > +      || aarch64_advsimd_full_struct_mode_p (mode))
>> > +    return false;
>> > +
>> >    if (known_eq (size, 8) || known_eq (size, 16))
>> >      {
>> >        /* 64-bit and 128-bit vectors should only acquire an SVE mode
>> > if
>> 
>> I think the bug here is that we trust the mode even if we're given a
>> conflicting type.  In principle it would be OK to use, say, V4SI for an 
>> array of 4
>> ints, but that shouldn't suddenly make aarch64_short_vector_p true.
>> 
>> Unfortunately that ship has sailed, so we e.g. treat:
>> 
>>   struct wrapper { int32x4_t x; int :0; };
>> 
>> as a short vector too.
>> 
>> So it feels like this a case of limiting the contagion and that the check 
>> should
>> go in here:
>> 
>>   else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
>>         || GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
>>     {
>>       /* Rely only on the type, not the mode, when processing SVE types.  */
>>       if (type && aarch64_some_values_include_pst_objects_p (type))
>>      /* Leave later code to report an error if SVE is disabled.  */
>>      gcc_assert (!TARGET_SVE || aarch64_sve_mode_p (mode));
>>       else
>>      size = GET_MODE_SIZE (mode);
>>     }
>> 
>> where we needed similar protection for SVE.  E.g. we could change the inner
>> else to:
>
> Indeed, I did see that for SVE we use the types instead of the modes, but the
> types are not passed to all functions. So this would get these to return a 
> different
> nregs than what e.g. aarch64_layout_arg calculates itself.  Of course I can 
> remove
> the asserts but I think they're useful in catching issues like these.
>
> I can also just change all that code to use type instead.
>
>> 
>>       else if (!aarch64_advsimd_struct_mode_p (mode))
>> 
>> or keep it is an early-out (but within the outer “else if”) if that seems 
>> clearer.
>> 
>> > @@ -19316,6 +19334,12 @@ static bool
>> >  aarch64_composite_type_p (const_tree type,
>> >                      machine_mode mode)
>> >  {
>> > +  /* If a Advanced SIMD partial or full aggregate vector type we are a
>> > +     composite type.  */
>> > +  if (aarch64_advsimd_partial_struct_mode_p (mode)
>> > +      || aarch64_advsimd_full_struct_mode_p (mode))
>> > +    return true;
>> > +
>> 
>> Isn't this naturally true after the fix to aarch64_short_vector_p?
>> It would be good to avoid adding new “mode only” tests if we can help it.
>
> Yes but you can call this function directly and it should still return the 
> right
> value for the new struct modes. 
>
>> 
>> Also, the old code didn't handle OI, CI or XI specially here, so doing
>> something different now might be dangerous.
>
> This shouldn't change the handling of OI mode and friends though. Since they 
> would
> all return false here and fall through to the old code.  It's only 
> problematic if these new
> convenience functions don't exclude OI and other non-vector modes.
>
> So this should only change the behaviour for actual structure modes.  But as 
> you say,
>  I can look at the types, though my concern is that there's technically 
> nothing stopping
> an expand pattern from expanding to OImode with a structure "type", in which 
> case
> inspecting the type will change the behavior whereas the mode is a bit safer 
> until we
> remove the other modes entirely.
>
> But happy to rewrite it to use the type instead if that's preferred. 
>
> Cheers,
> Tamar
>
>> 
>> Thanks,
>> Richard
>> 
>> >    if (aarch64_short_vector_p (type, mode))
>> >      return false;
>> >
>> > diff --git a/gcc/testsuite/gcc.target/aarch64/pr103094.c
>> > b/gcc/testsuite/gcc.target/aarch64/pr103094.c
>> > new file mode 100644
>> > index
>> >
>> 0000000000000000000000000000000000000000..441e602928ce8ac4e9890a137
>> 6ac
>> > bc25671e284d
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/aarch64/pr103094.c
>> > @@ -0,0 +1,21 @@
>> > +/* { dg-do compile } */
>> > +/* { dg-additional-options "-fdump-rtl-expand -w" } */
>> > +
>> > +#include <arm_neon.h>
>> > +
>> > +void foo (uint8x8x2_t cols_01_23, uint8x8x2_t cols_45_67, uint16_t*
>> > +outptr0) {
>> > +  uint16x4x4_t cols_01_23_45_67 = { {
>> > +    vreinterpret_u16_u8(cols_01_23.val[0]),
>> > +    vreinterpret_u16_u8(cols_01_23.val[1]),
>> > +    vreinterpret_u16_u8(cols_45_67.val[0]),
>> > +    vreinterpret_u16_u8(cols_45_67.val[1])
>> > +  } };
>> > +
>> > +  vst4_lane_u16(outptr0, cols_01_23_45_67, 0); }
>> > +
>> > +/* Check that we expand to v0 and v2 from the function arguments.  */
>> > +/* { dg-final { scan-rtl-dump {\(reg:V2x8QI \d+ v0 \[ cols_01_23
>> > +\]\)} expand } } */
>> > +/* { dg-final { scan-rtl-dump {\(reg:V2x8QI \d+ v2 \[ cols_45_67
>> > +\]\)} expand } } */
>> > +

Reply via email to