RE: [PATCH v2 2/2] aarch64: Improve part-variable vector initialization with SVE INDEX instruction [PR113328]

Pengxuan Zheng (QUIC) Tue, 17 Sep 2024 09:40:58 -0700

> Pengxuan Zheng <quic_pzh...@quicinc.com> writes:
> > We can still use SVE's INDEX instruction to construct vectors even if
> > not all elements are constants. For example, { 0, x, 2, 3 } can be
> > constructed by first using "INDEX #0, #1" to generate { 0, 1, 2, 3 },
> > and then set the elements which are non-constants separately.
> >
> >     PR target/113328
> >
> > gcc/ChangeLog:
> >
> >     * config/aarch64/aarch64.cc (aarch64_expand_vector_init_fallback):
> >     Improve part-variable vector generation with SVE's INDEX if TARGET_SVE
> >     is available.
> >
> > gcc/testsuite/ChangeLog:
> >
> >     * gcc.target/aarch64/sve/acle/general/dupq_1.c: Update test to use
> >     check-function-bodies.
> >     * gcc.target/aarch64/sve/acle/general/dupq_2.c: Likewise.
> >     * gcc.target/aarch64/sve/acle/general/dupq_3.c: Likewise.
> >     * gcc.target/aarch64/sve/acle/general/dupq_4.c: Likewise.
> >     * gcc.target/aarch64/sve/vec_init_4.c: New test.
> >     * gcc.target/aarch64/sve/vec_init_5.c: New test.
> >
> > Signed-off-by: Pengxuan Zheng <quic_pzh...@quicinc.com>
> > ---
> >  gcc/config/aarch64/aarch64.cc                 | 81 ++++++++++++++++++-
> >  .../aarch64/sve/acle/general/dupq_1.c         | 18 ++++-
> >  .../aarch64/sve/acle/general/dupq_2.c         | 18 ++++-
> >  .../aarch64/sve/acle/general/dupq_3.c         | 18 ++++-
> >  .../aarch64/sve/acle/general/dupq_4.c         | 18 ++++-
> >  .../gcc.target/aarch64/sve/vec_init_4.c       | 47 +++++++++++
> >  .../gcc.target/aarch64/sve/vec_init_5.c       | 12 +++
> >  7 files changed, 199 insertions(+), 13 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vec_init_4.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vec_init_5.c
> >
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 6b3ca57d0eb..7305a5c6375 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -23942,12 +23942,91 @@ aarch64_expand_vector_init_fallback (rtx target, rtx vals)
> >    if (n_var != n_elts)
> >      {
> >        rtx copy = copy_rtx (vals);
> > +      bool is_index_seq = false;
> > +
> > +      /* If at least half of the elements of the vector are constants and all
> > +    these constant elements form a linear sequence of the form { B, B + S,
> > +    B + 2 * S, B + 3 * S, ... }, we can generate the vector with SVE's
> > +    INDEX instruction if SVE is available and then set the elements which
> > +    are not constant separately.  More precisely, each constant element I
> > +    has to be B + I * S where B and S must be valid immediate operand for
> > +    an SVE INDEX instruction.
> > +
> > +    For example, { X, 1, 2, 3} is a vector satisfying these conditions and
> > +    we can generate a vector of all constants (i.e., { 0, 1, 2, 3 }) first
> > +    and then set the first element of the vector to X.  */
> > +
> > +      if (TARGET_SVE && GET_MODE_CLASS (mode) == MODE_VECTOR_INT
> > +     && n_var <= n_elts / 2)
> > +   {
> > +     int const_idx = -1;
> > +     HOST_WIDE_INT const_val = 0;
> > +     int base = 16;
> > +     int step = 16;
> > +
> > +     for (int i = 0; i < n_elts; ++i)
> > +       {
> > +         rtx x = XVECEXP (vals, 0, i);
> > +
> > +         if (!CONST_INT_P (x))
> > +           continue;
> > +
> > +         if (const_idx == -1)
> > +           {
> > +             const_idx = i;
> > +             const_val = INTVAL (x);
> > +           }
> > +         else
> > +           {
> > +             if ((INTVAL (x) - const_val) % (i - const_idx) == 0)
> > +               {
> > +                 HOST_WIDE_INT s
> > +                     = (INTVAL (x) - const_val) / (i - const_idx);
> > +                 if (s >= -16 && s <= 15)
> > +                   {
> > +                     int b = const_val - s * const_idx;
> > +                     if (b >= -16 && b <= 15)
> > +                       {
> > +                         base = b;
> > +                         step = s;
> > +                       }
> > +                   }
> > +               }
> > +             break;
> > +           }
> > +       }
> > +
> > +     if (base != 16
> > +         && (!CONST_INT_P (v0)
> > +             || (CONST_INT_P (v0) && INTVAL (v0) == base)))
> > +       {
> > +         if (!CONST_INT_P (v0))
> > +           XVECEXP (copy, 0, 0) = GEN_INT (base);
> > +
> > +         is_index_seq = true;
> > +         for (int i = 1; i < n_elts; ++i)
> > +           {
> > +             rtx x = XVECEXP (copy, 0, i);
> > +
> > +             if (CONST_INT_P (x))
> > +               {
> > +                 if (INTVAL (x) != base + i * step)
> > +                   {
> > +                     is_index_seq = false;
> > +                     break;
> > +                   }
> > +               }
> > +             else
> > +               XVECEXP (copy, 0, i) = GEN_INT (base + i * step);
> > +           }
> > +       }
> > +   }
> 
> This seems a bit more complex than I was hoping for, although the complexity
> is probably justified.
> 
> Seeing how awkward it is to do this using current interfaces, I think I'd
> instead prefer to do something that I'd been vaguely hoping to do for a
> while: extend vector-builder.h to accept wildcard/don't care values.
> finalize () could then replace the wildcards with whatever gives the
> "nicest" encoding.
> 
> That's also going to be relatively complex, but I think it'd be more
> general, and might help with the existing vec_init code as well.
> It would also be a step towards optimising -1 indices for
> __builtin_shufflevector.  It might be a few weeks before I can post something
> though.


No problem, Richard.

I am also curious to see what this alternative implementation looks like. 
Please kindly keep me posted when your patch is ready. Thank you!
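
For anyone following along, the check performed in the hunk above can be
sketched as a standalone C fragment outside of GCC. This is only an
illustration under stated assumptions: struct elt, in_index_range and
find_index_seq are hypothetical names rather than GCC interfaces, and the
-16..15 range is taken from the bounds tested in the hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one vector element: either a known
   constant or a variable (non-constant) value.  */
struct elt
{
  bool is_const;
  long val;	/* Only meaningful when is_const is true.  */
};

/* Immediate range assumed for both INDEX operands, mirroring the
   -16..15 checks in the quoted hunk.  */
static bool
in_index_range (long v)
{
  return v >= -16 && v <= 15;
}

/* Return true if at least half of the N elements are constants and
   every constant element I equals BASE + I * STEP for some in-range
   BASE and STEP, which are then stored through the pointers.  The
   first two constants fix the candidate BASE and STEP.  */
static bool
find_index_seq (const struct elt *elts, int n, long *base, long *step)
{
  assert (n > 0);

  int first = -1, second = -1, n_var = 0;
  for (int i = 0; i < n; ++i)
    {
      if (!elts[i].is_const)
	n_var++;
      else if (first < 0)
	first = i;
      else if (second < 0)
	second = i;
    }

  /* Too many variable elements, or too few constants to fix a step.  */
  if (n_var > n / 2 || second < 0)
    return false;

  long diff = elts[second].val - elts[first].val;
  int dist = second - first;
  if (diff % dist != 0)
    return false;

  long s = diff / dist;
  long b = elts[first].val - s * first;
  if (!in_index_range (s) || !in_index_range (b))
    return false;

  /* Every constant element must sit on the line B + I * S.  */
  for (int i = 0; i < n; ++i)
    if (elts[i].is_const && elts[i].val != b + i * s)
      return false;

  *base = b;
  *step = s;
  return true;
}
```

With { X, 1, 2, 3 } as input, the first two constants sit at indices 1 and 2,
which gives a step of 1 and a base of 1 - 1 * 1 = 0, i.e. the operands of
"INDEX #0, #1". The variable element would then be inserted separately.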

> 
> Pushing 1/2 without 2/2 has meant that the dupq tests will fail in the
> meantime, but that's ok.  In general, though, it's better not to push
> individual patches from a series unless they've been tested in isolation
> and are known to give clean test results.

In fact, the dupq tests were not affected. Patch 1/2 already adjusted the
"scan-assembler" checks of the dupq tests to match the output of 1/2 alone.
Patch 2/2 only replaces those "scan-assembler" checks with
"check-function-bodies", so the dupq tests still pass without 2/2.
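
To illustrate the difference between the two check styles, here is a rough
sketch of a test written both ways. It is not taken from the dupq tests: the
function f, the v4si typedef, the options and the expected INDEX pattern are
all assumptions made up for this example, based only on the { 0, x, 2, 3 }
case from the commit message.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -march=armv8.2-a+sve" } */

/* Generic GCC vector type, used here instead of an ACLE type so that
   the fragment has no header dependencies.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Builds { 0, x, 2, 3 }, the example from the commit message.  */
v4si
f (int x)
{
  return (v4si) { 0, x, 2, 3 };
}

/* Style 1: scan-assembler matches one regexp anywhere in the output.
   The pattern below is a hypothetical expectation.  */
/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.s, #0, #1\n} } } */

/* Style 2: check-function-bodies matches the body of a named function
   against the lines of a "**" comment; a "..." line matches any run of
   lines.  The expected body below is likewise hypothetical.  */
/*
** f:
**	...
**	index	z[0-9]+\.s, #0, #1
**	...
**	ret
*/
/* { dg-final { check-function-bodies "**" "" } } */
```

The second style pins the check to a specific function and to the order of the
instructions, whereas the first only requires the pattern to appear somewhere
in the assembly.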

Thanks,
Pengxuan
> 
> Thanks,
> Richard
