> > Pengxuan Zheng <quic_pzh...@quicinc.com> writes:
> > > SVE's INDEX instruction can be used to populate vectors by values
> > > starting from "base" and incremented by "step" for each subsequent
> > > value. We can take advantage of it to generate vector constants if
> > > TARGET_SVE is available and the base and step values are within [-16, 15].
> > >
> > > For example, with the following function:
> > >
> > > typedef int v4si __attribute__ ((vector_size (16)));
> > > v4si f_v4si (void) {
> > >   return (v4si){ 0, 1, 2, 3 };
> > > }
> > >
> > > GCC currently generates:
> > >
> > > f_v4si:
> > >   adrp    x0, .LC4
> > >   ldr     q0, [x0, #:lo12:.LC4]
> > >   ret
> > >
> > > .LC4:
> > >   .word   0
> > >   .word   1
> > >   .word   2
> > >   .word   3
> > >
> > > With this patch, we generate an INDEX instruction instead if
> > > TARGET_SVE is available.
> > >
> > > f_v4si:
> > >   index   z0.s, #0, #1
> > >   ret
> > >
> > > [...]
> > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > > index 9e12bd9711c..01bfb8c52e4 100644
> > > --- a/gcc/config/aarch64/aarch64.cc
> > > +++ b/gcc/config/aarch64/aarch64.cc
> > > @@ -22960,8 +22960,7 @@ aarch64_simd_valid_immediate (rtx op, simd_immediate_info *info,
> > >    if (CONST_VECTOR_P (op)
> > >        && CONST_VECTOR_DUPLICATE_P (op))
> > >      n_elts = CONST_VECTOR_NPATTERNS (op);
> > > -  else if ((vec_flags & VEC_SVE_DATA)
> > > -    && const_vec_series_p (op, &base, &step))
> > > +  else if (TARGET_SVE && const_vec_series_p (op, &base, &step))
> >
> > I think we need to check which == AARCH64_CHECK_MOV too.  (Previously
> > that wasn't necessary, because native SVE only uses this routine for
> > moves.)
> >
> > FTR: I was initially a bit nervous about testing TARGET_SVE without
> > looking at vec_flags at all.  But looking at the previous handling of
> > predicates and structures, I agree it looks like the correct thing to do.
> >
> > >      {
> > >        gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT);
> > >        if (!aarch64_sve_index_immediate_p (base)
> > > [...]
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> > > index 216699b0536..3d6a0160f95 100644
> > > --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> > > @@ -10,7 +10,6 @@ dupq (int x)
> > >    return svdupq_s32 (x, 1, 2, 3);
> > >  }
> > >
> > > -/* { dg-final { scan-assembler {\tldr\tq[0-9]+,} } } */
> > > +/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.s, #1, #2} } } */
> > >  /* { dg-final { scan-assembler {\tins\tv[0-9]+\.s\[0\], w0\n} } } */
> > >  /* { dg-final { scan-assembler {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]\n} } } */
> > > -/* { dg-final { scan-assembler {\t\.word\t1\n\t\.word\t2\n\t\.word\t3\n} } } */
> >
> > This seems to be a regression of sorts.  Previously we had:
> >
> >         adrp    x1, .LC0
> >         ldr     q0, [x1, #:lo12:.LC0]
> >         ins     v0.s[0], w0
> >         dup     z0.q, z0.q[0]
> >
> > whereas now we have:
> >
> >         movi    v0.2s, 0x2
> >         index   z31.s, #1, #2
> >         ins     v0.s[0], w0
> >         zip1    v0.4s, v0.4s, v31.4s
> >         dup     z0.q, z0.q[0]
> >
> > I think we should try to aim for:
> >
> >         index   z0.s, #0, #1
> >         ins     v0.s[0], w0
> >         dup     z0.q, z0.q[0]
> >
> > instead.
> 
> Thanks for the feedback, Richard!
> 
> I've added support to handle vectors with non-constant elements. I've split
> that change into a separate patch. Please let me know if you have any
> comments.
> 
> [PATCH 1/2] aarch64: Improve vector constant generation using SVE INDEX
> instruction [PR113328]
> https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662842.html
> 
> [PATCH 2/2] aarch64: Improve part-variable vector initialization with SVE
> INDEX instruction [PR113328]
> https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662843.html

I just updated [PATCH 2/2] to fix some issues in the test cases. Here's the
latest patch:
[PATCH v2 2/2] aarch64: Improve part-variable vector initialization with SVE 
INDEX instruction [PR113328]
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662925.html

Thanks,
Pengxuan
> 
> Thanks,
> Pengxuan
> >
> > > [...]
> > > +/*
> > > +** g_v4si:
> > > +**       index   z0\.s, #3, #\-4
> >
> > The backslash looks redundant here.
> >
> > Thanks,
> > Richard
> >
> > > +**       ret
> > > +*/
> > > +v4si
> > > +g_v4si (void)
> > > +{
> > > +  return (v4si){ 3, -1, -5, -9 };
> > > +}
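
The condition discussed in this thread (a constant vector is a linear series whose base and step both fit in [-16, 15]) can be sketched in standalone C as follows. This is only an illustration of the arithmetic, not GCC's implementation: `index_imm_ok` and `is_index_series` are hypothetical names and are unrelated to the real `aarch64_sve_index_immediate_p` and `const_vec_series_p` routines, which operate on RTL rather than on plain arrays.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if V fits the signed 5-bit immediate
   range [-16, 15] mentioned in the patch description.  */
static bool
index_imm_ok (int64_t v)
{
  return v >= -16 && v <= 15;
}

/* Hypothetical helper: return true if ELTS[0..N-1] equals
   BASE + I * STEP for every I, with BASE and STEP both in range.
   On success, store the values through BASE and STEP.  */
static bool
is_index_series (const int32_t *elts, size_t n,
                 int64_t *base, int64_t *step)
{
  if (n < 2)
    return false;

  int64_t b = elts[0];
  int64_t s = (int64_t) elts[1] - elts[0];

  /* Every adjacent pair must differ by the same step.  */
  for (size_t i = 2; i < n; i++)
    if ((int64_t) elts[i] - elts[i - 1] != s)
      return false;

  if (!index_imm_ok (b) || !index_imm_ok (s))
    return false;

  *base = b;
  *step = s;
  return true;
}
```

Under these assumptions, `{ 0, 1, 2, 3 }` from `f_v4si` is accepted with base 0 and step 1, and `{ 3, -1, -5, -9 }` from `g_v4si` is accepted with base 3 and step -4, matching the `index z0.s, #0, #1` and `index z0.s, #3, #-4` forms quoted above. A series such as `{ 0, 20, 40, 60 }` is rejected because its step falls outside the immediate range.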
