> > Pengxuan Zheng <quic_pzh...@quicinc.com> writes:
> > > SVE's INDEX instruction can be used to populate vectors by values
> > > starting from "base" and incremented by "step" for each subsequent
> > > value. We can take advantage of it to generate vector constants if
> > > TARGET_SVE is available and the base and step values are within [-16, 15].
> > >
> > > For example, with the following function:
> > >
> > > typedef int v4si __attribute__ ((vector_size (16)));
> > > v4si f_v4si (void)
> > > {
> > >   return (v4si){ 0, 1, 2, 3 };
> > > }
> > >
> > > GCC currently generates:
> > >
> > > f_v4si:
> > >         adrp    x0, .LC4
> > >         ldr     q0, [x0, #:lo12:.LC4]
> > >         ret
> > >
> > > .LC4:
> > >         .word   0
> > >         .word   1
> > >         .word   2
> > >         .word   3
> > >
> > > With this patch, we generate an INDEX instruction instead if
> > > TARGET_SVE is available.
> > >
> > > f_v4si:
> > >         index   z0.s, #0, #1
> > >         ret
> > >
> > > [...]
> > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > > index 9e12bd9711c..01bfb8c52e4 100644
> > > --- a/gcc/config/aarch64/aarch64.cc
> > > +++ b/gcc/config/aarch64/aarch64.cc
> > > @@ -22960,8 +22960,7 @@ aarch64_simd_valid_immediate (rtx op, simd_immediate_info *info,
> > >    if (CONST_VECTOR_P (op)
> > >        && CONST_VECTOR_DUPLICATE_P (op))
> > >      n_elts = CONST_VECTOR_NPATTERNS (op);
> > > -  else if ((vec_flags & VEC_SVE_DATA)
> > > -           && const_vec_series_p (op, &base, &step))
> > > +  else if (TARGET_SVE && const_vec_series_p (op, &base, &step))
> >
> > I think we need to check which == AARCH64_CHECK_MOV too.  (Previously
> > that wasn't necessary, because native SVE only uses this routine for
> > moves.)
> >
> > FTR: I was initially a bit nervous about testing TARGET_SVE without
> > looking at vec_flags at all.  But looking at the previous handling of
> > predicates and structures, I agree it looks like the correct thing to do.
> >
> > >      {
> > >        gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT);
> > >        if (!aarch64_sve_index_immediate_p (base)
> > > [...]
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> > > index 216699b0536..3d6a0160f95 100644
> > > --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> > > @@ -10,7 +10,6 @@ dupq (int x)
> > >    return svdupq_s32 (x, 1, 2, 3);
> > >  }
> > >
> > > -/* { dg-final { scan-assembler {\tldr\tq[0-9]+,} } } */
> > > +/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.s, #1, #2} } } */
> > >  /* { dg-final { scan-assembler {\tins\tv[0-9]+\.s\[0\], w0\n} } } */
> > >  /* { dg-final { scan-assembler {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]\n} } } */
> > > -/* { dg-final { scan-assembler {\t\.word\t1\n\t\.word\t2\n\t\.word\t3\n} } } */
> >
> > This seems to be a regression of sorts.  Previously we had:
> >
> >         adrp    x1, .LC0
> >         ldr     q0, [x1, #:lo12:.LC0]
> >         ins     v0.s[0], w0
> >         dup     z0.q, z0.q[0]
> >
> > whereas now we have:
> >
> >         movi    v0.2s, 0x2
> >         index   z31.s, #1, #2
> >         ins     v0.s[0], w0
> >         zip1    v0.4s, v0.4s, v31.4s
> >         dup     z0.q, z0.q[0]
> >
> > I think we should try to aim for:
> >
> >         index   z0.s, #0, #1
> >         ins     v0.s[0], w0
> >         dup     z0.q, z0.q[0]
> >
> > instead.
>
> Thanks for the feedback, Richard!
>
> I've added support to handle vectors with non-constant elements. I've split
> that change into a separate patch. Please let me know if you have any
> comments.
>
> [PATCH 1/2] aarch64: Improve vector constant generation using SVE INDEX
> instruction [PR113328]
> https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662842.html
>
> [PATCH 2/2] aarch64: Improve part-variable vector initialization with SVE
> INDEX instruction [PR113328]
> https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662843.html

Just updated [PATCH 2/2] to fix some issues in the test cases. Here's the
latest patch:

[PATCH v2 2/2] aarch64: Improve part-variable vector initialization with SVE
INDEX instruction [PR113328]
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662925.html

Thanks,
Pengxuan

>
> Thanks,
> Pengxuan
> >
> > > [...]
> > > +/*
> > > +** g_v4si:
> > > +**      index   z0\.s, #3, #\-4
> >
> > The backslash looks redundant here.
> >
> > Thanks,
> > Richard
> >
> > > +**      ret
> > > +*/
> > > +v4si
> > > +g_v4si (void)
> > > +{
> > > +  return (v4si){ 3, -1, -5, -9 };
> > > +}
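
For readers following the thread, here is a minimal standalone sketch of the
condition discussed above: a constant vector can be materialized with INDEX
when its elements form a linear series whose base and step both fall within
[-16, 15]. This is not GCC code. The helper names index_imm_ok and
series_fits_index are hypothetical stand-ins for illustration only, and the
sketch works on a plain array of longs rather than on RTL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a GCC function): the thread states that the
   base and step of INDEX must lie within [-16, 15].  */
static bool
index_imm_ok (long value)
{
  return value >= -16 && value <= 15;
}

/* Hypothetical helper (not a GCC function): return true if ELTS[0..N-1]
   is a linear series BASE, BASE + STEP, BASE + 2 * STEP, ... whose base
   and step are both in range.  On success, store them in *BASE and
   *STEP; on failure, leave both untouched.  */
static bool
series_fits_index (const long *elts, size_t n, long *base, long *step)
{
  assert (elts != NULL && base != NULL && step != NULL);

  /* A series needs at least two elements to define a step.  */
  if (n < 2)
    return false;

  long b = elts[0];
  long s = elts[1] - elts[0];

  /* Every consecutive pair must differ by the same step.  */
  for (size_t i = 2; i < n; i++)
    if (elts[i] - elts[i - 1] != s)
      return false;

  if (!index_imm_ok (b) || !index_imm_ok (s))
    return false;

  *base = b;
  *step = s;
  return true;
}
```

With the two vectors from the thread, { 0, 1, 2, 3 } gives base 0 and step 1,
and { 3, -1, -5, -9 } gives base 3 and step -4, which matches the operands in
the quoted INDEX instructions. A vector such as { 0, 20, 40, 60 } is a series
but is rejected because its step is out of range.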