> Pengxuan Zheng <quic_pzh...@quicinc.com> writes:
> > We can still use SVE's INDEX instruction to construct vectors even if
> > not all elements are constants. For example, { 0, x, 2, 3 } can be
> > constructed by first using "INDEX #0, #1" to generate { 0, 1, 2, 3 },
> > and then set the elements which are non-constants separately.
> >
> >     PR target/113328
> >
> > gcc/ChangeLog:
> >
> >     * config/aarch64/aarch64.cc (aarch64_expand_vector_init_fallback):
> >     Improve part-variable vector generation with SVE's INDEX if
> >     TARGET_SVE is available.
> >
> > gcc/testsuite/ChangeLog:
> >
> >     * gcc.target/aarch64/sve/acle/general/dupq_1.c: Update test to use
> >     check-function-bodies.
> >     * gcc.target/aarch64/sve/acle/general/dupq_2.c: Likewise.
> >     * gcc.target/aarch64/sve/acle/general/dupq_3.c: Likewise.
> >     * gcc.target/aarch64/sve/acle/general/dupq_4.c: Likewise.
> >     * gcc.target/aarch64/sve/vec_init_4.c: New test.
> >     * gcc.target/aarch64/sve/vec_init_5.c: New test.
> >
> > Signed-off-by: Pengxuan Zheng <quic_pzh...@quicinc.com>
> > ---
> >  gcc/config/aarch64/aarch64.cc                 | 81 ++++++++++++++++++-
> >  .../aarch64/sve/acle/general/dupq_1.c         | 18 ++++-
> >  .../aarch64/sve/acle/general/dupq_2.c         | 18 ++++-
> >  .../aarch64/sve/acle/general/dupq_3.c         | 18 ++++-
> >  .../aarch64/sve/acle/general/dupq_4.c         | 18 ++++-
> >  .../gcc.target/aarch64/sve/vec_init_4.c       | 47 +++++++++++
> >  .../gcc.target/aarch64/sve/vec_init_5.c       | 12 +++
> >  7 files changed, 199 insertions(+), 13 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vec_init_4.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vec_init_5.c
> >
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 6b3ca57d0eb..7305a5c6375 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -23942,12 +23942,91 @@ aarch64_expand_vector_init_fallback (rtx target, rtx vals)
> >    if (n_var != n_elts)
> >      {
> >        rtx copy = copy_rtx (vals);
> > +      bool is_index_seq = false;
> > +
> > +      /* If at least half of the elements of the vector are constants and all
> > +         these constant elements form a linear sequence of the form { B, B + S,
> > +         B + 2 * S, B + 3 * S, ... }, we can generate the vector with SVE's
> > +         INDEX instruction if SVE is available and then set the elements which
> > +         are not constant separately.  More precisely, each constant element I
> > +         has to be B + I * S where B and S must be valid immediate operand for
> > +         an SVE INDEX instruction.
> > +
> > +         For example, { X, 1, 2, 3} is a vector satisfying these conditions and
> > +         we can generate a vector of all constants (i.e., { 0, 1, 2, 3 }) first
> > +         and then set the first element of the vector to X.  */
> > +
> > +      if (TARGET_SVE && GET_MODE_CLASS (mode) == MODE_VECTOR_INT
> > +          && n_var <= n_elts / 2)
> > +        {
> > +          int const_idx = -1;
> > +          HOST_WIDE_INT const_val = 0;
> > +          int base = 16;
> > +          int step = 16;
> > +
> > +          for (int i = 0; i < n_elts; ++i)
> > +            {
> > +              rtx x = XVECEXP (vals, 0, i);
> > +
> > +              if (!CONST_INT_P (x))
> > +                continue;
> > +
> > +              if (const_idx == -1)
> > +                {
> > +                  const_idx = i;
> > +                  const_val = INTVAL (x);
> > +                }
> > +              else
> > +                {
> > +                  if ((INTVAL (x) - const_val) % (i - const_idx) == 0)
> > +                    {
> > +                      HOST_WIDE_INT s
> > +                        = (INTVAL (x) - const_val) / (i - const_idx);
> > +                      if (s >= -16 && s <= 15)
> > +                        {
> > +                          int b = const_val - s * const_idx;
> > +                          if (b >= -16 && b <= 15)
> > +                            {
> > +                              base = b;
> > +                              step = s;
> > +                            }
> > +                        }
> > +                    }
> > +                  break;
> > +                }
> > +            }
> > +
> > +          if (base != 16
> > +              && (!CONST_INT_P (v0)
> > +                  || (CONST_INT_P (v0) && INTVAL (v0) == base)))
> > +            {
> > +              if (!CONST_INT_P (v0))
> > +                XVECEXP (copy, 0, 0) = GEN_INT (base);
> > +
> > +              is_index_seq = true;
> > +              for (int i = 1; i < n_elts; ++i)
> > +                {
> > +                  rtx x = XVECEXP (copy, 0, i);
> > +
> > +                  if (CONST_INT_P (x))
> > +                    {
> > +                      if (INTVAL (x) != base + i * step)
> > +                        {
> > +                          is_index_seq = false;
> > +                          break;
> > +                        }
> > +                    }
> > +                  else
> > +                    XVECEXP (copy, 0, i) = GEN_INT (base + i * step);
> > +                }
> > +            }
> > +        }
>
> This seems a bit more complex than I was hoping for, although the
> complexity is probably justified.
>
> Seeing how awkward it is to do this using current interfaces, I think
> I'd instead prefer to do something that I'd been vaguely hoping to do
> for a while: extend vector-builder.h to accept wildcard/don't care
> values.  finalize () could then replace the wildcards with whatever
> gives the "nicest" encoding.
>
> That's also going to be relatively complex, but I think it'd be more
> general, and might help with the existing vec_init code as well.
> It would also be a step towards optimising -1 indices for
> __builtin_shufflevector.  It might be a few weeks before I can post
> something though.

No problem, Richard. I am also curious to see what this alternative
implementation looks like. Please kindly keep me posted when your patch
is ready. Thank you!

> Pushing 1/2 without 2/2 has meant that the dupq tests will fail in the
> meantime, but that's ok.  In general, though, it's better not to push
> individual patches from a series unless they've been tested in
> isolation and are known to give clean test results.

In fact, the dupq tests were not affected. Patch 1/2 already adjusted
the "scan-assembler" checks of the dupq tests based on the output of 1/2
alone. Patch 2/2 just replaces the "scan-assembler" checks with
"check-function-bodies." So, the dupq tests still pass without 2/2.

Thanks,
Pengxuan

> Thanks,
> Richard
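
For readers following the thread without a GCC tree at hand, the condition in the quoted patch comment (every constant element I equals B + I * S, with B and S both in the [-16, 15] immediate range that the patch checks for INDEX, and at most half of the elements variable) can be sketched as standalone C++. This is only an illustration under stated assumptions, not the patch's code and not the vector-builder.h extension Richard describes: `Elt`, `IndexSeq`, `fits_index_imm` and `find_index_seq` are hypothetical names, and `std::nullopt` stands in for a non-constant element. Unlike the patch, the sketch derives the step from the first two constants and then verifies every constant in one pass.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of a partly-constant vector: std::nullopt marks a
// non-constant (variable) element, any other value is a known integer.
using Elt = std::optional<std::int64_t>;

struct IndexSeq {
  std::int64_t base;  // B in { B, B + S, B + 2 * S, ... }
  std::int64_t step;  // S
};

// Range the quoted patch accepts for the INDEX base and step immediates.
static bool fits_index_imm(std::int64_t v) { return v >= -16 && v <= 15; }

// Return (base, step) if every constant element I equals base + I * step,
// both values are in the immediate range, and at most half of the elements
// are variable.  Otherwise return std::nullopt.
std::optional<IndexSeq> find_index_seq(const std::vector<Elt> &elts) {
  const std::int64_t n = static_cast<std::int64_t>(elts.size());
  std::int64_t n_var = 0, first = -1, second = -1;
  for (std::int64_t i = 0; i < n; ++i) {
    if (!elts[i])
      ++n_var;
    else if (first < 0)
      first = i;
    else if (second < 0)
      second = i;
  }
  // Two constants are needed to pin down a step.
  if (second < 0 || n_var > n / 2) return std::nullopt;

  const std::int64_t diff = *elts[second] - *elts[first];
  const std::int64_t dist = second - first;
  if (diff % dist != 0) return std::nullopt;  // step would not be an integer

  const std::int64_t step = diff / dist;
  if (!fits_index_imm(step)) return std::nullopt;
  const std::int64_t base = *elts[first] - step * first;
  if (!fits_index_imm(base)) return std::nullopt;

  // Every remaining constant must lie on the same linear sequence.
  for (std::int64_t i = 0; i < n; ++i)
    if (elts[i] && *elts[i] != base + i * step) return std::nullopt;
  return IndexSeq{base, step};
}
```

With this sketch, `{ 0, x, 2, 3 }` (written as `{0, std::nullopt, 2, 3}`) yields base 0 and step 1, matching the "INDEX #0, #1" example in the commit message, while `{ 0, x, 2, 4 }` is rejected because the last constant is off the sequence.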