> On 23 Oct 2024, at 16:49, Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> External email: Use caution opening links or attachments
>
>
> Jennifer Schmitz <jschm...@nvidia.com> writes:
>> This patch folds svindex with constant arguments into a vector series.
>> We implemented this in svindex_impl::fold using the function
>> build_vec_series.
>> For example,
>> svuint64_t f1 ()
>> {
>>   return svindex_u64 (10, 3);
>> }
>> compiled with -O2 -march=armv8.2-a+sve, is folded to {10, 13, 16, ...}
>> in the gimple pass lower.
>> This optimization benefits cases where svindex is used in combination with
>> other gimple-level optimizations.
>> For example,
>> svuint64_t f2 ()
>> {
>>   return svmul_x (svptrue_b64 (), svindex_u64 (10, 3), 5);
>> }
>> has previously been compiled to
>> f2:
>>         index   z0.d, #10, #3
>>         mul     z0.d, z0.d, #5
>>         ret
>> Now, it is compiled to
>> f2:
>>         mov     x0, 50
>>         index   z0.d, x0, #15
>>         ret
>
> Nice!  Thanks for doing this.
>
>> For non-constant arguments, build_vec_series produces a VEC_SERIES_EXPR,
>> which is translated back at RTL level to an index instruction without codegen
>> changes.
>>
>> We added test cases checking
>> - the application of the transform during gimple for constant arguments,
>> - the interaction with another gimple-level optimization.
>>
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>>
>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>>
>> gcc/
>>      * config/aarch64/aarch64-sve-builtins-base.cc
>>      (svindex_impl::fold): Add constant folding.
>>
>> gcc/testsuite/
>>      * gcc.target/aarch64/sve/index_const_fold.c: New test.
>> ---
>>  .../aarch64/aarch64-sve-builtins-base.cc      | 12 +++++++
>>  .../gcc.target/aarch64/sve/index_const_fold.c | 35 +++++++++++++++++++
>>  2 files changed, 47 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/index_const_fold.c
>>
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> index 1c17149e1f0..f6b1657ecbb 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> @@ -1304,6 +1304,18 @@ public:
>>
>>  class svindex_impl : public function_base
>>  {
>> +public:
>> +  gimple *
>> +  fold (gimple_folder &f) const override
>> +  {
>> +    tree vec_type = TREE_TYPE (f.lhs);
>> +    tree base = gimple_call_arg (f.call, 0);
>> +    tree step = gimple_call_arg (f.call, 1);
>
> Could we restrict this to:
>
>   if (TREE_CODE (base) != INTEGER_CST || TREE_CODE (step) != INTEGER_CST)
>     return nullptr;
>
> for now?  This goes back to the previous discussion about how "creative"
> the compiler is allowed to be in replacing the user's original instruction
> selection.  IIRC, it'd also be somewhat novel to use VEC_SERIES_EXPR for
> constant-length vectors.
>
> We can always relax this later if we find a compelling use case.
> But it looks like the tests would still pass with the guard above.
>
> OK with that change, thanks.

Thanks for the review. I added the guard you proposed and pushed to trunk: 90e38c4ffad086a82635e8ea9bf0e7e9e02f1ff7.
Best,
Jennifer

>
> Richard
>
>> +
>> +    return gimple_build_assign (f.lhs,
>> +                                build_vec_series (vec_type, base, step));
>> +  }
>> +
>>  public:
>>    rtx
>>    expand (function_expander &e) const override
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/index_const_fold.c b/gcc/testsuite/gcc.target/aarch64/sve/index_const_fold.c
>> new file mode 100644
>> index 00000000000..7abb803f58b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/index_const_fold.c
>> @@ -0,0 +1,35 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>> +
>> +#include <arm_sve.h>
>> +#include <stdint.h>
>> +
>> +#define INDEX_CONST(TYPE, TY)                   \
>> +  sv##TYPE f_##TY##_index_const ()              \
>> +  {                                             \
>> +    return svindex_##TY (10, 3);                \
>> +  }
>> +
>> +#define MULT_INDEX(TYPE, TY)                    \
>> +  sv##TYPE f_##TY##_mult_index ()               \
>> +  {                                             \
>> +    return svmul_x (svptrue_b8 (),              \
>> +                    svindex_##TY (10, 3),       \
>> +                    5);                         \
>> +  }
>> +
>> +#define ALL_TESTS(TYPE, TY)                     \
>> +  INDEX_CONST (TYPE, TY)                        \
>> +  MULT_INDEX (TYPE, TY)
>> +
>> +ALL_TESTS (uint8_t, u8)
>> +ALL_TESTS (uint16_t, u16)
>> +ALL_TESTS (uint32_t, u32)
>> +ALL_TESTS (uint64_t, u64)
>> +ALL_TESTS (int8_t, s8)
>> +ALL_TESTS (int16_t, s16)
>> +ALL_TESTS (int32_t, s32)
>> +ALL_TESTS (int64_t, s64)
>> +
>> +/* { dg-final { scan-tree-dump-times "return \\{ 10, 13, 16, ... \\}" 8 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "return \\{ 50, 65, 80, ... \\}" 8 "optimized" } } */