Jennifer Schmitz <jschm...@nvidia.com> writes:
> This patch adds a new vectorization pattern that detects the modulo
> operation where the second operand is a variable.
> It replaces the statement by division, multiplication, and subtraction.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> Ok for mainline?
>
> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>
> gcc/
>
> 	PR tree-optimization/101390
> 	* tree-vect-patterns.cc (vect_recog_mod_var_pattern): Add new pattern.
>
> gcc/testsuite/
>
> 	PR tree-optimization/101390
> 	* gcc.dg/vect/vect-mod-var.c: New test.
>
> From 15e7abe690ba4f2702b02b29a3198a3309aeb48e Mon Sep 17 00:00:00 2001
> From: Jennifer Schmitz <jschm...@nvidia.com>
> Date: Wed, 7 Aug 2024 08:56:45 -0700
> Subject: [PATCH] PR tree-optimization/101390: Vectorize modulo operator
>
> This patch adds a new vectorization pattern that detects the modulo
> operation where the second operand is a variable.
> It replaces the statement by division, multiplication, and subtraction.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> Ok for mainline?
>
> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>
> gcc/
>
> 	PR tree-optimization/101390
> 	* tree-vect-patterns.cc (vect_recog_mod_var_pattern): Add new pattern.
>
> gcc/testsuite/
>
> 	PR tree-optimization/101390
> 	* gcc.dg/vect/vect-mod-var.c: New test.
> ---
>  gcc/testsuite/gcc.dg/vect/vect-mod-var.c | 40 ++++++++++++++
>  gcc/tree-vect-patterns.cc                | 66 ++++++++++++++++++++++++
>  2 files changed, 106 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-mod-var.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-mod-var.c b/gcc/testsuite/gcc.dg/vect/vect-mod-var.c
> new file mode 100644
> index 00000000000..023ee43f51f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-mod-var.c
> @@ -0,0 +1,40 @@
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-march=armv8.2-a+sve" { target aarch64-*-* } } */
Since this is ordinarily a run test (good!), we can't add extra target
flags unless we're sure that the target hardware supports it.  Also:

> +
> +#include "tree-vect.h"
> +
> +#define N 64
> +
> +__attribute__ ((noinline)) void
> +f (int *restrict a, int *restrict b, int *restrict c)
> +{
> +  for (int i = 0; i < N; ++i)
> +    c[i] = a[i] % b[i];
> +}
> +
> +#define BASE1 -126
> +#define BASE2 116
> +
> +int
> +main (void)
> +{
> +  check_vect ();
> +
> +  int a[N], b[N], c[N];
> +
> +  for (int i = 0; i < N; ++i)
> +    {
> +      a[i] = BASE1 + i * 5;
> +      b[i] = BASE2 - i * 4;
> +      __asm__ volatile ("");
> +    }
> +
> +  f (a, b, c);
> +
> +#pragma GCC novector
> +  for (int i = 0; i < N; ++i)
> +    if (c[i] != a[i] % b[i])
> +      __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-tree-dump "vect_recog_mod_var_pattern: detected" "vect" } } */

...it looks like this wouldn't pass on non-aarch64 targets.

I think for the vect testsuite, we should add a new:

  check_effective_target_vect_int_div

that returns true if [check_effective_target_aarch64_sve].  We can then
drop the dg-additional-options and instead add { target vect_int_div }
to the scan-tree-dump test.  We wouldn't need the:

  /* { dg-require-effective-target vect_int } */

since the test should pass (without the scan) on all targets.

This means that the vect testsuite will only cover SVE if the testsuite
is run with SVE enabled.  That's usual practice for the vect testsuite
though, and doesn't seem too bad IMO.  We should sometimes run the
testsuite with SVE enabled by default if we want to get good test
coverage for SVE.

That said, if we want a test for SVE that runs on all aarch64 targets,
it might be worth adding a compile-only test to gcc.target/aarch64/sve,
say "mod_1.c", in addition to the test above.  mod_1.c can then test for
code quality, not just whether vectorisation takes place.  It could
cover both 32-bit and 64-bit cases, and both signed and unsigned;
see div_1.c for an example.
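For concreteness, the new effective target could be as simple as the
sketch below.  vect_int_div is only the proposed name; nothing with that
name exists in target-supports.exp yet, and the SVE-only condition is
just the suggestion above:

  proc check_effective_target_vect_int_div { } {
      # Assumption: SVE is currently the only target of interest with
      # vectorized integer division.
      return [check_effective_target_aarch64_sve]
  }

The scan in the test would then become:

  /* { dg-final { scan-tree-dump "vect_recog_mod_var_pattern: detected" "vect" { target vect_int_div } } } */

and both the dg-require-effective-target and dg-additional-options
lines would go away.

A mod_1.c along those lines might look like the following.  This is a
sketch modelled loosely on div_1.c; the function names are made up and
the instruction counts (sdiv/udiv for the signed/unsigned division,
plus a multiply-subtract per loop) are assumptions about what the
pattern should generate for SVE, not verified output:

  /* { dg-options "-O2 -ftree-vectorize" } */

  #include <stdint.h>

  /* One modulo loop per element type; only 32-bit and 64-bit
     elements, since SVE has no narrower integer division.  */
  #define DEF_LOOP(TYPE)                                          \
    void __attribute__ ((noipa))                                  \
    mod_##TYPE (TYPE *dst, TYPE *src1, TYPE *src2, int count)     \
    {                                                             \
      for (int i = 0; i < count; ++i)                             \
        dst[i] = src1[i] % src2[i];                               \
    }

  DEF_LOOP (int32_t)
  DEF_LOOP (uint32_t)
  DEF_LOOP (int64_t)
  DEF_LOOP (uint64_t)

  /* { dg-final { scan-assembler-times {\tsdiv\t} 2 } } */
  /* { dg-final { scan-assembler-times {\tudiv\t} 2 } } */
  /* { dg-final { scan-assembler-times {\tmls\t} 4 } } */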
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index f52de2b6972..8ea31510f6f 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -5264,6 +5264,71 @@ vect_recog_divmod_pattern (vec_info *vinfo,
>    return pattern_stmt;
>  }
>
> +/* Detects pattern with a modulo operation (S1) where both arguments
> +are variables of integral type.
> +The statement is replaced by division, multiplication, and subtraction.
> +The last statement (S4) is returned.
> +
> +Example:
> +S1 c_t = a_t % b_t;
> +
> +is replaced by
> +S2 x_t = a_t / b_t;
> +S3 y_t = x_t * b_t;
> +S4 z_t = a_t - y_t;  */

Minor formatting nit, but the comment should be indented under
the "Detects".

> +
> +static gimple *
> +vect_recog_mod_var_pattern (vec_info *vinfo,
> +			    stmt_vec_info stmt_vinfo, tree *type_out)
> +{
> +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> +  tree oprnd0, oprnd1, vectype, itype;
> +  gimple *pattern_stmt, *def_stmt;
> +  enum tree_code rhs_code;
> +
> +  if (!is_gimple_assign (last_stmt))
> +    return NULL;
> +
> +  rhs_code = gimple_assign_rhs_code (last_stmt);
> +  if (rhs_code != TRUNC_MOD_EXPR)
> +    return NULL;
> +
> +  oprnd0 = gimple_assign_rhs1 (last_stmt);
> +  oprnd1 = gimple_assign_rhs2 (last_stmt);
> +  itype = TREE_TYPE (oprnd0);
> +  if (TREE_CODE (oprnd0) != SSA_NAME
> +      || TREE_CODE (oprnd1) != SSA_NAME
> +      || TREE_CODE (itype) != INTEGER_TYPE)
> +    return NULL;
> +
> +  vectype = get_vectype_for_scalar_type (vinfo, itype);
> +
> +  if (!vectype
> +      || target_has_vecop_for_code (TRUNC_MOD_EXPR, vectype)
> +      || !target_has_vecop_for_code (TRUNC_DIV_EXPR, vectype)
> +      || !target_has_vecop_for_code (MULT_EXPR, vectype)
> +      || !target_has_vecop_for_code (MINUS_EXPR, vectype))
> +    return NULL;
> +
> +  tree q, tmp, r;
> +  q = vect_recog_temp_ssa_var (itype, NULL);
> +  def_stmt = gimple_build_assign (q, TRUNC_DIV_EXPR, oprnd0, oprnd1);
> +  append_pattern_def_seq (vinfo, stmt_vinfo, def_stmt);

The calls to append_pattern_def_seq should probably pass vectype as a
fourth parameter, here and below (see the sketch after the quoted
patch).

Otherwise the code looks good.  One of the things I was wondering was
whether this could introduce new sources of signed overflow in code
that didn't have it previously, but I don't think it can: the only case
in which the new division can overflow is the minimum signed value
divided by -1, and the original modulo is already undefined there.

Thanks,
Richard

> +
> +  tmp = vect_recog_temp_ssa_var (itype, NULL);
> +  def_stmt = gimple_build_assign (tmp, MULT_EXPR, q, oprnd1);
> +  append_pattern_def_seq (vinfo, stmt_vinfo, def_stmt);
> +
> +  r = vect_recog_temp_ssa_var (itype, NULL);
> +  pattern_stmt = gimple_build_assign (r, MINUS_EXPR, oprnd0, tmp);
> +
> +  /* Pattern detected.  */
> +  *type_out = vectype;
> +  vect_pattern_detected ("vect_recog_mod_var_pattern", last_stmt);
> +
> +  return pattern_stmt;
> +}
> +
>  /* Function vect_recog_mixed_size_cond_pattern
>
>     Try to find the following pattern:
>
> @@ -7343,6 +7408,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
>    { vect_recog_rotate_pattern, "rotate" },
>    { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
>    { vect_recog_divmod_pattern, "divmod" },
> +  { vect_recog_mod_var_pattern, "modvar" },
>    { vect_recog_mult_pattern, "mult" },
>    { vect_recog_sat_add_pattern, "sat_add" },
>    { vect_recog_sat_sub_pattern, "sat_sub" },
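(For concreteness, the fourth-parameter change suggested above would
make each build/append pair look something like this; a sketch against
this patch only:

  q = vect_recog_temp_ssa_var (itype, NULL);
  def_stmt = gimple_build_assign (q, TRUNC_DIV_EXPR, oprnd0, oprnd1);
  /* Pass the vector type explicitly so that it is recorded for the
     new pattern statement.  */
  append_pattern_def_seq (vinfo, stmt_vinfo, def_stmt, vectype);

and likewise for the MULT_EXPR statement.)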