Thanks for the comments. I updated the patch accordingly and bootstrapped and tested again.
Best,
Jennifer
0001-PR-tree-optimization-101390-Vectorize-modulo-operato.patch
> On 14 Aug 2024, at 16:18, Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> External email: Use caution opening links or attachments
>
>
> Jennifer Schmitz <jschm...@nvidia.com> writes:
>> This patch adds a new vectorization pattern that detects the modulo
>> operation where the second operand is a variable.
>> It replaces the statement by division, multiplication, and subtraction.
>>
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> Ok for mainline?
>>
>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>>
>> gcc/
>>
>> 	PR tree-optimization/101390
>> 	* tree-vect-pattern.cc (vect_recog_mod_var_pattern): Add new pattern.
>>
>> gcc/testsuite/
>> 	PR tree-optimization/101390
>> 	* gcc.dg/vect/vect-mod-var.c: New test.
>>
>> From 15e7abe690ba4f2702b02b29a3198a3309aeb48e Mon Sep 17 00:00:00 2001
>> From: Jennifer Schmitz <jschm...@nvidia.com>
>> Date: Wed, 7 Aug 2024 08:56:45 -0700
>> Subject: [PATCH] PR tree-optimization/101390: Vectorize modulo operator
>>
>> This patch adds a new vectorization pattern that detects the modulo
>> operation where the second operand is a variable.
>> It replaces the statement by division, multiplication, and subtraction.
>>
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> Ok for mainline?
>>
>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>>
>> gcc/
>>
>> 	PR tree-optimization/101390
>> 	* tree-vect-pattern.cc (vect_recog_mod_var_pattern): Add new pattern.
>>
>> gcc/testsuite/
>> 	PR tree-optimization/101390
>> 	* gcc.dg/vect/vect-mod-var.c: New test.
>> ---
>>  gcc/testsuite/gcc.dg/vect/vect-mod-var.c | 40 ++++++++++++++
>>  gcc/tree-vect-patterns.cc                | 66 ++++++++++++++++++++++++
>>  2 files changed, 106 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-mod-var.c
>>
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-mod-var.c b/gcc/testsuite/gcc.dg/vect/vect-mod-var.c
>> new file mode 100644
>> index 00000000000..023ee43f51f
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-mod-var.c
>> @@ -0,0 +1,40 @@
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-additional-options "-march=armv8.2-a+sve" { target aarch64-*-* } } */
>
> Since this is ordinarily a run test (good!), we can't add extra target
> flags unless we're sure that the target hardware supports it.  Also:
>
>> +
>> +#include "tree-vect.h"
>> +
>> +#define N 64
>> +
>> +__attribute__ ((noinline)) int
>> +f (int *restrict a, int *restrict b, int *restrict c)
>> +{
>> +  for (int i = 0; i < N; ++i)
>> +    c[i] = a[i] % b[i];
>> +}
>> +
>> +#define BASE1 -126
>> +#define BASE2 116
>> +
>> +int
>> +main (void)
>> +{
>> +  check_vect ();
>> +
>> +  int a[N], b[N], c[N];
>> +
>> +  for (int i = 0; i < N; ++i)
>> +    {
>> +      a[i] = BASE1 + i * 5;
>> +      b[i] = BASE2 - i * 4;
>> +      __asm__ volatile ("");
>> +    }
>> +
>> +  f (a, b, c);
>> +
>> +#pragma GCC novector
>> +  for (int i = 0; i < N; ++i)
>> +    if (c[i] != a[i] % b[i])
>> +      __builtin_abort ();
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "vect_recog_mod_var_pattern: detected" "vect" } } */
>
> ...it looks like this wouldn't pass on non-aarch64 targets.
>
> I think for the vect testsuite, we should add a new:
>
>   check_effective_target_vect_int_div
>
> that returns true if [check_effective_target_aarch64_sve].  We can then
> drop the dg-additional-options and instead add { target vect_int_div }
> to the scan-tree-dump test.
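Concretely, that suggestion would delete the dg-additional-options line and gate only the final scan, along these lines (an untested sketch of the suggested direction; vect_int_div is the new effective target Richard proposes, to be defined in gcc/testsuite/lib/target-supports.exp so that it initially just returns [check_effective_target_aarch64_sve]):

    /* { dg-final { scan-tree-dump "vect_recog_mod_var_pattern: detected" "vect" { target vect_int_div } } } */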
> We wouldn't need the:
>
>   /* { dg-require-effective-target vect_int } */
>
> since the test should pass (without the scan) on all targets.
>
> This means that the vect testsuite will only cover SVE if the testsuite
> is run with SVE enabled.  That's usual practice for the vect testsuite
> though, and doesn't seem too bad IMO.  We should sometimes run the
> testsuite with SVE enabled by default if we want to get good test
> coverage for SVE.
>
> That said, if we want a test for SVE that runs on all aarch64 targets,
> it might be worth adding a compile-only test to gcc.target/aarch64/sve,
> say "mod_1.c", in addition to the test above.  mod_1.c can then test for
> code quality, not just whether vectorisation takes place.  It could cover
> both 32-bit and 64-bit cases, and both signed and unsigned; see
> div_1.c for an example.
>
>> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
>> index f52de2b6972..8ea31510f6f 100644
>> --- a/gcc/tree-vect-patterns.cc
>> +++ b/gcc/tree-vect-patterns.cc
>> @@ -5264,6 +5264,71 @@ vect_recog_divmod_pattern (vec_info *vinfo,
>>    return pattern_stmt;
>>  }
>>
>> +/* Detects pattern with a modulo operation (S1) where both arguments
>> + are variables of integral type.
>> + The statement is replaced by division, multiplication, and subtraction.
>> + The last statement (S4) is returned.
>> +
>> + Example:
>> + S1 c_t = a_t % b_t;
>> +
>> + is replaced by
>> + S2 x_t = a_t / b_t;
>> + S3 y_t = x_t * b_t;
>> + S4 z_t = a_t - y_t;  */
>
> Minor formatting nit, but: the comment should be indented under the
> "Detects".
>
>> +
>> +static gimple *
>> +vect_recog_mod_var_pattern (vec_info *vinfo,
>> +                            stmt_vec_info stmt_vinfo, tree *type_out)
>> +{
>> +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
>> +  tree oprnd0, oprnd1, vectype, itype;
>> +  gimple *pattern_stmt, *def_stmt;
>> +  enum tree_code rhs_code;
>> +
>> +  if (!is_gimple_assign (last_stmt))
>> +    return NULL;
>> +
>> +  rhs_code = gimple_assign_rhs_code (last_stmt);
>> +  if (rhs_code != TRUNC_MOD_EXPR)
>> +    return NULL;
>> +
>> +  oprnd0 = gimple_assign_rhs1 (last_stmt);
>> +  oprnd1 = gimple_assign_rhs2 (last_stmt);
>> +  itype = TREE_TYPE (oprnd0);
>> +  if (TREE_CODE (oprnd0) != SSA_NAME
>> +      || TREE_CODE (oprnd1) != SSA_NAME
>> +      || TREE_CODE (itype) != INTEGER_TYPE)
>> +    return NULL;
>> +
>> +  vectype = get_vectype_for_scalar_type (vinfo, itype);
>> +
>> +  if (!vectype
>> +      || target_has_vecop_for_code (TRUNC_MOD_EXPR, vectype)
>> +      || !target_has_vecop_for_code (TRUNC_DIV_EXPR, vectype)
>> +      || !target_has_vecop_for_code (MULT_EXPR, vectype)
>> +      || !target_has_vecop_for_code (MINUS_EXPR, vectype))
>> +    return NULL;
>> +
>> +  tree q, tmp, r;
>> +  q = vect_recog_temp_ssa_var (itype, NULL);
>> +  def_stmt = gimple_build_assign (q, TRUNC_DIV_EXPR, oprnd0, oprnd1);
>> +  append_pattern_def_seq (vinfo, stmt_vinfo, def_stmt);
>
> The calls to append_pattern_def_seq should probably pass vectype as
> a fourth parameter, here and below.
>
> Otherwise the code looks good.  One of the things I was wondering
> was whether this could introduce new sources of signed overflow in
> code that didn't have it previously, but I don't think it can.
>
> Thanks,
> Richard
>
>> +
>> +  tmp = vect_recog_temp_ssa_var (itype, NULL);
>> +  def_stmt = gimple_build_assign (tmp, MULT_EXPR, q, oprnd1);
>> +  append_pattern_def_seq (vinfo, stmt_vinfo, def_stmt);
>> +
>> +  r = vect_recog_temp_ssa_var (itype, NULL);
>> +  pattern_stmt = gimple_build_assign (r, MINUS_EXPR, oprnd0, tmp);
>> +
>> +  /* Pattern detected.  */
>> +  *type_out = vectype;
>> +  vect_pattern_detected ("vect_recog_mod_var_pattern", last_stmt);
>> +
>> +  return pattern_stmt;
>> +}
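For reference, the correctness of this rewrite rests on the scalar identity a % b == a - (a / b) * b, which holds for C's truncating division whenever a / b is defined. A minimal standalone sketch (the function name is hypothetical, not part of the patch) that also spells out the overflow argument Richard makes above:

    /* Scalar form of the S1 -> S2/S3/S4 rewrite.  For truncating
       division, a % b == a - (a / b) * b for all b != 0.  The only
       signed-overflow case of a / b is INT_MIN / -1, where a % b is
       itself already undefined; and since |(a / b) * b| <= |a|, the
       multiply and subtract stay in range, so the rewrite introduces
       no new signed overflow.  */
    int
    mod_expanded (int a, int b)
    {
      int q = a / b;   /* S2 */
      int y = q * b;   /* S3 */
      return a - y;    /* S4: same value as a % b */
    }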
>> +
>>  /* Function vect_recog_mixed_size_cond_pattern
>>
>>     Try to find the following pattern:
>> @@ -7343,6 +7408,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
>>    { vect_recog_rotate_pattern, "rotate" },
>>    { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
>>    { vect_recog_divmod_pattern, "divmod" },
>> +  { vect_recog_mod_var_pattern, "modvar" },
>>    { vect_recog_mult_pattern, "mult" },
>>    { vect_recog_sat_add_pattern, "sat_add" },
>>    { vect_recog_sat_sub_pattern, "sat_sub" },
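As a possible starting point for the gcc.target/aarch64/sve test suggested above, mod_1.c might look roughly like this (a sketch loosely modelled on div_1.c; the DEF_LOOP macro layout and dg-options are assumptions, and any scan-assembler patterns, e.g. for sdiv/udiv followed by an mls/msb multiply-subtract, would need to be checked against real compiler output before committing):

    /* { dg-do compile } */
    /* { dg-options "-O2 -ftree-vectorize" } */

    #include <stdint.h>

    /* One modulo loop per element type; covers signed and unsigned,
       32-bit and 64-bit.  */
    #define DEF_LOOP(TYPE)                                          \
      void __attribute__ ((noinline, noclone))                      \
      mod_##TYPE (TYPE *dst, TYPE *src1, TYPE *src2, int count)     \
      {                                                             \
        for (int i = 0; i < count; ++i)                             \
          dst[i] = src1[i] % src2[i];                               \
      }

    DEF_LOOP (int32_t)
    DEF_LOOP (uint32_t)
    DEF_LOOP (int64_t)
    DEF_LOOP (uint64_t)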