Thanks for the comments. I updated the patch accordingly and bootstrapped and 
tested again.
Best, Jennifer

Attachment: 0001-PR-tree-optimization-101390-Vectorize-modulo-operato.patch
Description: Binary data


> On 14 Aug 2024, at 16:18, Richard Sandiford <richard.sandif...@arm.com> wrote:
> 
> Jennifer Schmitz <jschm...@nvidia.com> writes:
>> This patch adds a new vectorization pattern that detects the modulo
>> operation where the second operand is a variable.
>> It replaces the statement by division, multiplication, and subtraction.
>> 
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> Ok for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>> 
>> gcc/
>> 
>>      PR tree-optimization/101390
>>      * tree-vect-patterns.cc (vect_recog_mod_var_pattern): Add new pattern.
>> 
>> gcc/testsuite/
>>      PR tree-optimization/101390
>>      * gcc.dg/vect/vect-mod-var.c: New test.
>> 
>> From 15e7abe690ba4f2702b02b29a3198a3309aeb48e Mon Sep 17 00:00:00 2001
>> From: Jennifer Schmitz <jschm...@nvidia.com>
>> Date: Wed, 7 Aug 2024 08:56:45 -0700
>> Subject: [PATCH] PR tree-optimization/101390: Vectorize modulo operator
>> 
>> This patch adds a new vectorization pattern that detects the modulo
>> operation where the second operand is a variable.
>> It replaces the statement by division, multiplication, and subtraction.
>> 
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> Ok for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>> 
>> gcc/
>> 
>>      PR tree-optimization/101390
>>      * tree-vect-patterns.cc (vect_recog_mod_var_pattern): Add new pattern.
>> 
>> gcc/testsuite/
>>      PR tree-optimization/101390
>>      * gcc.dg/vect/vect-mod-var.c: New test.
>> ---
>> gcc/testsuite/gcc.dg/vect/vect-mod-var.c | 40 ++++++++++++++
>> gcc/tree-vect-patterns.cc                | 66 ++++++++++++++++++++++++
>> 2 files changed, 106 insertions(+)
>> create mode 100644 gcc/testsuite/gcc.dg/vect/vect-mod-var.c
>> 
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-mod-var.c b/gcc/testsuite/gcc.dg/vect/vect-mod-var.c
>> new file mode 100644
>> index 00000000000..023ee43f51f
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-mod-var.c
>> @@ -0,0 +1,40 @@
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-additional-options "-march=armv8.2-a+sve" { target aarch64-*-* } } */
> 
> Since this is ordinarily a run test (good!), we can't add extra target
> flags unless we're sure that the target hardware supports it.  Also:
> 
>> +
>> +#include "tree-vect.h"
>> +
>> +#define N 64
>> +
>> +__attribute__ ((noinline)) void
>> +f (int *restrict a, int *restrict b, int *restrict c)
>> +{
>> +  for (int i = 0; i < N; ++i)
>> +    c[i] = a[i] % b[i];
>> +}
>> +
>> +#define BASE1 -126
>> +#define BASE2 116
>> +
>> +int
>> +main (void)
>> +{
>> +  check_vect ();
>> +
>> +  int a[N], b[N], c[N];
>> +
>> +  for (int i = 0; i < N; ++i)
>> +    {
>> +      a[i] = BASE1 + i * 5;
>> +      b[i] = BASE2 - i * 4;
>> +      __asm__ volatile ("");
>> +    }
>> +
>> +  f (a, b, c);
>> +
>> +#pragma GCC novector
>> +  for (int i = 0; i < N; ++i)
>> +    if (c[i] != a[i] % b[i])
>> +      __builtin_abort ();
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "vect_recog_mod_var_pattern: detected" "vect" } } */
> 
> ...it looks like this wouldn't pass on non-aarch64 targets.
> 
> I think for the vect testsuite, we should add a new:
> 
>  check_effective_target_vect_int_div
> 
> that returns true if [check_effective_target_aarch64_sve].  We can then
> drop the dg-additional-options and instead add { target vect_int_div }
> to the scan-tree-dump test.  We wouldn't need the:
> 
>  /* { dg-require-effective-target vect_int } */
> 
> since the test should pass (without the scan) on all targets.
> 
> This means that the vect testsuite will only cover SVE if the testsuite
> is run with SVE enabled.  That's usual practice for the vect testsuite
> though, and doesn't seem too bad IMO.  We should sometimes run the
> testsuite with SVE enabled by default if we want to get good test coverage
> for SVE.
> 
> That said, if we want a test for SVE that runs on all aarch64 targets,
> it might be worth adding a compile-only test to gcc.target/aarch64/sve,
> say "mod_1.c", in addition to the test above.  mod_1.c can then test for
> code quality, not just whether vectorisation takes place.  It could cover
> both 32-bit and 64-bit cases, and both signed and unsigned; see
> div_1.c for an example.
> 
>> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
>> index f52de2b6972..8ea31510f6f 100644
>> --- a/gcc/tree-vect-patterns.cc
>> +++ b/gcc/tree-vect-patterns.cc
>> @@ -5264,6 +5264,71 @@ vect_recog_divmod_pattern (vec_info *vinfo,
>>   return pattern_stmt;
>> }
>> 
>> +/* Detects a pattern with a modulo operation (S1) where both arguments
>> +  are variables of integral type.
>> +  The statement is replaced by division, multiplication, and subtraction.
>> +  The last statement (S4) is returned.
>> +
>> +  Example:
>> +  S1 c_t = a_t % b_t;
>> +
>> +  is replaced by
>> +  S2 x_t = a_t / b_t;
>> +  S3 y_t = x_t * b_t;
>> +  S4 z_t = a_t - y_t;  */
> 
> Minor formatting nit, but: the comment should be indented under the
> "Detects".
> 
>> +
>> +static gimple *
>> +vect_recog_mod_var_pattern (vec_info *vinfo,
>> +                         stmt_vec_info stmt_vinfo, tree *type_out)
>> +{
>> +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
>> +  tree oprnd0, oprnd1, vectype, itype;
>> +  gimple *pattern_stmt, *def_stmt;
>> +  enum tree_code rhs_code;
>> +
>> +  if (!is_gimple_assign (last_stmt))
>> +    return NULL;
>> +
>> +  rhs_code = gimple_assign_rhs_code (last_stmt);
>> +  if (rhs_code != TRUNC_MOD_EXPR)
>> +    return NULL;
>> +
>> +  oprnd0 = gimple_assign_rhs1 (last_stmt);
>> +  oprnd1 = gimple_assign_rhs2 (last_stmt);
>> +  itype = TREE_TYPE (oprnd0);
>> +  if (TREE_CODE (oprnd0) != SSA_NAME
>> +      || TREE_CODE (oprnd1) != SSA_NAME
>> +      || TREE_CODE (itype) != INTEGER_TYPE)
>> +    return NULL;
>> +
>> +  vectype = get_vectype_for_scalar_type (vinfo, itype);
>> +
>> +  if (!vectype
>> +      || target_has_vecop_for_code (TRUNC_MOD_EXPR, vectype)
>> +      || !target_has_vecop_for_code (TRUNC_DIV_EXPR, vectype)
>> +      || !target_has_vecop_for_code (MULT_EXPR, vectype)
>> +      || !target_has_vecop_for_code (MINUS_EXPR, vectype))
>> +    return NULL;
>> +
>> +  tree q, tmp, r;
>> +  q = vect_recog_temp_ssa_var (itype, NULL);
>> +  def_stmt = gimple_build_assign (q, TRUNC_DIV_EXPR, oprnd0, oprnd1);
>> +  append_pattern_def_seq (vinfo, stmt_vinfo, def_stmt);
> 
> The calls to append_pattern_def_seq should probably pass vectype as
> a fourth parameter, here and below.
> 
> Otherwise the code looks good.  One of the things I was wondering
> was whether this could introduce new sources of signed overflow in
> code that didn't have it previously, but I don't think it can.
> 
> Thanks,
> Richard
> 
>> +
>> +  tmp = vect_recog_temp_ssa_var (itype, NULL);
>> +  def_stmt = gimple_build_assign (tmp, MULT_EXPR, q, oprnd1);
>> +  append_pattern_def_seq (vinfo, stmt_vinfo, def_stmt);
>> +
>> +  r = vect_recog_temp_ssa_var (itype, NULL);
>> +  pattern_stmt = gimple_build_assign (r, MINUS_EXPR, oprnd0, tmp);
>> +
>> +  /* Pattern detected.  */
>> +  *type_out = vectype;
>> +  vect_pattern_detected ("vect_recog_mod_var_pattern", last_stmt);
>> +
>> +  return pattern_stmt;
>> +}
>> +
>> /* Function vect_recog_mixed_size_cond_pattern
>> 
>>    Try to find the following pattern:
>> @@ -7343,6 +7408,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
>>   { vect_recog_rotate_pattern, "rotate" },
>>   { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
>>   { vect_recog_divmod_pattern, "divmod" },
>> +  { vect_recog_mod_var_pattern, "modvar" },
>>   { vect_recog_mult_pattern, "mult" },
>>   { vect_recog_sat_add_pattern, "sat_add" },
>>   { vect_recog_sat_sub_pattern, "sat_sub" },

