> On 30 Aug 2024, at 14:17, Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> External email: Use caution opening links or attachments
>
>
> Jennifer Schmitz <jschm...@nvidia.com> writes:
>> This patch implements constant folding for svdiv. If the predicate is
>> ptrue or predication is _x, it uses vector_const_binop with
>> aarch64_const_binop as callback and tree_code TRUNC_DIV_EXPR to fold constant
>> integer operands.
>> In aarch64_const_binop, a case was added for TRUNC_DIV_EXPR to return 0
>> for division by 0, as defined in the semantics for svdiv.
>> Tests were added to check the produced assembly for different
>> predicates, signed and unsigned integers, and the svdiv_n_* case.
>>
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>>
>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>>
>> gcc/
>> 	* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
>> 	Try constant folding.
>> 	* config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop):
>> 	Add special case for division by 0.
>>
>> gcc/testsuite/
>> 	* gcc.target/aarch64/sve/const_fold_div_1.c: New test.
>>
>> From 92583732da28f6eb4a8db484fa3b24d55a7265e6 Mon Sep 17 00:00:00 2001
>> From: Jennifer Schmitz <jschm...@nvidia.com>
>> Date: Thu, 29 Aug 2024 05:04:51 -0700
>> Subject: [PATCH 2/3] SVE intrinsics: Fold constant operands for svdiv.
>>
>> This patch implements constant folding for svdiv. If the predicate is
>> ptrue or predication is _x, it uses vector_const_binop with
>> aarch64_const_binop as callback and tree_code TRUNC_DIV_EXPR to fold constant
>> integer operands.
>> In aarch64_const_binop, a case was added for TRUNC_DIV_EXPR to return 0
>> for division by 0, as defined in the semantics for svdiv.
>> Tests were added to check the produced assembly for different
>> predicates, signed and unsigned integers, and the svdiv_n_* case.
>>
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>>
>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>>
>> gcc/
>> 	* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
>> 	Try constant folding.
>> 	* config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop):
>> 	Add special case for division by 0.
>>
>> gcc/testsuite/
>> 	* gcc.target/aarch64/sve/const_fold_div_1.c: New test.
>> ---
>>  .../aarch64/aarch64-sve-builtins-base.cc      |  19 +-
>>  gcc/config/aarch64/aarch64-sve-builtins.cc    |   4 +
>>  .../gcc.target/aarch64/sve/const_fold_div_1.c | 336 ++++++++++++++++++
>>  3 files changed, 356 insertions(+), 3 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
>>
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> index d55bee0b72f..617c7fc87e5 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> @@ -755,8 +755,21 @@ public:
>>    gimple *
>>    fold (gimple_folder &f) const override
>>    {
>> -    tree divisor = gimple_call_arg (f.call, 2);
>> -    tree divisor_cst = uniform_integer_cst_p (divisor);
>> +    tree pg = gimple_call_arg (f.call, 0);
>> +    tree op1 = gimple_call_arg (f.call, 1);
>> +    tree op2 = gimple_call_arg (f.call, 2);
>> +
>> +    /* Try to fold constant integer operands.  */
>> +    if (f.type_suffix (0).integer_p
>> +	&& (f.pred == PRED_x
>> +	    || is_ptrue (pg, f.type_suffix (0).element_bytes)))
>> +      if (tree res = vector_const_binop (TRUNC_DIV_EXPR, op1, op2,
>> +					 aarch64_const_binop))
>> +	return gimple_build_assign (f.lhs, res);
>
> To reduce cut-&-paste, it'd be good to put this in a helper:
>
>   gimple *gimple_folder::fold_const_binary (tree_code code);
>
> that does the outermost "if" above for "code" rather than TRUNC_DIV_EXPR.
> It could return null on failure.  Then the caller can just be:
>
>   if (auto *res = f.fold_const_binary (TRUNC_DIV_EXPR))
>     return res;
>
> This could go right at the top of the function, since it doesn't rely
> on any of the local variables above.

Done.

>
>> +
>> +    /* If the divisor is a uniform power of 2, fold to a shift
>> +       instruction.  */
>> +    tree divisor_cst = uniform_integer_cst_p (op2);
>>
>>     if (!divisor_cst || !integer_pow2p (divisor_cst))
>>       return NULL;
>> @@ -770,7 +783,7 @@ public:
>> 				    shapes::binary_uint_opt_n, MODE_n,
>> 				    f.type_suffix_ids, GROUP_none, f.pred);
>> 	call = f.redirect_call (instance);
>> -	tree d = INTEGRAL_TYPE_P (TREE_TYPE (divisor)) ? divisor : divisor_cst;
>> +	tree d = INTEGRAL_TYPE_P (TREE_TYPE (op2)) ? op2 : divisor_cst;
>> 	new_divisor = wide_int_to_tree (TREE_TYPE (d), tree_log2 (d));
>>       }
>>     else
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
>> index 315d5ac4177..c1b28ebfe4e 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
>> @@ -3444,6 +3444,10 @@ aarch64_const_binop (enum tree_code code, tree arg1, tree arg2)
>>        signop sign = TYPE_SIGN (type);
>>        wi::overflow_type overflow = wi::OVF_NONE;
>>
>> +      /* Return 0 for division by 0.  */
>> +      if (code == TRUNC_DIV_EXPR && integer_zerop (arg2))
>> +	return arg2;
>> +
>>        if (!poly_int_binop (poly_res, code, arg1, arg2, sign, &overflow))
>> 	return NULL_TREE;
>>        return force_fit_type (type, poly_res, false,
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
>> new file mode 100644
>> index 00000000000..062fb6e560e
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
>> @@ -0,0 +1,336 @@
>> +/* { dg-final { check-function-bodies "**" "" } } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "arm_sve.h"
>> +
>> +/*
>> +** s64_x_pg:
>> +**	mov	z[0-9]+\.d, #1
>> +**	ret
>> +*/
>> +svint64_t s64_x_pg (svbool_t pg)
>> +{
>> +  return svdiv_x (pg, svdup_s64 (5), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_x_pg_0:
>> +**	mov	z[0-9]+\.b, #0
>> +**	ret
>> +*/
>> +svint64_t s64_x_pg_0 (svbool_t pg)
>> +{
>> +  return svdiv_x (pg, svdup_s64 (0), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_x_pg_by0:
>> +**	mov	z[0-9]+\.b, #0
>> +**	ret
>> +*/
>> +svint64_t s64_x_pg_by0 (svbool_t pg)
>> +{
>> +  return svdiv_x (pg, svdup_s64 (5), svdup_s64 (0));
>> +}
>> +
>> +/*
>> +** s64_z_pg:
>> +**	mov	z[0-9]+\.d, p[0-7]/z, #1
>> +**	ret
>> +*/
>> +svint64_t s64_z_pg (svbool_t pg)
>> +{
>> +  return svdiv_z (pg, svdup_s64 (5), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_z_pg_0:
>> +**	mov	z[0-9]+\.d, p[0-7]/z, #0
>> +**	ret
>> +*/
>> +svint64_t s64_z_pg_0 (svbool_t pg)
>> +{
>> +  return svdiv_z (pg, svdup_s64 (0), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_z_pg_by0:
>> +**	mov	(z[0-9]+\.d), #5
>> +**	mov	(z[0-9]+)\.b, #0
>> +**	sdivr	\2\.d, p[0-7]/m, \2\.d, \1
>> +**	ret
>> +*/
>> +svint64_t s64_z_pg_by0 (svbool_t pg)
>> +{
>> +  return svdiv_z (pg, svdup_s64 (5), svdup_s64 (0));
>> +}
>> +
>> +/*
>> +** s64_m_pg:
>> +**	mov	(z[0-9]+\.d), #3
>> +**	mov	(z[0-9]+\.d), #5
>> +**	sdiv	\2, p[0-7]/m, \2, \1
>> +**	ret
>> +*/
>> +svint64_t s64_m_pg (svbool_t pg)
>> +{
>> +  return svdiv_m (pg, svdup_s64 (5), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_x_ptrue:
>> +**	mov	z[0-9]+\.d, #1
>> +**	ret
>> +*/
>> +svint64_t s64_x_ptrue ()
>> +{
>> +  return svdiv_x (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_z_ptrue:
>> +**	mov	z[0-9]+\.d, #1
>> +**	ret
>> +*/
>> +svint64_t s64_z_ptrue ()
>> +{
>> +  return svdiv_z (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_m_ptrue:
>> +**	mov	z[0-9]+\.d, #1
>> +**	ret
>> +*/
>> +svint64_t s64_m_ptrue ()
>> +{
>> +  return svdiv_m (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_x_pg_n:
>> +**	mov	z[0-9]+\.d, #1
>> +**	ret
>> +*/
>> +svint64_t s64_x_pg_n (svbool_t pg)
>> +{
>> +  return svdiv_n_s64_x (pg, svdup_s64 (5), 3);
>> +}
>> +
>> +/*
>> +** s64_x_pg_n_s64_0:
>> +**	mov	z[0-9]+\.b, #0
>> +**	ret
>> +*/
>> +svint64_t s64_x_pg_n_s64_0 (svbool_t pg)
>> +{
>> +  return svdiv_n_s64_x (pg, svdup_s64 (0), 3);
>> +}
>> +
>> +/*
>> +** s64_x_pg_n_s64_by0:
>> +**	mov	z[0-9]+\.b, #0
>> +**	ret
>> +*/
>> +svint64_t s64_x_pg_n_s64_by0 (svbool_t pg)
>> +{
>> +  return svdiv_n_s64_x (pg, svdup_s64 (5), 0);
>> +}
>> +
>> +/*
>> +** s64_z_pg_n:
>> +**	mov	z[0-9]+\.d, p[0-7]/z, #1
>> +**	ret
>> +*/
>> +svint64_t s64_z_pg_n (svbool_t pg)
>> +{
>> +  return svdiv_n_s64_z (pg, svdup_s64 (5), 3);
>> +}
>> +
>> +/*
>> +** s64_z_pg_n_s64_0:
>> +**	mov	z[0-9]+\.d, p[0-7]/z, #0
>> +**	ret
>> +*/
>> +svint64_t s64_z_pg_n_s64_0 (svbool_t pg)
>> +{
>> +  return svdiv_n_s64_z (pg, svdup_s64 (0), 3);
>> +}
>> +
>> +/*
>> +** s64_z_pg_n_s64_by0:
>> +**	mov	(z[0-9]+\.d), #5
>> +**	mov	(z[0-9]+)\.b, #0
>> +**	sdivr	\2\.d, p[0-7]/m, \2\.d, \1
>> +**	ret
>> +*/
>> +svint64_t s64_z_pg_n_s64_by0 (svbool_t pg)
>> +{
>> +  return svdiv_n_s64_z (pg, svdup_s64 (5), 0);
>> +}
>> +
>> +/*
>> +** s64_m_pg_n:
>> +**	mov	(z[0-9]+\.d), #3
>> +**	mov	(z[0-9]+\.d), #5
>> +**	sdiv	\2, p[0-7]/m, \2, \1
>> +**	ret
>> +*/
>> +svint64_t s64_m_pg_n (svbool_t pg)
>> +{
>> +  return svdiv_n_s64_m (pg, svdup_s64 (5), 3);
>> +}
>> +
>> +/*
>> +** s64_x_ptrue_n:
>> +**	mov	z[0-9]+\.d, #1
>> +**	ret
>> +*/
>> +svint64_t s64_x_ptrue_n ()
>> +{
>> +  return svdiv_n_s64_x (svptrue_b64 (), svdup_s64 (5), 3);
>> +}
>> +
>> +/*
>> +** s64_z_ptrue_n:
>> +**	mov	z[0-9]+\.d, #1
>> +**	ret
>> +*/
>> +svint64_t s64_z_ptrue_n ()
>> +{
>> +  return svdiv_n_s64_z (svptrue_b64 (), svdup_s64 (5), 3);
>> +}
>> +
>> +/*
>> +** s64_m_ptrue_n:
>> +**	mov	z[0-9]+\.d, #1
>> +**	ret
>> +*/
>> +svint64_t s64_m_ptrue_n ()
>> +{
>> +  return svdiv_n_s64_m (svptrue_b64 (), svdup_s64 (5), 3);
>> +}
>> +
>> +/*
>> +** u64_x_pg:
>> +**	mov	z[0-9]+\.d, #1
>> +**	ret
>> +*/
>> +svuint64_t u64_x_pg (svbool_t pg)
>> +{
>> +  return svdiv_x (pg, svdup_u64 (5), svdup_u64 (3));
>> +}
>> +
>> +/*
>> +** u64_z_pg:
>> +**	mov	z[0-9]+\.d, p[0-7]/z, #1
>> +**	ret
>> +*/
>> +svuint64_t u64_z_pg (svbool_t pg)
>> +{
>> +  return svdiv_z (pg, svdup_u64 (5), svdup_u64 (3));
>> +}
>> +
>> +/*
>> +** u64_m_pg:
>> +**	mov	(z[0-9]+\.d), #3
>> +**	mov	(z[0-9]+\.d), #5
>> +**	udiv	\2, p[0-7]/m, \2, \1
>> +**	ret
>> +*/
>> +svuint64_t u64_m_pg (svbool_t pg)
>> +{
>> +  return svdiv_m (pg, svdup_u64 (5), svdup_u64 (3));
>> +}
>> +
>> +/*
>> +** u64_x_ptrue:
>> +**	mov	z[0-9]+\.d, #1
>> +**	ret
>> +*/
>> +svuint64_t u64_x_ptrue ()
>> +{
>> +  return svdiv_x (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
>> +}
>> +
>> +/*
>> +** u64_z_ptrue:
>> +**	mov	z[0-9]+\.d, #1
>> +**	ret
>> +*/
>> +svuint64_t u64_z_ptrue ()
>> +{
>> +  return svdiv_z (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
>> +}
>> +
>> +/*
>> +** u64_m_ptrue:
>> +**	mov	z[0-9]+\.d, #1
>> +**	ret
>> +*/
>> +svuint64_t u64_m_ptrue ()
>> +{
>> +  return svdiv_m (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
>> +}
>> +
>> +/*
>> +** u64_x_pg_n:
>> +**	mov	z[0-9]+\.d, #1
>> +**	ret
>> +*/
>> +svuint64_t u64_x_pg_n (svbool_t pg)
>> +{
>> +  return svdiv_n_u64_x (pg, svdup_u64 (5), 3);
>> +}
>> +
>> +/*
>> +** u64_z_pg_n:
>> +**	mov	z[0-9]+\.d, p[0-7]/z, #1
>> +**	ret
>> +*/
>> +svuint64_t u64_z_pg_n (svbool_t pg)
>> +{
>> +  return svdiv_n_u64_z (pg, svdup_u64 (5), 3);
>> +}
>> +
>> +/*
>> +** u64_m_pg_n:
>> +**	mov	(z[0-9]+\.d), #3
>> +**	mov	(z[0-9]+\.d), #5
>> +**	udiv	\2, p[0-7]/m, \2, \1
>> +**	ret
>> +*/
>> +svuint64_t u64_m_pg_n (svbool_t pg)
>> +{
>> +  return svdiv_n_u64_m (pg, svdup_u64 (5), 3);
>> +}
>> +
>> +/*
>> +** u64_x_ptrue_n:
>> +**	mov	z[0-9]+\.d, #1
>> +**	ret
>> +*/
>> +svuint64_t u64_x_ptrue_n ()
>> +{
>> +  return svdiv_n_u64_x (svptrue_b64 (), svdup_u64 (5), 3);
>> +}
>> +
>> +/*
>> +** u64_z_ptrue_n:
>> +**	mov	z[0-9]+\.d, #1
>> +**	ret
>> +*/
>> +svuint64_t u64_z_ptrue_n ()
>> +{
>> +  return svdiv_n_u64_z (svptrue_b64 (), svdup_u64 (5), 3);
>> +}
>> +
>> +/*
>> +** u64_m_ptrue_n:
>> +**	mov	z[0-9]+\.d, #1
>> +**	ret
>> +*/
>> +svuint64_t u64_m_ptrue_n ()
>> +{
>> +  return svdiv_n_u64_m (svptrue_b64 (), svdup_u64 (5), 3);
>> +}
>
> These are good tests, but maybe we could throw in a small number
> of svdupq tests as well, to test for non-uniform cases.  E.g.:
>
>   svdiv_s32_m (svptrue_b32 (), svdupq_s32 (3, 0, -5, 11), svdupq_s32 (4, 1, -6, 0));
>
> which hopefully should get optimised to zero.
>
> Similarly:
>
>   svdiv_s32_z (svptrue_b32 (), svdupq_s32 (6, -30, 100, -4), svdupq_s32 (-3, 15, -50, 2));
>
> should get optimised to -2.
>
> Looks good to me otherwise.
>
> Thanks,
> Richard

Thanks for suggesting these tests, I added them to the test file.
Best,
Jennifer
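For context on the "Done." above, one possible shape for the helper Richard suggests is sketched here. It is only an illustration pieced together from the interface given in the review and the code already shown in the patch (vector_const_binop, aarch64_const_binop, is_ptrue, PRED_x, and the gimple_folder members used as f.call, f.lhs, f.pred and f.type_suffix); it is not the committed implementation.

  /* Sketch only: fold a binary intrinsic call whose integer operands are
     constant, provided the predication is _x or the governing predicate is
     an all-true ptrue.  Returns null on failure so the caller can fall
     back to its existing folding logic.  */
  gimple *
  gimple_folder::fold_const_binary (tree_code code)
  {
    tree pg = gimple_call_arg (call, 0);
    tree op1 = gimple_call_arg (call, 1);
    tree op2 = gimple_call_arg (call, 2);

    if (type_suffix (0).integer_p
        && (pred == PRED_x || is_ptrue (pg, type_suffix (0).element_bytes)))
      if (tree res = vector_const_binop (code, op1, op2, aarch64_const_binop))
        return gimple_build_assign (lhs, res);

    return NULL;
  }

svdiv_impl::fold would then start with the two-line caller quoted in the review, "if (auto *res = f.fold_const_binary (TRUNC_DIV_EXPR)) return res;", before falling through to the power-of-2 shift handling.

The svdupq tests Richard asks for might look roughly like the following in the style of const_fold_div_1.c. The function names are hypothetical, the expected folded constants (all zeros, and all -2) come from his message, and the check-function-bodies patterns are omitted because the exact assembly is not shown in the thread.

  /* Sketch only: expected to fold to an all-zeros vector.  */
  svint32_t s32_m_ptrue_dupq ()
  {
    return svdiv_s32_m (svptrue_b32 (), svdupq_s32 (3, 0, -5, 11),
                        svdupq_s32 (4, 1, -6, 0));
  }

  /* Sketch only: expected to fold to a vector of -2.  */
  svint32_t s32_z_ptrue_dupq ()
  {
    return svdiv_s32_z (svptrue_b32 (), svdupq_s32 (6, -30, 100, -4),
                        svdupq_s32 (-3, 15, -50, 2));
  }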
Attachment: 0002-SVE-intrinsics-Fold-constant-operands-for-svdiv.patch (binary data)