Jennifer Schmitz <jschm...@nvidia.com> writes:
> As follow-up to
> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
> this patch implements folding of svmul and svdiv by -1 to svneg for
> unsigned SVE vector types.  The key idea is to reuse the existing code that
> does this fold for signed types and feed it as callback to a helper function
> that adds the necessary type conversions.
I only meant that we should do this for multiplication (since the sign
isn't relevant for N-bit x N-bit -> N-bit multiplication).  It wouldn't
be right for unsigned division, since unsigned division by the maximum
value is instead equivalent to x == MAX ? 1 : 0.  (There's a quick
illustration of the difference just after the quoted ChangeLog below.)

Some comments on the multiplication bit below:

> For example, for the test case
> svuint64_t foo (svuint64_t x, svbool_t pg)
> {
>   return svmul_n_u64_x (pg, x, -1);
> }
>
> the following gimple sequence is emitted (-O2 -mcpu=grace):
> svuint64_t foo (svuint64_t x, svbool_t pg)
> {
>   svuint64_t D.12921;
>   svint64_t D.12920;
>   svuint64_t D.12919;
>
>   D.12920 = VIEW_CONVERT_EXPR<svint64_t>(x);
>   D.12921 = svneg_s64_x (pg, D.12920);
>   D.12919 = VIEW_CONVERT_EXPR<svuint64_t>(D.12921);
>   goto <D.12922>;
>   <D.12922>:
>   return D.12919;
> }
>
> In general, the new helper gimple_folder::convert_and_fold
> - takes a target type and a function pointer,
> - converts all non-boolean vector types to the target type,
> - replaces the converted arguments in the function call,
> - calls the callback function,
> - adds the necessary view converts to the gimple sequence,
> - and returns the new call.
>
> Because all arguments are converted to the same target types, the helper
> function is only suitable for folding calls whose arguments are all of
> the same type.  If necessary, this could be extended to convert the
> arguments to different types differentially.
>
> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-sve-builtins-base.cc
> 	(svmul_impl::fold): Wrap code for folding to svneg in lambda
> 	function and pass to gimple_folder::convert_and_fold to enable
> 	the transform for unsigned types.
> 	(svdiv_impl::fold): Likewise.
> 	* config/aarch64/aarch64-sve-builtins.cc
> 	(gimple_folder::convert_and_fold): New function that converts
> 	operands to target type before calling callback function, adding the
> 	necessary conversion statements.
> 	* config/aarch64/aarch64-sve-builtins.h
> 	(gimple_folder::convert_and_fold): Declare function.
> 	(signed_type_suffix_index): Return type_suffix_index of signed
> 	vector type for given width.
> 	(function_instance::signed_type): Return signed vector type for
> 	given width.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/sve/acle/asm/div_u32.c: Adjust expected
> 	outcome.
> 	* gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
> 	* gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
> 	* gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
> 	* gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
> 	* gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust
> 	expected outcome.
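To illustrate the multiplication/division asymmetry above with 8-bit
elements, here is a quick host-side sanity check (purely illustrative;
it is not meant for the patch or the testsuite):

  #include <stdint.h>
  #include <stdio.h>

  int
  main (void)
  {
    for (unsigned int i = 0; i < 256; ++i)
      {
        uint8_t x = i;

        /* Multiplying by the all-ones value is the same as negating,
           whatever the signedness: x * 255 == -x (mod 256).  */
        if ((uint8_t) (x * UINT8_MAX) != (uint8_t) -x)
          printf ("mul mismatch for %u\n", i);

        /* Unsigned division by the all-ones value is not a negation:
           x / 255 is 1 for x == 255 and 0 otherwise.  */
        if (x / UINT8_MAX != (x == UINT8_MAX ? 1u : 0u))
          printf ("div mismatch for %u\n", i);
      }
    return 0;
  }

The same argument carries over to wider elements, which is why reusing
the signed svneg fold is fine for svmul but would be wrong for svdiv.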
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc      | 99 ++++++++++++-------
>  gcc/config/aarch64/aarch64-sve-builtins.cc    | 40 ++++++++
>  gcc/config/aarch64/aarch64-sve-builtins.h     | 30 ++++++
>  .../gcc.target/aarch64/sve/acle/asm/div_u32.c |  9 ++
>  .../gcc.target/aarch64/sve/acle/asm/div_u64.c |  9 ++
>  .../gcc.target/aarch64/sve/acle/asm/mul_u16.c |  5 +-
>  .../gcc.target/aarch64/sve/acle/asm/mul_u32.c |  5 +-
>  .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 ++++-
>  .../gcc.target/aarch64/sve/acle/asm/mul_u8.c  |  7 +-
>  9 files changed, 180 insertions(+), 50 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 1c9f515a52c..6df14a8f4c4 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> [...]
> @@ -2082,33 +2091,49 @@ public:
>        return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
>
>      /* If one of the operands is all integer -1, fold to svneg.  */
> -    tree pg = gimple_call_arg (f.call, 0);
> -    tree negated_op = NULL;
> -    if (integer_minus_onep (op2))
> -      negated_op = op1;
> -    else if (integer_minus_onep (op1))
> -      negated_op = op2;
> -    if (!f.type_suffix (0).unsigned_p && negated_op)
> +    if (integer_minus_onep (op1) || integer_minus_onep (op2))

Formatting nit, sorry, but the indentation looks off here.

>        {
> -        function_instance instance ("svneg", functions::svneg,
> -                                    shapes::unary, MODE_none,
> -                                    f.type_suffix_ids, GROUP_none, f.pred);
> -        gcall *call = f.redirect_call (instance);
> -        unsigned offset_index = 0;
> -        if (f.pred == PRED_m)
> +        auto mul_by_m1 = [](gimple_folder &f) -> gcall *
>            {
> -            offset_index = 1;
> -            gimple_call_set_arg (call, 0, op1);
> -          }
> -        else
> -          gimple_set_num_ops (call, 5);
> -        gimple_call_set_arg (call, offset_index, pg);
> -        gimple_call_set_arg (call, offset_index + 1, negated_op);
> -        return call;

Rather than having convert_and_fold modify the call in-place (which
would create an incorrectly typed call if we leave the function itself
unchanged), could we make it pass back a "tree" that contains the
(possibly converted) lhs and a "vec<tree> &" that contains the
(possibly converted) arguments?  (There's a rough sketch of what I mean
further down, before my sign-off.)

> +            tree pg = gimple_call_arg (f.call, 0);
> +            tree op1 = gimple_call_arg (f.call, 1);
> +            tree op2 = gimple_call_arg (f.call, 2);
> +            tree negated_op = op1;
> +            bool negate_op1 = true;
> +            if (integer_minus_onep (op1))
> +              {
> +                negated_op = op2;
> +                negate_op1 = false;
> +              }
> +            type_suffix_pair signed_tsp =
> +              {signed_type_suffix_index (f.type_suffix (0).element_bits),
> +               f.type_suffix_ids[1]};
> +            function_instance instance ("svneg", functions::svneg,
> +                                        shapes::unary, MODE_none,
> +                                        signed_tsp, GROUP_none, f.pred);
> +            gcall *call = f.redirect_call (instance);
> +            unsigned offset = 0;
> +            if (f.pred == PRED_m)
> +              {
> +                offset = 1;
> +                tree ty = f.signed_type (f.type_suffix (0).element_bits);
> +                tree inactive = negate_op1 ? gimple_call_arg (f.call, 1)
> +                                           : build_minus_one_cst (ty);
> +                gimple_call_set_arg (call, 0, inactive);

Isn't the old approach of using op1 as the inactive argument still
correct?

Otherwise it looks good, thanks.
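To be a bit more concrete about the convert_and_fold comment above, the
interface I had in mind is roughly the following (untested, and the
parameter names are only placeholders):

  /* Convert the lhs and the non-boolean vector arguments to TYPE and
     hand the converted values to *FP, instead of changing the arguments
     of the original call in place.  */
  gimple *convert_and_fold (tree type,
                            gimple *(*fp) (gimple_folder &,
                                           tree converted_lhs,
                                           vec<tree> &converted_args));

The callback would build the new call from CONVERTED_LHS and
CONVERTED_ARGS, while convert_and_fold stays responsible for inserting
the VIEW_CONVERT_EXPR statements around it, so the original
unsigned-typed call never ends up with signed-typed arguments.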
Richard

> +              }
> +            else
> +              gimple_set_num_ops (call, 5);
> +            gimple_call_set_arg (call, offset, pg);
> +            gimple_call_set_arg (call, offset + 1, negated_op);
> +            return call;
> +          };
> +        tree ty = f.signed_type (f.type_suffix (0).element_bits);
> +        return f.convert_and_fold (ty, mul_by_m1);
>        }
>
>      /* If one of the operands is a uniform power of 2, fold to a left shift
>         by immediate.  */
> +    tree pg = gimple_call_arg (f.call, 0);
>      tree op1_cst = uniform_integer_cst_p (op1);
>      tree op2_cst = uniform_integer_cst_p (op2);
>      tree shift_op1, shift_op2 = NULL;
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index 44b7f6edae5..6523a782dac 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -3576,6 +3576,46 @@ gimple_folder::redirect_pred_x ()
>    return redirect_call (instance);
>  }
>
> +/* Convert all non-boolean vector-type operands to TYPE, fold the call
> +   using callback F, and convert the result back to the original type.
> +   Add the necessary conversion statements.  Return the updated call.  */
> +gimple *
> +gimple_folder::convert_and_fold (tree type, gcall *(*fp) (gimple_folder &))
> +{
> +  gcc_assert (VECTOR_TYPE_P (type)
> +              && TYPE_MODE (type) != VNx16BImode);
> +  tree old_type = TREE_TYPE (lhs);
> +  if (useless_type_conversion_p (type, old_type))
> +    return fp (*this);
> +
> +  unsigned int num_args = gimple_call_num_args (call);
> +  gimple_seq stmts = NULL;
> +  tree op, op_type, t1, t2, t3;
> +  gimple *g;
> +  gcall *new_call;
> +  for (unsigned int i = 0; i < num_args; ++i)
> +    {
> +      op = gimple_call_arg (call, i);
> +      op_type = TREE_TYPE (op);
> +      if (VECTOR_TYPE_P (op_type)
> +          && TREE_CODE (op) != VECTOR_CST
> +          && TYPE_MODE (op_type) != VNx16BImode)
> +        {
> +          t1 = gimple_build (&stmts, VIEW_CONVERT_EXPR, type, op);
> +          gimple_call_set_arg (call, i, t1);
> +        }
> +    }
> +
> +  new_call = fp (*this);
> +  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> +  t2 = create_tmp_var (gimple_call_return_type (new_call));
> +  gimple_call_set_lhs (new_call, t2);
> +  t3 = build1 (VIEW_CONVERT_EXPR, old_type, t2);
> +  g = gimple_build_assign (lhs, VIEW_CONVERT_EXPR, t3);
> +  gsi_insert_after (gsi, g, GSI_SAME_STMT);
> +  return new_call;
> +}
> +
>  /* Fold the call to constant VAL.  */
>  gimple *
>  gimple_folder::fold_to_cstu (poly_uint64 val)
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
> index d5cc6e0a40d..b48fed0df6b 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.h
> @@ -406,6 +406,7 @@ public:
>    tree scalar_type (unsigned int) const;
>    tree vector_type (unsigned int) const;
>    tree tuple_type (unsigned int) const;
> +  tree signed_type (unsigned int) const;
>    unsigned int elements_per_vq (unsigned int) const;
>    machine_mode vector_mode (unsigned int) const;
>    machine_mode tuple_mode (unsigned int) const;
> @@ -630,6 +631,7 @@ public:
>
>    gcall *redirect_call (const function_instance &);
>    gimple *redirect_pred_x ();
> +  gimple *convert_and_fold (tree, gcall *(*) (gimple_folder &));
>
>    gimple *fold_to_cstu (poly_uint64);
>    gimple *fold_to_pfalse ();
> @@ -860,6 +862,20 @@ find_type_suffix (type_class_index tclass, unsigned int element_bits)
>    gcc_unreachable ();
>  }
>
> +/* Return the type suffix of the signed type of width ELEMENT_BITS.  */
> +inline type_suffix_index
> +signed_type_suffix_index (unsigned int element_bits)
> +{
> +  switch (element_bits)
> +    {
> +    case 8: return TYPE_SUFFIX_s8;
> +    case 16: return TYPE_SUFFIX_s16;
> +    case 32: return TYPE_SUFFIX_s32;
> +    case 64: return TYPE_SUFFIX_s64;
> +    }
> +  gcc_unreachable ();
> +}
> +
>  /* Return the single field in tuple type TYPE.  */
>  inline tree
>  tuple_type_field (tree type)
> @@ -1025,6 +1041,20 @@ function_instance::tuple_type (unsigned int i) const
>    return acle_vector_types[num_vectors - 1][type_suffix (i).vector_type];
>  }
>
> +/* Return the signed vector type of width ELEMENT_BITS.  */
> +inline tree
> +function_instance::signed_type (unsigned int element_bits) const
> +{
> +  switch (element_bits)
> +    {
> +    case 8: return acle_vector_types[0][VECTOR_TYPE_svint8_t];
> +    case 16: return acle_vector_types[0][VECTOR_TYPE_svint16_t];
> +    case 32: return acle_vector_types[0][VECTOR_TYPE_svint32_t];
> +    case 64: return acle_vector_types[0][VECTOR_TYPE_svint64_t];
> +    }
> +  gcc_unreachable ();
> +}
> +
>  /* Return the number of elements of type suffix I that fit within a
>     128-bit block.  */
>  inline unsigned int
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c
> index 1e8d6104845..fcadc05d75b 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c
> @@ -72,6 +72,15 @@ TEST_UNIFORM_Z (div_1_u32_m_untied, svuint32_t,
>                  z0 = svdiv_n_u32_m (p0, z1, 1),
>                  z0 = svdiv_m (p0, z1, 1))
>
> +/*
> +** div_m1_u32_m_tied1:
> +**     neg     z0\.s, p0/m, z0\.s
> +**     ret
> +*/
> +TEST_UNIFORM_Z (div_m1_u32_m_tied1, svuint32_t,
> +                z0 = svdiv_n_u32_m (p0, z0, -1),
> +                z0 = svdiv_m (p0, z0, -1))
> +
>  /*
>  ** div_2_u32_m_tied1:
>  **     lsr     z0\.s, p0/m, z0\.s, #1
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c
> index ab049affb63..b95d5af13d8 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c
> @@ -72,6 +72,15 @@ TEST_UNIFORM_Z (div_1_u64_m_untied, svuint64_t,
>                  z0 = svdiv_n_u64_m (p0, z1, 1),
>                  z0 = svdiv_m (p0, z1, 1))
>
> +/*
> +** div_m1_u64_m_tied1:
> +**     neg     z0\.d, p0/m, z0\.d
> +**     ret
> +*/
> +TEST_UNIFORM_Z (div_m1_u64_m_tied1, svuint64_t,
> +                z0 = svdiv_n_u64_m (p0, z0, -1),
> +                z0 = svdiv_m (p0, z0, -1))
> +
>  /*
>  ** div_2_u64_m_tied1:
>  **     lsr     z0\.d, p0/m, z0\.d, #1
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
> index bdf6fcb98d6..e228dc5995d 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
> @@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u16_m_untied, svuint16_t,
>
>  /*
>  ** mul_m1_u16_m:
> -**     mov     (z[0-9]+)\.b, #-1
> -**     mul     z0\.h, p0/m, z0\.h, \1\.h
> +**     neg     z0\.h, p0/m, z0\.h
>  **     ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u16_m, svuint16_t,
> @@ -569,7 +568,7 @@ TEST_UNIFORM_Z (mul_255_u16_x, svuint16_t,
>
>  /*
>  ** mul_m1_u16_x:
> -**     mul     z0\.h, z0\.h, #-1
> +**     neg     z0\.h, p0/m, z0\.h
>  **     ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u16_x, svuint16_t,
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
> index a61e85fa12d..e8f52c9d785 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
> @@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u32_m_untied, svuint32_t,
>
>  /*
>  ** mul_m1_u32_m:
> -**     mov     (z[0-9]+)\.b, #-1
> -**     mul     z0\.s, p0/m, z0\.s, \1\.s
> +**     neg     z0\.s, p0/m, z0\.s
>  **     ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u32_m, svuint32_t,
> @@ -569,7 +568,7 @@ TEST_UNIFORM_Z (mul_255_u32_x, svuint32_t,
>
>  /*
>  ** mul_m1_u32_x:
> -**     mul     z0\.s, z0\.s, #-1
> +**     neg     z0\.s, p0/m, z0\.s
>  **     ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u32_x, svuint32_t,
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
> index eee1f8a0c99..2ccdc3642c5 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
> @@ -183,14 +183,25 @@ TEST_UNIFORM_Z (mul_3_u64_m_untied, svuint64_t,
>
>  /*
>  ** mul_m1_u64_m:
> -**     mov     (z[0-9]+)\.b, #-1
> -**     mul     z0\.d, p0/m, z0\.d, \1\.d
> +**     neg     z0\.d, p0/m, z0\.d
>  **     ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u64_m, svuint64_t,
>                  z0 = svmul_n_u64_m (p0, z0, -1),
>                  z0 = svmul_m (p0, z0, -1))
>
> +/*
> +** mul_m1r_u64_m:
> +**     mov     (z[0-9]+)\.b, #-1
> +**     mov     (z[0-9]+\.d), z0\.d
> +**     movprfx z0, \1
> +**     neg     z0\.d, p0/m, \2
> +**     ret
> +*/
> +TEST_UNIFORM_Z (mul_m1r_u64_m, svuint64_t,
> +                z0 = svmul_u64_m (p0, svdup_u64 (-1), z0),
> +                z0 = svmul_m (p0, svdup_u64 (-1), z0))
> +
>  /*
>  ** mul_u64_z_tied1:
>  **     movprfx z0\.d, p0/z, z0\.d
> @@ -597,13 +608,22 @@ TEST_UNIFORM_Z (mul_255_u64_x, svuint64_t,
>
>  /*
>  ** mul_m1_u64_x:
> -**     mul     z0\.d, z0\.d, #-1
> +**     neg     z0\.d, p0/m, z0\.d
>  **     ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u64_x, svuint64_t,
>                  z0 = svmul_n_u64_x (p0, z0, -1),
>                  z0 = svmul_x (p0, z0, -1))
>
> +/*
> +** mul_m1r_u64_x:
> +**     neg     z0\.d, p0/m, z0\.d
> +**     ret
> +*/
> +TEST_UNIFORM_Z (mul_m1r_u64_x, svuint64_t,
> +                z0 = svmul_u64_x (p0, svdup_u64 (-1), z0),
> +                z0 = svmul_x (p0, svdup_u64 (-1), z0))
> +
>  /*
>  ** mul_m127_u64_x:
>  **     mul     z0\.d, z0\.d, #-127
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
> index 06ee1b3e7c8..8e53a4821f0 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
> @@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u8_m_untied, svuint8_t,
>
>  /*
>  ** mul_m1_u8_m:
> -**     mov     (z[0-9]+)\.b, #-1
> -**     mul     z0\.b, p0/m, z0\.b, \1\.b
> +**     neg     z0\.b, p0/m, z0\.b
>  **     ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u8_m, svuint8_t,
> @@ -559,7 +558,7 @@ TEST_UNIFORM_Z (mul_128_u8_x, svuint8_t,
>
>  /*
>  ** mul_255_u8_x:
> -**     mul     z0\.b, z0\.b, #-1
> +**     neg     z0\.b, p0/m, z0\.b
>  **     ret
>  */
>  TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
> @@ -568,7 +567,7 @@ TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
>
>  /*
>  ** mul_m1_u8_x:
> -**     mul     z0\.b, z0\.b, #-1
> +**     neg     z0\.b, p0/m, z0\.b
>  **     ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u8_x, svuint8_t,