Hi Pan, > -----Original Message----- > From: Li, Pan2 <pan2...@intel.com> > Sent: Tuesday, August 20, 2024 1:58 AM > To: Tamar Christina <tamar.christ...@arm.com>; Jakub Jelinek > <ja...@redhat.com> > Cc: Richard Biener <richard.guent...@gmail.com>; gcc-patches@gcc.gnu.org; > juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; > rdapp....@gmail.com; Liu, Hongtao <hongtao....@intel.com> > Subject: RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for > vectorizable_call > > Thanks Jakub and Tamar for comments and suggestions. > > The match.pd list as below doesn't check the INT_CST type for .SAT_ADD. > > (match (unsigned_integer_sat_add @0 @1) > (cond^ (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 INTEGER_CST@1)) > integer_zerop) > integer_minus_onep (realpart @2)) > (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) > && types_match (type, @0)))) > > Thus the different types of .ADD_OVERFLOW could hit the pattern. The > vectorizable_call strictly > check the operands are totally the same, while the scalar doesn't have similar > check. That > is why I only found this issue from vector part.
Yeah and my question was more why are we getting away with it for the scalar. So I implemented the optabs so I can take a look. It looks like the scalar version doesn't match because split-path rewrites the IL when the argument is a constant. Passing -fno-split-paths gets it to generate the instruction where we see that the IFN will then also contain mixed types. #include <stdint.h> #include <stdio.h> #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM) \ T __attribute__((noinline)) \ vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \ { \ unsigned i; \ T ret; \ for (i = 0; i < limit; i++) \ { \ out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \ } \ } #define CST -9LL DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint8_t, CST) int main () { uint8_t a = 1; uint8_t r = 0; vec_sat_u_add_immCST_uint8_t_fmt_3 (&r, &a, 1); printf ("r=%u\n", r); } Generates: movi v31.8b, 0xfffffffffffffff7 uxtw x2, w2 mov x3, 0 .p2align 5,,15 .L3: ldr b30, [x1, x3] uqadd b30, b30, b31 str b30, [x0, x3] add x3, x3, 1 cmp x2, x3 bne .L3 which is incorrect, it's expected to saturate but instead is doing x + 0xF7. This is because of what Richi said before, there's nothing else in GIMPLE that tries to validate the operands, and expand will simply force the operand to the register of the size it requested and doesn't care about the outcome. For constants that are out of range, we're getting lucky in that existing math rules will remove the operation and replace It with -1. Because the operations know it would overflow in this case so at compile time the check goes away. That's why The problem doesn't show up with an out of range constant since there's no saturation check anymore. But this is pure luck. Secondly the reason I said that +static void +vect_recog_promote_cst_to_unsigned (tree *op, tree type) +{ + if (TREE_CODE (*op) != INTEGER_CST || !TYPE_UNSIGNED (type)) + return; + + unsigned precision = TYPE_PRECISION (type); + wide_int type_max = wi::mask (precision, false, precision); + wide_int op_cst_val = wi::to_wide (*op, precision); + + if (wi::leu_p (op_cst_val, type_max)) + *op = wide_int_to_tree (type, op_cst_val); +} + Is wrong is that patterns are one way. If the operand is invalid *op isn't rewritten so it stays the same as it is today. That is it'll get rejected in vectorizable_call. But because you've now matched here, another pattern can't match anymore, and more importantly, it prevents is from trying any alternative way to vectorize this (if there was one). That's why the pattern matcher shouldn't knowingly accept something we know can't get vectorized. You shouldn't build the pattern at all. And the reason I suggested doing this check in the match.pd is because of an inconsistency between the variable and immediate variant if it's not done there. You have (match (unsigned_integer_sat_add @0 @1) (cond^ (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 INTEGER_CST@1)) integer_zerop) integer_minus_onep (realpart @2)) (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) && types_match (type, @0)))) And (match (unsigned_integer_sat_add @0 @1) (cond^ (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1)) integer_zerop) integer_minus_onep (usadd_left_part_2 @0 @1))) Where (match (usadd_left_part_2 @0 @1) (realpart (IFN_ADD_OVERFLOW:c @0 @1)) (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) && types_match (type, @0, @1)))) so for the non-constant version we require all the operands to be unsigned and of the same type. For the constant version we don't.. This seems a bit inconsistent to me, along with for the non-overflow case: (match (unsigned_integer_sat_add @0 @1) (plus (min @0 INTEGER_CST@2) INTEGER_CST@1) (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) && types_match (type, @0, @1)) There is a check on the sign of the constant. So it seems to me that it's the non-constant case for the IFN_ADD_OVERFLOW case that's the only one lacking additional validation on the constant. Thanks, Tamar > > It looks like we need to add the type check for INT_CST in match.pd predicate > and > then add explicit cast > to IMM from the source code to match the pattern. For example as below, not > very > sure it is reasonable or not. > > #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM) \ > T __attribute__((noinline)) \ > vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \ > { \ > unsigned i; \ > T ret; \ > for (i = 0; i < limit; i++) \ > { \ > out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \ // > need add > (T)IMM, aka out[i] = __builtin_add_overflow (in[i], (T)IMM, &ret) ? -1 : ret; > to hit > the pattern. > } \ > } > > Pan > > -----Original Message----- > From: Tamar Christina <tamar.christ...@arm.com> > Sent: Tuesday, August 20, 2024 3:41 AM > To: Jakub Jelinek <ja...@redhat.com> > Cc: Li, Pan2 <pan2...@intel.com>; Richard Biener <richard.guent...@gmail.com>; > gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; > jeffreya...@gmail.com; rdapp....@gmail.com; Liu, Hongtao > <hongtao....@intel.com> > Subject: RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for > vectorizable_call > > > -----Original Message----- > > From: Jakub Jelinek <ja...@redhat.com> > > Sent: Monday, August 19, 2024 8:25 PM > > To: Tamar Christina <tamar.christ...@arm.com> > > Cc: Li, Pan2 <pan2...@intel.com>; Richard Biener > > <richard.guent...@gmail.com>; > > gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; > > jeffreya...@gmail.com; rdapp....@gmail.com; Liu, Hongtao > > <hongtao....@intel.com> > > Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for > > vectorizable_call > > > > On Mon, Aug 19, 2024 at 01:55:38PM +0000, Tamar Christina wrote: > > > So would this not be the simplest fix: > > > > > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc > > > index 87b3dc413b8..fcbc83a49f0 100644 > > > --- a/gcc/tree-vect-patterns.cc > > > +++ b/gcc/tree-vect-patterns.cc > > > @@ -4558,6 +4558,9 @@ vect_recog_sat_add_pattern (vec_info *vinfo, > > stmt_vec_info stmt_vinfo, > > > > > > if (gimple_unsigned_integer_sat_add (lhs, ops, NULL)) > > > > But then you call gimple_unsigned_integer_sat_add with mismatching types, > > not sure if that is ok. > > > > gimple_unsigned_integer_sat_add is a match.pd predicate. It matches the > expression > rooted in lhs and returns the results in ops. So not sure what you mean here. > > > > { > > > + if (TREE_CODE (ops[1]) == INTEGER_CST) > > > + ops[1] = fold_convert (TREE_TYPE (ops[0]), ops[1]); > > > + > > > > This would be only ok if the conversion doesn't change the value > > of the constant. > > .ADD_OVERFLOW etc. could have e.g. int and unsigned arguments, you don't > > want to change the latter to the former if the value has the most > > significant bit set. > > Similarly, .ADD_OVERFLOW could have e.g. unsigned and unsigned __int128 > > arguments, you don't want to truncate the constant. > > Yes, if the expression truncates or changes the sign of the expression this > wouldn't > work. But then you can also not match at all. So the match should be > rejected > then > since the values need to fit in the same type as the argument and be the same > sign. > > So the original vect_recog_promote_cst_to_unsigned is also wrong since it > doesn't > stop the match if it doesn't fit. > > Imho, the pattern here cannot check this and it should be part of the match > condition. > If the constant cannot fir into the same type as the operand or has a > different sign > the matching should fail. > > It's just that in match.pd you can't modify the arguments returned from a > predicate > but since the predicate is intended to be used to rewrite to the IFN I still > think the > above solution is right, and the range check should be done within the > predicate. > > Tamar. > > > So, you could e.g. do the fold_convert and then verify if > > wi::to_widest on the old and new tree are equal, or you could check for > > TREE_OVERFLOW if fold_convert honors that. > > As I said, for INTEGER_CST operands of .ADD/SUB/MUL_OVERFLOW, the infinite > > precision value (aka wi::to_widest) is all that matters. > > > > Jakub