On Mon, Jun 17, 2024 at 3:41 AM <pan2...@intel.com> wrote:
>
> From: Pan Li <pan2...@intel.com>
>
> When investigating the vectorization of .SAT_ADD, we noticed there
> are 2 additional forms, namely form 7 and form 8, for .SAT_ADD.
>
> Form 7:
>   #define DEF_SAT_U_ADD_FMT_7(T)          \
>   T __attribute__((noinline))             \
>   sat_u_add_##T##_fmt_7 (T x, T y)        \
>   {                                       \
>     return x > (T)(x + y) ? -1 : (x + y); \
>   }
>
> Form 8:
>   #define DEF_SAT_U_ADD_FMT_8(T)           \
>   T __attribute__((noinline))              \
>   sat_u_add_##T##_fmt_8 (T x, T y)         \
>   {                                        \
>     return x <= (T)(x + y) ? (x + y) : -1; \
>   }
>
> Thus, add the above 2 forms to the match gimple_unsigned_integer_sat_add
> so that the vectorizer can try to recognize patterns like form 7 and
> form 8.
>
> The following test suites passed for this patch:
> 1. The rv64gcv full regression test with newlib.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap test.
> 4. The x86 full regression test.

OK.

Note that fold-const.cc has canonicalization for the minus one to be put last:

      /* If the second operand is simpler than the third, swap them
         since that produces better jump optimization results.  */
      if (truth_value_p (TREE_CODE (arg0))
          && tree_swap_operands_p (op1, op2))
        {
          location_t loc0 = expr_location_or (arg0, loc);
          /* See if this can be inverted.  If it can't, possibly because
             it was a floating-point inequality comparison, don't do
             anything.  */
          tem = fold_invert_truthvalue (loc0, arg0);
          if (tem)
            return fold_build3_loc (loc, code, type, tem, op2, op1);

We might want to consider such a transform in match.pd; in this case it
would allow eliding one of the two patterns.
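
For reference, the two forms differ only in that the condition is inverted and
the arms are swapped, which is why a single canonical pattern could cover
both. A minimal C sketch of that equivalence follows. It is not part of the
patch: it instantiates the two macros from the description for uint8_t only,
drops the noinline attribute, and the names sat_u_add_ref and
count_sat_u_add_mismatches are hypothetical helpers introduced here for
illustration.

```c
#include <stdint.h>

/* Form 7 from the patch description: minus one in the "then" arm.  */
#define DEF_SAT_U_ADD_FMT_7(T)            \
static T sat_u_add_##T##_fmt_7 (T x, T y) \
{                                         \
  return x > (T)(x + y) ? -1 : (x + y);   \
}

/* Form 8 from the patch description: inverted condition, minus one last.  */
#define DEF_SAT_U_ADD_FMT_8(T)            \
static T sat_u_add_##T##_fmt_8 (T x, T y) \
{                                         \
  return x <= (T)(x + y) ? (x + y) : -1;  \
}

DEF_SAT_U_ADD_FMT_7 (uint8_t)
DEF_SAT_U_ADD_FMT_8 (uint8_t)

/* Hypothetical reference: add in a wider type, then clamp to UINT8_MAX.  */
static uint8_t
sat_u_add_ref (uint8_t x, uint8_t y)
{
  unsigned sum = (unsigned) x + y;
  return sum > UINT8_MAX ? UINT8_MAX : (uint8_t) sum;
}

/* Hypothetical helper: count the uint8_t input pairs for which either
   form disagrees with the reference.  */
int
count_sat_u_add_mismatches (void)
{
  int mismatches = 0;
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      {
        uint8_t expect = sat_u_add_ref ((uint8_t) x, (uint8_t) y);
        if (sat_u_add_uint8_t_fmt_7 ((uint8_t) x, (uint8_t) y) != expect
            || sat_u_add_uint8_t_fmt_8 ((uint8_t) x, (uint8_t) y) != expect)
          mismatches++;
      }
  return mismatches;
}
```

Calling count_sat_u_add_mismatches () from a small main should return 0 if the
two forms agree with the clamped reference on every uint8_t pair. This only
checks the source-level semantics; it says nothing about whether the
vectorizer actually recognizes either form.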

Richard.

> gcc/ChangeLog:
>
>         * match.pd: Add form 7 and 8 for the unsigned .SAT_ADD match.
>
> Signed-off-by: Pan Li <pan2...@intel.com>
> ---
>  gcc/match.pd | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 99968d316ed..aae6d30a5e4 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3144,6 +3144,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (cond^ (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1)) integer_zerop)
>    integer_minus_onep (usadd_left_part_2 @0 @1)))
>
> +/* Unsigned saturation add, case 7 (branch with le):
> +   SAT_ADD = x <= (X + Y) ? (X + Y) : -1.  */
> +(match (unsigned_integer_sat_add @0 @1)
> + (cond^ (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep))
> +
> +/* Unsigned saturation add, case 8 (branch with gt):
> +   SAT_ADD = x > (X + Y) ? -1 : (X + Y).  */
> +(match (unsigned_integer_sat_add @0 @1)
> + (cond^ (gt @0 (usadd_left_part_1@2 @0 @1)) integer_minus_onep @2))
> +
>  /* Unsigned saturation sub, case 1 (branch with gt):
>     SAT_U_SUB = X > Y ? X - Y : 0  */
>  (match (unsigned_integer_sat_sub @0 @1)
> --
> 2.34.1
>
