On Thu, Jun 27, 2024 at 9:01 AM Li, Pan2 <pan2...@intel.com> wrote:
>
> It only requires the backend to implement the standard name for the vector mode, I bet.

There are several standard names present for x86:
{ss,us}{add,sub}{v8qi,v16qi,v32qi,v64qi,v4hi,v8hi,v16hi,v32hi},
defined in sse.md:

(define_expand "<insn><mode>3<mask_name>"
  [(set (match_operand:VI12_AVX2_AVX512BW 0 "register_operand")
    (sat_plusminus:VI12_AVX2_AVX512BW
      (match_operand:VI12_AVX2_AVX512BW 1 "vector_operand")
      (match_operand:VI12_AVX2_AVX512BW 2 "vector_operand")))]
  "TARGET_SSE2 && <mask_mode512bit_condition> && <mask_avx512bw_condition>"
  "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")

but all of these handle only 8- and 16-bit elements.
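
For reference, the V16QI us_sub variant (ussubv16qi3) ends up as SSE2's
PSUBUSB. Below is a minimal sketch of the operation these standard names
implement, written with intrinsics purely for illustration (not part of
the patch):

/* Sketch only: unsigned saturating subtraction on 8-bit elements,
   i.e. what ussubv16qi3 provides.  Scalar reference plus the SSE2
   intrinsic (_mm_subs_epu8, PSUBUSB) it corresponds to.  */
#include <stdint.h>
#include <emmintrin.h>

static inline uint8_t
us_sub_u8 (uint8_t a, uint8_t b)
{
  return a >= b ? a - b : 0;   /* saturates at 0 instead of wrapping */
}

static inline void
us_sub_v16qi (uint8_t *out, const uint8_t *a, const uint8_t *b)
{
  __m128i va = _mm_loadu_si128 ((const __m128i *) a);
  __m128i vb = _mm_loadu_si128 ((const __m128i *) b);
  _mm_storeu_si128 ((__m128i *) out, _mm_subs_epu8 (va, vb));
}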

> How about a simpler one like below.
>
>   #define DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(OUT_T, IN_T)                   \
>   void __attribute__((noinline))                                       \
>   vec_sat_u_sub_trunc_##OUT_T##_fmt_1 (OUT_T *out, IN_T *op_1, IN_T y, \
>                                        unsigned limit)                 \
>   {                                                                    \
>     unsigned i;                                                        \
>     for (i = 0; i < limit; i++)                                        \
>       {                                                                \
>         IN_T x = op_1[i];                                              \
>         out[i] = (OUT_T)(x >= y ? x - y : 0);                          \
>       }                                                                \
>   }
>
> DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint32_t, uint64_t);

I tried with:

DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint8_t, uint16_t);

And the compiler was able to detect several .SAT_SUB patterns:

$ grep SAT_SUB pr51492-1.c.266t.optimized
 vect_patt_37.14_85 = .SAT_SUB (vect_x_13.12_81, vect_cst__84);
 vect_patt_37.14_86 = .SAT_SUB (vect_x_13.13_83, vect_cst__84);
 vect_patt_42.26_126 = .SAT_SUB (vect_x_62.24_122, vect_cst__125);
 vect_patt_42.26_127 = .SAT_SUB (vect_x_62.25_124, vect_cst__125);
 iftmp.0_24 = .SAT_SUB (x_3, y_14(D));
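
For completeness, the reproducer is just the macro quoted above
instantiated for uint8_t/uint16_t; a sketch of the file follows (the dump
name implies -fdump-tree-optimized; -O3 and -mavx2 are assumed here, not
taken from the thread):

/* pr51492-1.c (sketch): DEF_VEC_SAT_U_SUB_TRUNC_FMT_1 as quoted above,
   instantiated for uint8_t output from uint16_t input.
   Assumed compile command:
     gcc -O3 -mavx2 -fdump-tree-optimized -c pr51492-1.c  */
#include <stdint.h>

#define DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(OUT_T, IN_T)                   \
void __attribute__((noinline))                                       \
vec_sat_u_sub_trunc_##OUT_T##_fmt_1 (OUT_T *out, IN_T *op_1, IN_T y, \
                                     unsigned limit)                 \
{                                                                    \
  unsigned i;                                                        \
  for (i = 0; i < limit; i++)                                        \
    {                                                                \
      IN_T x = op_1[i];                                              \
      out[i] = (OUT_T)(x >= y ? x - y : 0);                          \
    }                                                                \
}

DEF_VEC_SAT_U_SUB_TRUNC_FMT_1 (uint8_t, uint16_t);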

Uros.

>
> The riscv backend is able to detect a pattern similar to the one below. I can
> help to check the x86 side after running the test suites.
>
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   if (limit_11(D) != 0)
>     goto <bb 3>; [89.00%]
>   else
>     goto <bb 5>; [11.00%]
> ;;    succ:       3
> ;;                5
> ;;   basic block 3, loop depth 0
> ;;    pred:       2
>   vect_cst__71 = [vec_duplicate_expr] y_14(D);
>   _78 = (unsigned long) limit_11(D);
> ;;    succ:       4
>
> ;;   basic block 4, loop depth 1
> ;;    pred:       4
> ;;                3
>   # vectp_op_1.7_68 = PHI <vectp_op_1.7_69(4), op_1_12(D)(3)>
>   # vectp_out.12_75 = PHI <vectp_out.12_76(4), out_16(D)(3)>
>   # ivtmp_79 = PHI <ivtmp_80(4), _78(3)>
>   _81 = .SELECT_VL (ivtmp_79, POLY_INT_CST [2, 2]);
>   ivtmp_67 = _81 * 8;
>   vect_x_13.9_70 = .MASK_LEN_LOAD (vectp_op_1.7_68, 64B, { -1, ... }, _81, 0);
>   vect_patt_48.10_72 = .SAT_SUB (vect_x_13.9_70, vect_cst__71); // .SAT_SUB pattern
>   vect_patt_49.11_73 = (vector([2,2]) unsigned int) vect_patt_48.10_72;
>   ivtmp_74 = _81 * 4;
>   .MASK_LEN_STORE (vectp_out.12_75, 32B, { -1, ... }, _81, 0, vect_patt_49.11_73);
>   vectp_op_1.7_69 = vectp_op_1.7_68 + ivtmp_67;
>   vectp_out.12_76 = vectp_out.12_75 + ivtmp_74;
>   ivtmp_80 = ivtmp_79 - _81;
>
> riscv64-unknown-elf-gcc (GCC) 15.0.0 20240627 (experimental)
> Copyright (C) 2024 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> Pan
>
> -----Original Message-----
> From: Uros Bizjak <ubiz...@gmail.com>
> Sent: Thursday, June 27, 2024 2:48 PM
> To: Li, Pan2 <pan2...@intel.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> richard.guent...@gmail.com; jeffreya...@gmail.com; pins...@gmail.com
> Subject: Re: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
>
> On Mon, Jun 24, 2024 at 3:55 PM <pan2...@intel.com> wrote:
> >
> > From: Pan Li <pan2...@intel.com>
> >
> > The zip benchmark of coremark-pro has one SAT_SUB-like pattern, but
> > truncated, as below:
> >
> > void test (uint16_t *x, unsigned b, unsigned n)
> > {
> >   unsigned a = 0;
> >   register uint16_t *p = x;
> >
> >   do {
> >     a = *--p;
> >     *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
> >   } while (--n);
> > }
> >

No, the current compiler does not recognize .SAT_SUB for x86 with the
above code, although many vector saturating-subtract instructions with
16-bit elements are present.

Uros.
