On Thu, Jun 27, 2024 at 9:01 AM Li, Pan2 <pan2...@intel.com> wrote:
>
> It only requires the backend implement the standard name for vector mode I
> bet.
There are several standard names present for x86:

{ss,us}{add,sub}{v8qi,v16qi,v32qi,v64qi,v4hi,v8hi,v16hi,v32hi}

defined in sse.md:

(define_expand "<insn><mode>3<mask_name>"
  [(set (match_operand:VI12_AVX2_AVX512BW 0 "register_operand")
        (sat_plusminus:VI12_AVX2_AVX512BW
          (match_operand:VI12_AVX2_AVX512BW 1 "vector_operand")
          (match_operand:VI12_AVX2_AVX512BW 2 "vector_operand")))]
  "TARGET_SSE2 && <mask_mode512bit_condition> && <mask_avx512bw_condition>"
  "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")

but all of these handle only 8 and 16 bit elements.

> How about a simpler one like below.
>
> #define DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(OUT_T, IN_T)                   \
> void __attribute__((noinline))                                       \
> vec_sat_u_sub_trunc_##OUT_T##_fmt_1 (OUT_T *out, IN_T *op_1, IN_T y, \
>                                      unsigned limit)                 \
> {                                                                    \
>   unsigned i;                                                        \
>   for (i = 0; i < limit; i++)                                        \
>     {                                                                \
>       IN_T x = op_1[i];                                              \
>       out[i] = (OUT_T)(x >= y ? x - y : 0);                          \
>     }                                                                \
> }
>
> DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint32_t, uint64_t);

I tried with:

DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint8_t, uint16_t);

And the compiler was able to detect several .SAT_SUB patterns:

$ grep SAT_SUB pr51492-1.c.266t.optimized
  vect_patt_37.14_85 = .SAT_SUB (vect_x_13.12_81, vect_cst__84);
  vect_patt_37.14_86 = .SAT_SUB (vect_x_13.13_83, vect_cst__84);
  vect_patt_42.26_126 = .SAT_SUB (vect_x_62.24_122, vect_cst__125);
  vect_patt_42.26_127 = .SAT_SUB (vect_x_62.25_124, vect_cst__125);
  iftmp.0_24 = .SAT_SUB (x_3, y_14(D));

Uros.

> The riscv backend is able to detect the pattern similar as below. I can help
> to check x86 side after the running test suites.
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   if (limit_11(D) != 0)
>     goto <bb 3>; [89.00%]
>   else
>     goto <bb 5>; [11.00%]
> ;;    succ:       3
> ;;                5
>
> ;;   basic block 3, loop depth 0
> ;;    pred:       2
>   vect_cst__71 = [vec_duplicate_expr] y_14(D);
>   _78 = (unsigned long) limit_11(D);
> ;;    succ:       4
>
> ;;   basic block 4, loop depth 1
> ;;    pred:       4
> ;;                3
>   # vectp_op_1.7_68 = PHI <vectp_op_1.7_69(4), op_1_12(D)(3)>
>   # vectp_out.12_75 = PHI <vectp_out.12_76(4), out_16(D)(3)>
>   # ivtmp_79 = PHI <ivtmp_80(4), _78(3)>
>   _81 = .SELECT_VL (ivtmp_79, POLY_INT_CST [2, 2]);
>   ivtmp_67 = _81 * 8;
>   vect_x_13.9_70 = .MASK_LEN_LOAD (vectp_op_1.7_68, 64B, { -1, ... }, _81, 0);
>   vect_patt_48.10_72 = .SAT_SUB (vect_x_13.9_70, vect_cst__71); // .SAT_SUB pattern
>   vect_patt_49.11_73 = (vector([2,2]) unsigned int) vect_patt_48.10_72;
>   ivtmp_74 = _81 * 4;
>   .MASK_LEN_STORE (vectp_out.12_75, 32B, { -1, ... }, _81, 0,
>                    vect_patt_49.11_73);
>   vectp_op_1.7_69 = vectp_op_1.7_68 + ivtmp_67;
>   vectp_out.12_76 = vectp_out.12_75 + ivtmp_74;
>   ivtmp_80 = ivtmp_79 - _81;
>
> riscv64-unknown-elf-gcc (GCC) 15.0.0 20240627 (experimental)
> Copyright (C) 2024 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> Pan
>
> -----Original Message-----
> From: Uros Bizjak <ubiz...@gmail.com>
> Sent: Thursday, June 27, 2024 2:48 PM
> To: Li, Pan2 <pan2...@intel.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com;
> richard.guent...@gmail.com; jeffreya...@gmail.com; pins...@gmail.com
> Subject: Re: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
>
> On Mon, Jun 24, 2024 at 3:55 PM <pan2...@intel.com> wrote:
> >
> > From: Pan Li <pan2...@intel.com>
> >
> > The zip benchmark of coremark-pro has one SAT_SUB-like pattern but
> > truncated as below:
> >
> > void test (uint16_t *x, unsigned b, unsigned n)
> > {
> >   unsigned a = 0;
> >   register uint16_t *p = x;
> >
> >   do {
> >     a = *--p;
> >     *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
> >   } while (--n);
> > }

No, the current compiler does not recognize .SAT_SUB for x86 with the
above code, although many vector sat sub instructions involving 16-bit
elements are present.

Uros.