Thanks Richard for the comments.

> Enabling it via match.pd looks possible but also possibly sub-optimal
> for costing side on the vectorizer - supporting it directly in the
> vectorizer can be done later though.
Sure, will have a try in v2.

Pan

-----Original Message-----
From: Richard Biener <richard.guent...@gmail.com>
Sent: Thursday, October 17, 2024 3:13 PM
To: Li, Pan2 <pan2...@intel.com>
Cc: Richard Sandiford <richard.sandif...@arm.com>; gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com
Subject: Re: [PATCH v1] Internal-fn: Add new IFN mask_len_strided_load/store

On Thu, Oct 17, 2024 at 8:38 AM Li, Pan2 <pan2...@intel.com> wrote:
>
> It has been quite a while since the last discussion.
> I recalled these materials recently and had a try in the risc-v backend.
>
>   1 │ void foo (int * __restrict a, int * __restrict b, int stride, int n)
>   2 │ {
>   3 │   for (int i = 0; i < n; i++)
>   4 │     a[i*stride] = b[i*stride] + 100;
>   5 │ }
>
> We will have an expand similar to the below for VEC_SERIES_EXPR +
> MASK_LEN_GATHER_LOAD.  There will be 8 insns after expand, which is not
> applicable for try_combine (at most 4 insns), if my understanding is
> correct.
>
> Thus, is there any other approach instead of adding a new IFN?  If we
> need to add a new IFN, can we leverage match.pd to try to match the
> MASK_LEN_GATHER_LOAD (base, VEC_SERIES_EXPR, ...) pattern and then emit
> the new IFN, like the sat alu patterns do?

Adding an optab (and direct internal fn) is fine I guess - it should be
modeled after the gather optab, specifying that the vec_series is implicit
with the then scalar stride.

Enabling it via match.pd looks possible but also possibly sub-optimal
for costing side on the vectorizer - supporting it directly in the
vectorizer can be done later though.

Richard.

> Thanks a lot.
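[Editor's note: the "special case of the gather optab" relationship discussed above can be sketched with a scalar reference model. The helper names below are illustrative only, not GCC internals; the point is that a gather whose offset vector is a VEC_SERIES <0, stride> touches exactly the addresses base + i * stride that a single strided load would.]

```c
#include <assert.h>
#include <stddef.h>

/* Reference model of a gather load: out[i] = base[offset[i]],
   where the offset vector is supplied explicitly.  */
static void
ref_gather_load (int *out, const int *base, const ptrdiff_t *offset, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = base[offset[i]];
}

/* Reference model of a strided load: out[i] = base[i * stride].
   With offset[i] = i * stride (i.e. a VEC_SERIES <0, stride>) the two
   models touch the same addresses, which is why the strided IFN can be
   treated as a special case of the gather IFN with an implicit series.  */
static void
ref_strided_load (int *out, const int *base, ptrdiff_t stride, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = base[i * stride];
}
```

The difference is purely one of representation: the strided form carries a single scalar stride instead of materializing the series in a vector register, which is exactly the insn-count saving motivating the new optab.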
> 316 │ ;;   _58 = VEC_SERIES_EXPR <0, _57>;
> 317 │
> 318 │ (insn 17 16 18 (set (reg:DI 156 [ _56 ])
> 319 │         (ashiftrt:DI (reg:DI 141 [ _54 ])
> 320 │             (const_int 2 [0x2]))) -1
> 321 │      (expr_list:REG_EQUAL (div:DI (reg:DI 141 [ _54 ])
> 322 │             (const_int 4 [0x4]))
> 323 │         (nil)))
> 324 │
> 325 │ (insn 18 17 19 (set (reg:DI 158)
> 326 │         (unspec:DI [
> 327 │                 (const_int 32 [0x20])
> 328 │             ] UNSPEC_VLMAX)) -1
> 329 │      (nil))
> 330 │
> 331 │ (insn 19 18 20 (set (reg:RVVM1SI 157)
> 332 │         (if_then_else:RVVM1SI (unspec:RVVMF32BI [
> 333 │                     (const_vector:RVVMF32BI repeat [
> 334 │                             (const_int 1 [0x1])
> 335 │                         ])
> 336 │                     (reg:DI 158)
> 337 │                     (const_int 2 [0x2]) repeated x2
> 338 │                     (const_int 1 [0x1])
> 339 │                     (reg:SI 66 vl)
> 340 │                     (reg:SI 67 vtype)
> 341 │                 ] UNSPEC_VPREDICATE)
> 342 │             (vec_series:RVVM1SI (const_int 0 [0])
> 343 │                 (const_int 1 [0x1]))
> 344 │             (unspec:RVVM1SI [
> 345 │                     (reg:DI 0 zero)
> 346 │                 ] UNSPEC_VUNDEF))) -1
> 347 │      (nil))
> 348 │
> 349 │ (insn 20 19 21 (set (reg:DI 160)
> 350 │         (unspec:DI [
> 351 │                 (const_int 32 [0x20])
> 352 │             ] UNSPEC_VLMAX)) -1
> 353 │      (nil))
> 354 │
> 355 │ (insn 21 20 22 (set (reg:RVVM1SI 159)
> 356 │         (if_then_else:RVVM1SI (unspec:RVVMF32BI [
> 357 │                     (const_vector:RVVMF32BI repeat [
> 358 │                             (const_int 1 [0x1])
> 359 │                         ])
> 360 │                     (reg:DI 160)
> 361 │                     (const_int 2 [0x2]) repeated x2
> 362 │                     (const_int 1 [0x1])
> 363 │                     (reg:SI 66 vl)
> 364 │                     (reg:SI 67 vtype)
> 365 │                 ] UNSPEC_VPREDICATE)
> 366 │             (mult:RVVM1SI (vec_duplicate:RVVM1SI (subreg:SI (reg:DI 156 [ _56 ]) 0))
> 367 │                 (reg:RVVM1SI 157))
> 368 │             (unspec:RVVM1SI [
> 369 │                     (reg:DI 0 zero)
> 370 │                 ] UNSPEC_VUNDEF))) -1
> 371 │      (nil))
> ...
> 403 │ ;;   vect__5.16_61 = .MASK_LEN_GATHER_LOAD (vectp_b.14_59, _58, 4, { 0, ... }, { -1, ... }, _73, 0);
> 404 │
> 405 │ (insn 27 26 28 (set (reg:RVVM2DI 161)
> 406 │         (sign_extend:RVVM2DI (reg:RVVM1SI 145 [ _58 ]))) "strided_ld-st.c":4:22 -1
> 407 │      (nil))
> 408 │
> 409 │ (insn 28 27 29 (set (reg:RVVM2DI 162)
> 410 │         (ashift:RVVM2DI (reg:RVVM2DI 161)
> 411 │             (const_int 2 [0x2]))) "strided_ld-st.c":4:22 -1
> 412 │      (nil))
> 413 │
> 414 │ (insn 29 28 0 (set (reg:RVVM1SI 146 [ vect__5.16 ])
> 415 │         (if_then_else:RVVM1SI (unspec:RVVMF32BI [
> 416 │                     (const_vector:RVVMF32BI repeat [
> 417 │                             (const_int 1 [0x1])
> 418 │                         ])
> 419 │                     (reg:DI 149 [ _73 ])
> 420 │                     (const_int 2 [0x2]) repeated x2
> 421 │                     (const_int 0 [0])
> 422 │                     (reg:SI 66 vl)
> 423 │                     (reg:SI 67 vtype)
> 424 │                 ] UNSPEC_VPREDICATE)
> 425 │             (unspec:RVVM1SI [
> 426 │                     (reg/v/f:DI 151 [ b ])
> 427 │                     (mem:BLK (scratch) [0 A8])
> 428 │                     (reg:RVVM2DI 162)
> 429 │                 ] UNSPEC_UNORDERED)
> 430 │             (unspec:RVVM1SI [
> 431 │                     (reg:DI 0 zero)
> 432 │                 ] UNSPEC_VUNDEF))) "strided_ld-st.c":4:22 -1
> 433 │      (nil))
>
> Pan
>
>
> -----Original Message-----
> From: Li, Pan2 <pan2...@intel.com>
> Sent: Wednesday, June 5, 2024 3:50 PM
> To: Richard Biener <richard.guent...@gmail.com>; Richard Sandiford <richard.sandif...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com
> Subject: RE: [PATCH v1] Internal-fn: Add new IFN mask_len_strided_load/store
>
> It is not easy to get the original context/history; I could only catch
> some shadow of the full picture from the patch below.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634683.html
>
> Using gather/scatter with a VEC_SERIES, for example as below, looks
> reasonable to me; I will have a try for this.
>
> operand_0 = mask_gather_loadmn (ptr, offset, 1/0(sign/unsign), multiply, mask)
> offset = (vec_series:m base step) => base + i * step
> op_0[i] = memory[ptr + offset[i] * multiply] && mask[i]
>
> operand_0 = mask_len_strided_load (ptr, stride, mask, len, bias).
> op_0[i] = memory[ptr + stride * i] && mask[i] && i < (len + bias)
>
> Pan
>
> -----Original Message-----
> From: Li, Pan2
> Sent: Wednesday, June 5, 2024 9:18 AM
> To: Richard Biener <richard.guent...@gmail.com>; Richard Sandiford <richard.sandif...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com
> Subject: RE: [PATCH v1] Internal-fn: Add new IFN mask_len_strided_load/store
>
> > Sorry if we have discussed this last year already - is there anything wrong
> > with using a gather/scatter with a VEC_SERIES gimple/rtl def for the offset?
>
> Thanks for the comments; it has been quite a while since the last
> discussion.  Let me recall a little about it and keep you posted.
>
> Pan
>
> -----Original Message-----
> From: Richard Biener <richard.guent...@gmail.com>
> Sent: Tuesday, June 4, 2024 9:22 PM
> To: Li, Pan2 <pan2...@intel.com>; Richard Sandiford <richard.sandif...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com
> Subject: Re: [PATCH v1] Internal-fn: Add new IFN mask_len_strided_load/store
>
> On Tue, May 28, 2024 at 5:15 AM <pan2...@intel.com> wrote:
> >
> > From: Pan Li <pan2...@intel.com>
> >
> > This patch would like to add new internal functions for the below 2 IFNs.
> >   * mask_len_strided_load
> >   * mask_len_strided_store
> >
> > The GIMPLE v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias) will
> > be expanded into v = mask_len_strided_load (ptr, stride, mask, len, bias).
> >
> > The GIMPLE MASK_LEN_STRIDED_STORE (ptr, stride, v, mask, len, bias) will
> > be expanded into mask_len_strided_store (ptr, stride, v, mask, len, bias).
> >
> > The below test suites are passed for this patch:
> > * The x86 bootstrap test.
> > * The x86 full regression test.
> > * The riscv full regression test.
>
> Sorry if we have discussed this last year already - is there anything wrong
> with using a gather/scatter with a VEC_SERIES gimple/rtl def for the offset?
>
> Richard.
>
> > gcc/ChangeLog:
> >
> >         * doc/md.texi: Add description for mask_len_strided_load/store.
> >         * internal-fn.cc (strided_load_direct): New internal_fn define
> >         for strided_load_direct.
> >         (strided_store_direct): Ditto but for store.
> >         (expand_strided_load_optab_fn): New expand func for
> >         mask_len_strided_load.
> >         (expand_strided_store_optab_fn): Ditto but for store.
> >         (direct_strided_load_optab_supported_p): New define for load
> >         direct optab supported.
> >         (direct_strided_store_optab_supported_p): Ditto but for store.
> >         (internal_fn_len_index): Add len index for both load and store.
> >         (internal_fn_mask_index): Ditto but for mask index.
> >         (internal_fn_stored_value_index): Add stored index.
> >         * internal-fn.def (MASK_LEN_STRIDED_LOAD): New direct fn define
> >         for strided_load.
> >         (MASK_LEN_STRIDED_STORE): Ditto but for stride_store.
> >         * optabs.def (OPTAB_D): New optab define for load and store.
> >
> > Signed-off-by: Pan Li <pan2...@intel.com>
> > Co-Authored-By: Juzhe-Zhong <juzhe.zh...@rivai.ai>
> > ---
> >  gcc/doc/md.texi     | 27 ++++++++++++++++
> >  gcc/internal-fn.cc  | 75 +++++++++++++++++++++++++++++++++++++++++++++
> >  gcc/internal-fn.def |  6 ++++
> >  gcc/optabs.def      |  2 ++
> >  4 files changed, 110 insertions(+)
> >
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 5730bda80dc..3d242675c63 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5138,6 +5138,20 @@ Bit @var{i} of the mask is set if element @var{i} of the result should
> >  be loaded from memory and clear if element @var{i} of the result should be undefined.
> >  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> >
> > +@cindex @code{mask_len_strided_load@var{m}} instruction pattern
> > +@item @samp{mask_len_strided_load@var{m}}
> > +Load several separate memory locations into a destination vector of mode @var{m}.
> > +Operand 0 is a destination vector of mode @var{m}.
> > +Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
> > +Operand 3 is the mask operand, operand 4 is the length operand and operand 5 is the bias operand.
> > +The instruction can be seen as a special case of @code{mask_len_gather_load@var{m}@var{n}}
> > +with an offset vector that is a @code{vec_series} with operand 1 as base and operand 2 as step.
> > +For each element index @var{i} the load address is operand 1 + @var{i} * operand 2.
> > +Similar to mask_len_load, the instruction loads at most (operand 4 + operand 5) elements from memory.
> > +Element @var{i} of the mask (operand 3) is set if element @var{i} of the result should
> > +be loaded from memory and clear if element @var{i} of the result should be zero.
> > +Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
> > +
> >  @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
> >  @item @samp{scatter_store@var{m}@var{n}}
> >  Store a vector of mode @var{m} into several distinct memory locations.
> > @@ -5175,6 +5189,19 @@ at most (operand 6 + operand 7) elements of (operand 4) to memory.
> >  Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
> >  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> >
> > +@cindex @code{mask_len_strided_store@var{m}} instruction pattern
> > +@item @samp{mask_len_strided_store@var{m}}
> > +Store a vector of mode @var{m} into several distinct memory locations.
> > +Operand 0 is a scalar base address and operand 1 is a scalar stride of Pmode.
> > +Operand 2 is the vector of values that should be stored, which is of mode @var{m}.
> > +Operand 3 is the mask operand, operand 4 is the length operand and operand 5 is the bias operand.
> > +The instruction can be seen as a special case of @code{mask_len_scatter_store@var{m}@var{n}}
> > +with an offset vector that is a @code{vec_series} with operand 0 as base and operand 1 as step.
> > +For each element index @var{i} the store address is operand 0 + @var{i} * operand 1.
> > +Similar to mask_len_store, the instruction stores at most (operand 4 + operand 5) elements of (operand 2) to memory.
> > +Element @var{i} of the mask (operand 3) is set if element @var{i} of (operand 2) should be stored.
> > +Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
> > +
> >  @cindex @code{vec_set@var{m}} instruction pattern
> >  @item @samp{vec_set@var{m}}
> >  Set given field in the vector value.  Operand 0 is the vector to modify,
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index 9c09026793f..f6e5329cd84 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -159,6 +159,7 @@ init_internal_fns ()
> >  #define load_lanes_direct { -1, -1, false }
> >  #define mask_load_lanes_direct { -1, -1, false }
> >  #define gather_load_direct { 3, 1, false }
> > +#define strided_load_direct { -1, -1, false }
> >  #define len_load_direct { -1, -1, false }
> >  #define mask_len_load_direct { -1, 4, false }
> >  #define mask_store_direct { 3, 2, false }
> > @@ -168,6 +169,7 @@ init_internal_fns ()
> >  #define vec_cond_mask_len_direct { 1, 1, false }
> >  #define vec_cond_direct { 2, 0, false }
> >  #define scatter_store_direct { 3, 1, false }
> > +#define strided_store_direct { 1, 1, false }
> >  #define len_store_direct { 3, 3, false }
> >  #define mask_len_store_direct { 4, 5, false }
> >  #define vec_set_direct { 3, 3, false }
> > @@ -3668,6 +3670,68 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
> >      emit_move_insn (lhs_rtx, ops[0].value);
> >  }
> >
> > +/* Expand MASK_LEN_STRIDED_LOAD call CALL by optab OPTAB.  */
> > +
> > +static void
> > +expand_strided_load_optab_fn (ATTRIBUTE_UNUSED internal_fn, gcall *stmt,
> > +                             direct_optab optab)
> > +{
> > +  tree lhs = gimple_call_lhs (stmt);
> > +  tree base = gimple_call_arg (stmt, 0);
> > +  tree stride = gimple_call_arg (stmt, 1);
> > +
> > +  rtx lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> > +  rtx base_rtx = expand_normal (base);
> > +  rtx stride_rtx = expand_normal (stride);
> > +
> > +  unsigned i = 0;
> > +  class expand_operand ops[6];
> > +  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
> > +
> > +  create_output_operand (&ops[i++], lhs_rtx, mode);
> > +  create_address_operand (&ops[i++], base_rtx);
> > +  create_address_operand (&ops[i++], stride_rtx);
> > +
> > +  insn_code icode = direct_optab_handler (optab, mode);
> > +
> > +  i = add_mask_and_len_args (ops, i, stmt);
> > +  expand_insn (icode, i, ops);
> > +
> > +  if (!rtx_equal_p (lhs_rtx, ops[0].value))
> > +    emit_move_insn (lhs_rtx, ops[0].value);
> > +}
> > +
> > +/* Expand MASK_LEN_STRIDED_STORE call CALL by optab OPTAB.  */
> > +
> > +static void
> > +expand_strided_store_optab_fn (ATTRIBUTE_UNUSED internal_fn, gcall *stmt,
> > +                              direct_optab optab)
> > +{
> > +  internal_fn fn = gimple_call_internal_fn (stmt);
> > +  int rhs_index = internal_fn_stored_value_index (fn);
> > +
> > +  tree base = gimple_call_arg (stmt, 0);
> > +  tree stride = gimple_call_arg (stmt, 1);
> > +  tree rhs = gimple_call_arg (stmt, rhs_index);
> > +
> > +  rtx base_rtx = expand_normal (base);
> > +  rtx stride_rtx = expand_normal (stride);
> > +  rtx rhs_rtx = expand_normal (rhs);
> > +
> > +  unsigned i = 0;
> > +  class expand_operand ops[6];
> > +  machine_mode mode = TYPE_MODE (TREE_TYPE (rhs));
> > +
> > +  create_address_operand (&ops[i++], base_rtx);
> > +  create_address_operand (&ops[i++], stride_rtx);
> > +  create_input_operand (&ops[i++], rhs_rtx, mode);
> > +
> > +  insn_code icode = direct_optab_handler (optab, mode);
> > +  i = add_mask_and_len_args (ops, i, stmt);
> > +
> > +  expand_insn (icode, i, ops);
> > +}
> > +
> >  /* Helper for expand_DIVMOD.  Return true if the sequence starting with
> >     INSN contains any call insns or insns with {,U}{DIV,MOD} rtxes.  */
> >
> > @@ -4058,6 +4122,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
> >  #define direct_load_lanes_optab_supported_p multi_vector_optab_supported_p
> >  #define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p
> >  #define direct_gather_load_optab_supported_p convert_optab_supported_p
> > +#define direct_strided_load_optab_supported_p direct_optab_supported_p
> >  #define direct_len_load_optab_supported_p direct_optab_supported_p
> >  #define direct_mask_len_load_optab_supported_p convert_optab_supported_p
> >  #define direct_mask_store_optab_supported_p convert_optab_supported_p
> > @@ -4066,6 +4131,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
> >  #define direct_vec_cond_mask_optab_supported_p convert_optab_supported_p
> >  #define direct_vec_cond_optab_supported_p convert_optab_supported_p
> >  #define direct_scatter_store_optab_supported_p convert_optab_supported_p
> > +#define direct_strided_store_optab_supported_p direct_optab_supported_p
> >  #define direct_len_store_optab_supported_p direct_optab_supported_p
> >  #define direct_mask_len_store_optab_supported_p convert_optab_supported_p
> >  #define direct_while_optab_supported_p convert_optab_supported_p
> > @@ -4723,6 +4789,8 @@ internal_fn_len_index (internal_fn fn)
> >      case IFN_COND_LEN_XOR:
> >      case IFN_COND_LEN_SHL:
> >      case IFN_COND_LEN_SHR:
> > +    case IFN_MASK_LEN_STRIDED_LOAD:
> > +    case IFN_MASK_LEN_STRIDED_STORE:
> >        return 4;
> >
> >      case IFN_COND_LEN_NEG:
> > @@ -4817,6 +4885,10 @@ internal_fn_mask_index (internal_fn fn)
> >      case IFN_MASK_LEN_STORE:
> >        return 2;
> >
> > +    case IFN_MASK_LEN_STRIDED_LOAD:
> > +    case IFN_MASK_LEN_STRIDED_STORE:
> > +      return 3;
> > +
> >      case IFN_MASK_GATHER_LOAD:
> >      case IFN_MASK_SCATTER_STORE:
> >      case IFN_MASK_LEN_GATHER_LOAD:
> > @@ -4840,6 +4912,9 @@ internal_fn_stored_value_index (internal_fn fn)
> >  {
> >    switch (fn)
> >      {
> > +    case IFN_MASK_LEN_STRIDED_STORE:
> > +      return 2;
> > +
> >      case IFN_MASK_STORE:
> >      case IFN_MASK_STORE_LANES:
> >      case IFN_SCATTER_STORE:
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 25badbb86e5..b30a7a5b009 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
> >     - mask_load_lanes: currently just vec_mask_load_lanes
> >     - mask_len_load_lanes: currently just vec_mask_len_load_lanes
> >     - gather_load: used for {mask_,mask_len_,}gather_load
> > +   - strided_load: currently just mask_len_strided_load
> >     - len_load: currently just len_load
> >     - mask_len_load: currently just mask_len_load
> >
> > @@ -64,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
> >     - mask_store_lanes: currently just vec_mask_store_lanes
> >     - mask_len_store_lanes: currently just vec_mask_len_store_lanes
> >     - scatter_store: used for {mask_,mask_len_,}scatter_store
> > +   - strided_store: currently just mask_len_strided_store
> >     - len_store: currently just len_store
> >     - mask_len_store: currently just mask_len_store
> >
> > @@ -212,6 +214,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
> >                        mask_gather_load, gather_load)
> >  DEF_INTERNAL_OPTAB_FN (MASK_LEN_GATHER_LOAD, ECF_PURE,
> >                        mask_len_gather_load, gather_load)
> > +DEF_INTERNAL_OPTAB_FN (MASK_LEN_STRIDED_LOAD, ECF_PURE,
> > +                      mask_len_strided_load, strided_load)
> >
> >  DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
> >  DEF_INTERNAL_OPTAB_FN (MASK_LEN_LOAD, ECF_PURE, mask_len_load, mask_len_load)
> > @@ -221,6 +225,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
> >                        mask_scatter_store, scatter_store)
> >  DEF_INTERNAL_OPTAB_FN (MASK_LEN_SCATTER_STORE, 0,
> >                        mask_len_scatter_store, scatter_store)
> > +DEF_INTERNAL_OPTAB_FN (MASK_LEN_STRIDED_STORE, 0,
> > +                      mask_len_strided_store, strided_store)
> >
> >  DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
> >  DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes,
> >                        store_lanes)
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index 3f2cb46aff8..630b1de8f97 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -539,4 +539,6 @@ OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
> >  OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a")
> >  OPTAB_D (len_load_optab, "len_load_$a")
> >  OPTAB_D (len_store_optab, "len_store_$a")
> > +OPTAB_D (mask_len_strided_load_optab, "mask_len_strided_load_$a")
> > +OPTAB_D (mask_len_strided_store_optab, "mask_len_strided_store_$a")
> >  OPTAB_D (select_vl_optab, "select_vl$a")
> > --
> > 2.34.1
> >
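[Editor's note: the element-wise semantics that the patch documents for the two IFNs can be captured in a small scalar reference model. The function and parameter names below are illustrative only, not GCC internals: an element is active when its mask bit is set and its index is below len + bias; inactive load elements are zeroed, and inactive store elements leave memory untouched.]

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference model of
   v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias):
   out[i] = ptr[i * stride] when mask[i] is set and i < len + bias,
   otherwise 0 (inactive result elements are zeroed, as for
   mask_len_load).  VF is the number of vector elements.  */
static void
ref_mask_len_strided_load (int *out, const int *ptr, ptrdiff_t stride,
                           const unsigned char *mask, int len, int bias,
                           int vf)
{
  for (int i = 0; i < vf; i++)
    out[i] = (i < len + bias && mask[i]) ? ptr[i * stride] : 0;
}

/* Scalar reference model of
   MASK_LEN_STRIDED_STORE (ptr, stride, v, mask, len, bias):
   ptr[i * stride] = v[i] when mask[i] is set and i < len + bias;
   all other memory locations are left unchanged.  */
static void
ref_mask_len_strided_store (int *ptr, ptrdiff_t stride, const int *v,
                            const unsigned char *mask, int len, int bias,
                            int vf)
{
  for (int i = 0; i < vf; i++)
    if (i < len + bias && mask[i])
      ptr[i * stride] = v[i];
}
```

A load followed by a store through these models with an all-ones mask reproduces one vector iteration of the `a[i*stride] = b[i*stride] + 100` loop from the original example.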