juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong <juzhe.zh...@rivai.ai>
>
> Hi, Richi and Richard.
>
> Based on the review comments from Richard:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623405.html
>
> I changed the operand order of len_mask_gather_load/len_mask_scatter_store to:
> {len,bias,mask}
>
> We now add the len and mask operands using add_len_and_mask_args,
> which is the same as for partial_load/partial_store.
>
> Now the code is more reasonable and easier to maintain.
>
> This patch adds LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets to
> handle flow control by mask and loop control by length on gather/scatter
> memory operations.  Consider the following case:
>
> #include <stdint.h>
> void
> f (uint8_t *restrict a,
>    uint8_t *restrict b, int n,
>    int base, int step,
>    int *restrict cond)
> {
>   for (int i = 0; i < n; ++i)
>     {
>       if (cond[i])
>         a[i * step + base] = b[i * step + base];
>     }
> }
>
> We hope RVV can vectorize such a case into the following IR:
>
> loop_len = SELECT_VL
> control_mask = comparison
> v = LEN_MASK_GATHER_LOAD (.., loop_len, bias, control_mask)
> LEN_MASK_SCATTER_STORE (... v, ..., loop_len, bias, control_mask)
>
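
A minimal scalar sketch of what the IR above is meant to compute, for readers following the thread. It is not GCC code: the function names, the fixed VF of 4, the byte-per-lane mask and the min() stand-in for SELECT_VL are all assumptions made for illustration. Only the {len,bias,mask} operand order and the "at most len + bias elements" rule come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { VF = 4 };  /* Hypothetical number of lanes per vector.  */

/* Hypothetical model of LEN_MASK_GATHER_LOAD on byte elements.  Lanes at
   index >= len + bias, or with a clear mask entry, are simply left alone
   here (the documentation calls masked-off lanes undefined).  */
static void
model_len_mask_gather_load (uint8_t *dst, const uint8_t *base,
                            const int *offset, int scale,
                            int len, int bias, const uint8_t *mask)
{
  ptrdiff_t active = (ptrdiff_t) len + bias;
  assert (active >= 0 && active <= VF);
  for (ptrdiff_t i = 0; i < active; ++i)
    if (mask[i])
      dst[i] = base[(ptrdiff_t) offset[i] * scale];
}

/* Hypothetical model of LEN_MASK_SCATTER_STORE on byte elements.  */
static void
model_len_mask_scatter_store (uint8_t *base, const int *offset, int scale,
                              const uint8_t *src, int len, int bias,
                              const uint8_t *mask)
{
  ptrdiff_t active = (ptrdiff_t) len + bias;
  assert (active >= 0 && active <= VF);
  for (ptrdiff_t i = 0; i < active; ++i)
    if (mask[i])
      base[(ptrdiff_t) offset[i] * scale] = src[i];
}

/* The loop from the patch description, strip-mined by hand.  */
void
f_model (uint8_t *restrict a, uint8_t *restrict b, int n,
         int base, int step, int *restrict cond)
{
  for (int i = 0; i < n; )
    {
      /* Stands in for SELECT_VL: take at most VF remaining iterations.  */
      int loop_len = n - i < VF ? n - i : VF;
      uint8_t mask[VF] = { 0 };
      int offset[VF] = { 0 };
      uint8_t v[VF] = { 0 };
      for (int j = 0; j < loop_len; ++j)
        {
          mask[j] = cond[i + j] != 0;     /* control_mask = comparison */
          offset[j] = (i + j) * step;
        }
      model_len_mask_gather_load (v, b + base, offset, 1, loop_len, 0, mask);
      model_len_mask_scatter_store (a + base, offset, 1, v, loop_len, 0, mask);
      i += loop_len;
    }
}
```

The length handles the loop tail (the final iteration may cover fewer than VF lanes) while the mask handles the `if (cond[i])` inside the body, which is why both controls appear on the same call.
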
> This patch doesn't make the vectorizer use these patterns; it just adds the
> patterns and updates the documentation.
>
> I will send a patch that uses these patterns in the vectorizer soon after
> this patch is approved.
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
>         * doc/md.texi: Add len_mask_gather_load/len_mask_scatter_store.
>         * internal-fn.cc (expand_scatter_store_optab_fn): Ditto.
>         (expand_gather_load_optab_fn): Ditto.
>         (internal_load_fn_p): Ditto.
>         (internal_store_fn_p): Ditto.
>         (internal_gather_scatter_fn_p): Ditto.
>         (internal_fn_len_index): Ditto.
>         (internal_fn_mask_index): Ditto.
>         (internal_fn_stored_value_index): Ditto.
>         * internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto.
>         (LEN_MASK_SCATTER_STORE): Ditto.
>         * optabs.def (OPTAB_CD): Ditto.

Nice!  OK, thanks.

Richard

> ---
>  gcc/doc/md.texi     | 17 +++++++++++++++++
>  gcc/internal-fn.cc  | 32 +++++++++++++++++---------------
>  gcc/internal-fn.def |  8 ++++++--
>  gcc/optabs.def      |  2 ++
>  4 files changed, 42 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 5e5482265cd..f14dd32b2dc 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5040,6 +5040,15 @@ operand 5.  Bit @var{i} of the mask is set if element @var{i}
>  of the result should be loaded from memory and clear if element @var{i}
>  of the result should be set to zero.
>  
> +@cindex @code{len_mask_gather_load@var{m}@var{n}} instruction pattern
> +@item @samp{len_mask_gather_load@var{m}@var{n}}
> +Like @samp{gather_load@var{m}@var{n}}, but takes an extra length operand (operand 5),
> +a bias operand (operand 6) as well as a mask operand (operand 7).  Similar to len_maskload,
> +the instruction loads at most (operand 5 + operand 6) elements from memory.
> +Bit @var{i} of the mask is set if element @var{i} of the result should
> +be loaded from memory and clear if element @var{i} of the result should be undefined.
> +Mask elements @var{i} with @var{i} > (operand 5 + operand 6) are ignored.
> +
>  @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
>  @item @samp{scatter_store@var{m}@var{n}}
>  Store a vector of mode @var{m} into several distinct memory locations.
> @@ -5069,6 +5078,14 @@ Like @samp{scatter_store@var{m}@var{n}}, but takes an extra mask operand as
>  operand 5.  Bit @var{i} of the mask is set if element @var{i}
>  of the result should be stored to memory.
>  
> +@cindex @code{len_mask_scatter_store@var{m}@var{n}} instruction pattern
> +@item @samp{len_mask_scatter_store@var{m}@var{n}}
> +Like @samp{scatter_store@var{m}@var{n}}, but takes an extra length operand (operand 5),
> +a bias operand (operand 6) as well as a mask operand (operand 7).  The instruction stores
> +at most (operand 5 + operand 6) elements of (operand 4) to memory.
> +Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
> +Mask elements @var{i} with @var{i} > (operand 5 + operand 6) are ignored.
> +
>  @cindex @code{vec_set@var{m}} instruction pattern
>  @item @samp{vec_set@var{m}}
>  Set given field in the vector value.  Operand 0 is the vector to modify,
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index c1fcb38b17b..303df102d81 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -3507,7 +3507,6 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
>  {
>    internal_fn ifn = gimple_call_internal_fn (stmt);
>    int rhs_index = internal_fn_stored_value_index (ifn);
> -  int mask_index = internal_fn_mask_index (ifn);
>    tree base = gimple_call_arg (stmt, 0);
>    tree offset = gimple_call_arg (stmt, 1);
>    tree scale = gimple_call_arg (stmt, 2);
> @@ -3518,19 +3517,14 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
>    HOST_WIDE_INT scale_int = tree_to_shwi (scale);
>    rtx rhs_rtx = expand_normal (rhs);
>  
> -  class expand_operand ops[6];
> +  class expand_operand ops[8];
>    int i = 0;
>    create_address_operand (&ops[i++], base_rtx);
>    create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
>    create_integer_operand (&ops[i++], TYPE_UNSIGNED (TREE_TYPE (offset)));
>    create_integer_operand (&ops[i++], scale_int);
>    create_input_operand (&ops[i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs)));
> -  if (mask_index >= 0)
> -    {
> -      tree mask = gimple_call_arg (stmt, mask_index);
> -      rtx mask_rtx = expand_normal (mask);
> -      create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
> -    }
> +  i = add_len_and_mask_args (ops, i, stmt);
>  
>    insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE (rhs)),
>                                          TYPE_MODE (TREE_TYPE (offset)));
> @@ -3553,18 +3547,13 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
>    HOST_WIDE_INT scale_int = tree_to_shwi (scale);
>  
>    int i = 0;
> -  class expand_operand ops[6];
> +  class expand_operand ops[8];
>    create_output_operand (&ops[i++], lhs_rtx, TYPE_MODE (TREE_TYPE (lhs)));
>    create_address_operand (&ops[i++], base_rtx);
>    create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
>    create_integer_operand (&ops[i++], TYPE_UNSIGNED (TREE_TYPE (offset)));
>    create_integer_operand (&ops[i++], scale_int);
> -  if (optab == mask_gather_load_optab)
> -    {
> -      tree mask = gimple_call_arg (stmt, 4);
> -      rtx mask_rtx = expand_normal (mask);
> -      create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
> -    }
> +  i = add_len_and_mask_args (ops, i, stmt);
>    insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE (lhs)),
>                                          TYPE_MODE (TREE_TYPE (offset)));
>    expand_insn (icode, i, ops);
> @@ -4415,6 +4404,7 @@ internal_load_fn_p (internal_fn fn)
>      case IFN_MASK_LOAD_LANES:
>      case IFN_GATHER_LOAD:
>      case IFN_MASK_GATHER_LOAD:
> +    case IFN_LEN_MASK_GATHER_LOAD:
>      case IFN_LEN_LOAD:
>      case IFN_LEN_MASK_LOAD:
>        return true;
> @@ -4436,6 +4426,7 @@ internal_store_fn_p (internal_fn fn)
>      case IFN_MASK_STORE_LANES:
>      case IFN_SCATTER_STORE:
>      case IFN_MASK_SCATTER_STORE:
> +    case IFN_LEN_MASK_SCATTER_STORE:
>      case IFN_LEN_STORE:
>      case IFN_LEN_MASK_STORE:
>        return true;
> @@ -4454,8 +4445,10 @@ internal_gather_scatter_fn_p (internal_fn fn)
>      {
>      case IFN_GATHER_LOAD:
>      case IFN_MASK_GATHER_LOAD:
> +    case IFN_LEN_MASK_GATHER_LOAD:
>      case IFN_SCATTER_STORE:
>      case IFN_MASK_SCATTER_STORE:
> +    case IFN_LEN_MASK_SCATTER_STORE:
>        return true;
>  
>      default:
> @@ -4477,6 +4470,10 @@ internal_fn_len_index (internal_fn fn)
>      case IFN_LEN_MASK_STORE:
>        return 2;
>  
> +    case IFN_LEN_MASK_GATHER_LOAD:
> +    case IFN_LEN_MASK_SCATTER_STORE:
> +      return 4;
> +
>      default:
>        return -1;
>      }
> @@ -4502,6 +4499,10 @@ internal_fn_mask_index (internal_fn fn)
>      case IFN_LEN_MASK_STORE:
>        return 4;
>  
> +    case IFN_LEN_MASK_GATHER_LOAD:
> +    case IFN_LEN_MASK_SCATTER_STORE:
> +      return 6;
> +
>      default:
>        return (conditional_internal_fn_code (fn) != ERROR_MARK
>             || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
> @@ -4520,6 +4521,7 @@ internal_fn_stored_value_index (internal_fn fn)
>      case IFN_MASK_STORE_LANES:
>      case IFN_SCATTER_STORE:
>      case IFN_MASK_SCATTER_STORE:
> +    case IFN_LEN_MASK_SCATTER_STORE:
>        return 3;
>  
>      case IFN_LEN_STORE:
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index d9fcca8430f..9b73e540d55 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -48,14 +48,14 @@ along with GCC; see the file COPYING3.  If not see
>     - mask_load: currently just maskload
>     - load_lanes: currently just vec_load_lanes
>     - mask_load_lanes: currently just vec_mask_load_lanes
> -   - gather_load: used for {mask_,}gather_load
> +   - gather_load: used for {mask_,len_mask,}gather_load
>     - len_load: currently just len_load
>     - len_maskload: currently just len_maskload
>  
>     - mask_store: currently just maskstore
>     - store_lanes: currently just vec_store_lanes
>     - mask_store_lanes: currently just vec_mask_store_lanes
> -   - scatter_store: used for {mask_,}scatter_store
> +   - scatter_store: used for {mask_,len_mask,}scatter_store
>     - len_store: currently just len_store
>     - len_maskstore: currently just len_maskstore
>  
> @@ -157,6 +157,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_LOAD_LANES, ECF_PURE,
>  DEF_INTERNAL_OPTAB_FN (GATHER_LOAD, ECF_PURE, gather_load, gather_load)
>  DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
>                      mask_gather_load, gather_load)
> +DEF_INTERNAL_OPTAB_FN (LEN_MASK_GATHER_LOAD, ECF_PURE,
> +                    len_mask_gather_load, gather_load)
>  
>  DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
>  DEF_INTERNAL_OPTAB_FN (LEN_MASK_LOAD, ECF_PURE, len_maskload, len_maskload)
> @@ -164,6 +166,8 @@ DEF_INTERNAL_OPTAB_FN (LEN_MASK_LOAD, ECF_PURE, len_maskload, len_maskload)
>  DEF_INTERNAL_OPTAB_FN (SCATTER_STORE, 0, scatter_store, scatter_store)
>  DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
>                      mask_scatter_store, scatter_store)
> +DEF_INTERNAL_OPTAB_FN (LEN_MASK_SCATTER_STORE, 0,
> +                    len_mask_scatter_store, scatter_store)
>  
>  DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
>  DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index a901b68c538..73c9a0c760f 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -95,8 +95,10 @@ OPTAB_CD(len_maskload_optab, "len_maskload$a$b")
>  OPTAB_CD(len_maskstore_optab, "len_maskstore$a$b")
>  OPTAB_CD(gather_load_optab, "gather_load$a$b")
>  OPTAB_CD(mask_gather_load_optab, "mask_gather_load$a$b")
> +OPTAB_CD(len_mask_gather_load_optab, "len_mask_gather_load$a$b")
>  OPTAB_CD(scatter_store_optab, "scatter_store$a$b")
>  OPTAB_CD(mask_scatter_store_optab, "mask_scatter_store$a$b")
> +OPTAB_CD(len_mask_scatter_store_optab, "len_mask_scatter_store$a$b")
>  OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
>  OPTAB_CD(vec_init_optab, "vec_init$a$b")
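
As a reading aid for the index functions touched above, the call-argument layout they imply for the new internal functions can be written down as a small table. The enum and its member names are hypothetical and do not exist in GCC. The values 3, 4 and 6 are the ones returned by internal_fn_stored_value_index, internal_fn_len_index and internal_fn_mask_index in the patch, positions 0 to 2 match the gimple_call_arg calls in the expanders, and position 5 for the bias is inferred from the {len,bias,mask} order.

```c
#include <assert.h>

/* Hypothetical names for the call-argument positions of
   IFN_LEN_MASK_SCATTER_STORE; IFN_LEN_MASK_GATHER_LOAD uses the same
   positions for len and mask.  */
enum len_mask_scatter_store_arg
{
  ARG_BASE = 0,          /* gimple_call_arg (stmt, 0) */
  ARG_OFFSET = 1,        /* gimple_call_arg (stmt, 1) */
  ARG_SCALE = 2,         /* gimple_call_arg (stmt, 2) */
  ARG_STORED_VALUE = 3,  /* internal_fn_stored_value_index */
  ARG_LEN = 4,           /* internal_fn_len_index */
  ARG_BIAS = 5,          /* inferred: follows len in {len,bias,mask} */
  ARG_MASK = 6           /* internal_fn_mask_index */
};

/* The three trailing operands are contiguous, in {len,bias,mask} order.  */
static_assert (ARG_BIAS == ARG_LEN + 1, "bias follows len");
static_assert (ARG_MASK == ARG_BIAS + 1, "mask follows bias");
```
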
