juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong <juzhe.zh...@rivai.ai>
>
> Hi, Richi and Richard.
>
> Based on the review comments from Richard:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623405.html
>
> I changed the len_mask_gather_load/len_mask_scatter_store operand order to:
> {len,bias,mask}
>
> The len and mask operands are now added by add_len_and_mask_args,
> the same way as for partial_load/partial_store.
>
> The code is now more reasonable and easier to maintain.
>
> This patch adds LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets to
> handle flow control by mask and loop control by length on gather/scatter
> memory operations. Consider the following case:
>
> #include <stdint.h>
> void
> f (uint8_t *restrict a,
>    uint8_t *restrict b, int n,
>    int base, int step,
>    int *restrict cond)
> {
>   for (int i = 0; i < n; ++i)
>     {
>       if (cond[i])
>         a[i * step + base] = b[i * step + base];
>     }
> }
>
> We would like RVV to vectorize such a case into the following IR:
>
>   loop_len = SELECT_VL
>   control_mask = comparison
>   v = LEN_MASK_GATHER_LOAD (.., loop_len, bias, control_mask)
>   LEN_MASK_SCATTER_STORE (... v, ..., loop_len, bias, control_mask)
>
> This patch does not yet make the vectorizer use these patterns; it only
> adds the patterns and updates the documentation.
>
> I will send a follow-up patch that makes the vectorizer use them soon
> after this patch is approved.
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
>         * doc/md.texi: Add len_mask_gather_load/len_mask_scatter_store.
>         * internal-fn.cc (expand_scatter_store_optab_fn): Ditto.
>         (expand_gather_load_optab_fn): Ditto.
>         (internal_load_fn_p): Ditto.
>         (internal_store_fn_p): Ditto.
>         (internal_gather_scatter_fn_p): Ditto.
>         (internal_fn_len_index): Ditto.
>         (internal_fn_mask_index): Ditto.
>         (internal_fn_stored_value_index): Ditto.
>         * internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto.
>         (LEN_MASK_SCATTER_STORE): Ditto.
>         * optabs.def (OPTAB_CD): Ditto.
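A scalar C model of the IR sketched in the patch description may help show how the two controls divide the work. `loop_len` bounds each strip of the loop (loop control), and `control_mask` carries the `if (cond[i])` test (flow control). This is only an illustration under stated assumptions. It is not what GCC's vectorizer or any RVV back end emits. The fixed lane count `VF`, the helper `select_vl` and the function `f_vectorized` are hypothetical names, and the length bias is assumed to be 0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed lane count for this model; on RVV the number of
   lanes is a run-time property and SELECT_VL picks the length.  */
#define VF 4

/* The scalar loop from the patch description.  */
static void
f (uint8_t *restrict a, uint8_t *restrict b, int n,
   int base, int step, int *restrict cond)
{
  for (int i = 0; i < n; ++i)
    {
      if (cond[i])
        a[i * step + base] = b[i * step + base];
    }
}

/* Hypothetical model of SELECT_VL: how many elements this strip handles.  */
static int
select_vl (int remaining)
{
  return remaining < VF ? remaining : VF;
}

/* Strip-mined model of the hoped-for IR.  */
static void
f_vectorized (uint8_t *restrict a, uint8_t *restrict b, int n,
              int base, int step, int *restrict cond)
{
  const int bias = 0;  /* Assumed length bias of 0.  */
  for (int i = 0; i < n; )
    {
      int loop_len = select_vl (n - i);   /* loop_len = SELECT_VL  */
      int active = loop_len + bias;       /* Lanes the length allows.  */
      assert (active >= 0 && active <= VF);

      /* control_mask = comparison; cond[] is never read for lanes
         at or beyond ACTIVE thanks to the short-circuit.  */
      int control_mask[VF];
      for (int lane = 0; lane < VF; ++lane)
        control_mask[lane] = lane < active && cond[i + lane] != 0;

      /* v = LEN_MASK_GATHER_LOAD (.., loop_len, bias, control_mask)  */
      uint8_t v[VF];
      for (int lane = 0; lane < active; ++lane)
        if (control_mask[lane])
          v[lane] = b[(i + lane) * step + base];

      /* LEN_MASK_SCATTER_STORE (... v, ..., loop_len, bias, control_mask);
         V[lane] is only read where it was loaded above.  */
      for (int lane = 0; lane < active; ++lane)
        if (control_mask[lane])
          a[(i + lane) * step + base] = v[lane];

      i += loop_len;
    }
}
```

Calling `f` and `f_vectorized` on the same inputs should leave `a` with identical contents. When `n` is not a multiple of `VF`, the last strip simply gets a shorter `loop_len`, so no separate scalar epilogue is needed.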
Nice! OK, thanks.

Richard

> ---
>  gcc/doc/md.texi     | 17 +++++++++++++++++
>  gcc/internal-fn.cc  | 32 +++++++++++++++++---------------
>  gcc/internal-fn.def |  8 ++++++--
>  gcc/optabs.def      |  2 ++
>  4 files changed, 42 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 5e5482265cd..f14dd32b2dc 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5040,6 +5040,15 @@ operand 5. Bit @var{i} of the mask is set if element @var{i}
>  of the result should be loaded from memory and clear if element @var{i}
>  of the result should be set to zero.
>
> +@cindex @code{len_mask_gather_load@var{m}@var{n}} instruction pattern
> +@item @samp{len_mask_gather_load@var{m}@var{n}}
> +Like @samp{gather_load@var{m}@var{n}}, but takes an extra length operand (operand 5),
> +a bias operand (operand 6) as well as a mask operand (operand 7). Similar to len_maskload,
> +the instruction loads at most (operand 5 + operand 6) elements from memory.
> +Bit @var{i} of the mask is set if element @var{i} of the result should
> +be loaded from memory and clear if element @var{i} of the result should be undefined.
> +Mask elements @var{i} with @var{i} > (operand 5 + operand 6) are ignored.
> +
>  @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
>  @item @samp{scatter_store@var{m}@var{n}}
>  Store a vector of mode @var{m} into several distinct memory locations.
> @@ -5069,6 +5078,14 @@ Like @samp{scatter_store@var{m}@var{n}}, but takes an extra mask operand as
>  operand 5. Bit @var{i} of the mask is set if element @var{i}
>  of the result should be stored to memory.
>
> +@cindex @code{len_mask_scatter_store@var{m}@var{n}} instruction pattern
> +@item @samp{len_mask_scatter_store@var{m}@var{n}}
> +Like @samp{scatter_store@var{m}@var{n}}, but takes an extra length operand (operand 5),
> +a bias operand (operand 6) as well as a mask operand (operand 7). The instruction stores
> +at most (operand 5 + operand 6) elements of (operand 4) to memory.
> +Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
> +Mask elements @var{i} with @var{i} > (operand 5 + operand 6) are ignored.
> +
>  @cindex @code{vec_set@var{m}} instruction pattern
>  @item @samp{vec_set@var{m}}
>  Set given field in the vector value. Operand 0 is the vector to modify,
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index c1fcb38b17b..303df102d81 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -3507,7 +3507,6 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
>  {
>    internal_fn ifn = gimple_call_internal_fn (stmt);
>    int rhs_index = internal_fn_stored_value_index (ifn);
> -  int mask_index = internal_fn_mask_index (ifn);
>    tree base = gimple_call_arg (stmt, 0);
>    tree offset = gimple_call_arg (stmt, 1);
>    tree scale = gimple_call_arg (stmt, 2);
> @@ -3518,19 +3517,14 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
>    HOST_WIDE_INT scale_int = tree_to_shwi (scale);
>    rtx rhs_rtx = expand_normal (rhs);
>
> -  class expand_operand ops[6];
> +  class expand_operand ops[8];
>    int i = 0;
>    create_address_operand (&ops[i++], base_rtx);
>    create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
>    create_integer_operand (&ops[i++], TYPE_UNSIGNED (TREE_TYPE (offset)));
>    create_integer_operand (&ops[i++], scale_int);
>    create_input_operand (&ops[i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs)));
> -  if (mask_index >= 0)
> -    {
> -      tree mask = gimple_call_arg (stmt, mask_index);
> -      rtx mask_rtx = expand_normal (mask);
> -      create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
> -    }
> +  i = add_len_and_mask_args (ops, i, stmt);
>
>    insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE (rhs)),
>                                             TYPE_MODE (TREE_TYPE (offset)));
> @@ -3553,18 +3547,13 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
>    HOST_WIDE_INT scale_int = tree_to_shwi (scale);
>
>    int i = 0;
> -  class expand_operand ops[6];
> +  class expand_operand ops[8];
>    create_output_operand (&ops[i++], lhs_rtx, TYPE_MODE (TREE_TYPE (lhs)));
>    create_address_operand (&ops[i++], base_rtx);
>    create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
>    create_integer_operand (&ops[i++], TYPE_UNSIGNED (TREE_TYPE (offset)));
>    create_integer_operand (&ops[i++], scale_int);
> -  if (optab == mask_gather_load_optab)
> -    {
> -      tree mask = gimple_call_arg (stmt, 4);
> -      rtx mask_rtx = expand_normal (mask);
> -      create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
> -    }
> +  i = add_len_and_mask_args (ops, i, stmt);
>    insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE (lhs)),
>                                             TYPE_MODE (TREE_TYPE (offset)));
>    expand_insn (icode, i, ops);
> @@ -4415,6 +4404,7 @@ internal_load_fn_p (internal_fn fn)
>      case IFN_MASK_LOAD_LANES:
>      case IFN_GATHER_LOAD:
>      case IFN_MASK_GATHER_LOAD:
> +    case IFN_LEN_MASK_GATHER_LOAD:
>      case IFN_LEN_LOAD:
>      case IFN_LEN_MASK_LOAD:
>        return true;
> @@ -4436,6 +4426,7 @@ internal_store_fn_p (internal_fn fn)
>      case IFN_MASK_STORE_LANES:
>      case IFN_SCATTER_STORE:
>      case IFN_MASK_SCATTER_STORE:
> +    case IFN_LEN_MASK_SCATTER_STORE:
>      case IFN_LEN_STORE:
>      case IFN_LEN_MASK_STORE:
>        return true;
> @@ -4454,8 +4445,10 @@ internal_gather_scatter_fn_p (internal_fn fn)
>      {
>      case IFN_GATHER_LOAD:
>      case IFN_MASK_GATHER_LOAD:
> +    case IFN_LEN_MASK_GATHER_LOAD:
>      case IFN_SCATTER_STORE:
>      case IFN_MASK_SCATTER_STORE:
> +    case IFN_LEN_MASK_SCATTER_STORE:
>        return true;
>
>      default:
> @@ -4477,6 +4470,10 @@ internal_fn_len_index (internal_fn fn)
>      case IFN_LEN_MASK_STORE:
>        return 2;
>
> +    case IFN_LEN_MASK_GATHER_LOAD:
> +    case IFN_LEN_MASK_SCATTER_STORE:
> +      return 4;
> +
>      default:
>        return -1;
>      }
> @@ -4502,6 +4499,10 @@ internal_fn_mask_index (internal_fn fn)
>      case IFN_LEN_MASK_STORE:
>        return 4;
>
> +    case IFN_LEN_MASK_GATHER_LOAD:
> +    case IFN_LEN_MASK_SCATTER_STORE:
> +      return 6;
> +
>      default:
>        return (conditional_internal_fn_code (fn) != ERROR_MARK
>                || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
> @@ -4520,6 +4521,7 @@ internal_fn_stored_value_index (internal_fn fn)
>      case IFN_MASK_STORE_LANES:
>      case IFN_SCATTER_STORE:
>      case IFN_MASK_SCATTER_STORE:
> +    case IFN_LEN_MASK_SCATTER_STORE:
>        return 3;
>
>      case IFN_LEN_STORE:
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index d9fcca8430f..9b73e540d55 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -48,14 +48,14 @@ along with GCC; see the file COPYING3. If not see
>     - mask_load: currently just maskload
>     - load_lanes: currently just vec_load_lanes
>     - mask_load_lanes: currently just vec_mask_load_lanes
> -   - gather_load: used for {mask_,}gather_load
> +   - gather_load: used for {mask_,len_mask,}gather_load
>     - len_load: currently just len_load
>     - len_maskload: currently just len_maskload
>
>     - mask_store: currently just maskstore
>     - store_lanes: currently just vec_store_lanes
>     - mask_store_lanes: currently just vec_mask_store_lanes
> -   - scatter_store: used for {mask_,}scatter_store
> +   - scatter_store: used for {mask_,len_mask,}scatter_store
>     - len_store: currently just len_store
>     - len_maskstore: currently just len_maskstore
>
> @@ -157,6 +157,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_LOAD_LANES, ECF_PURE,
>  DEF_INTERNAL_OPTAB_FN (GATHER_LOAD, ECF_PURE, gather_load, gather_load)
>  DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
>                         mask_gather_load, gather_load)
> +DEF_INTERNAL_OPTAB_FN (LEN_MASK_GATHER_LOAD, ECF_PURE,
> +                       len_mask_gather_load, gather_load)
>
>  DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
>  DEF_INTERNAL_OPTAB_FN (LEN_MASK_LOAD, ECF_PURE, len_maskload, len_maskload)
> @@ -164,6 +166,8 @@ DEF_INTERNAL_OPTAB_FN (LEN_MASK_LOAD, ECF_PURE, len_maskload, len_maskload)
>  DEF_INTERNAL_OPTAB_FN (SCATTER_STORE, 0, scatter_store, scatter_store)
>  DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
>                         mask_scatter_store, scatter_store)
> +DEF_INTERNAL_OPTAB_FN (LEN_MASK_SCATTER_STORE, 0,
> +                       len_mask_scatter_store, scatter_store)
>
>  DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
>  DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index a901b68c538..73c9a0c760f 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -95,8 +95,10 @@ OPTAB_CD(len_maskload_optab, "len_maskload$a$b")
>  OPTAB_CD(len_maskstore_optab, "len_maskstore$a$b")
>  OPTAB_CD(gather_load_optab, "gather_load$a$b")
>  OPTAB_CD(mask_gather_load_optab, "mask_gather_load$a$b")
> +OPTAB_CD(len_mask_gather_load_optab, "len_mask_gather_load$a$b")
>  OPTAB_CD(scatter_store_optab, "scatter_store$a$b")
>  OPTAB_CD(mask_scatter_store_optab, "mask_scatter_store$a$b")
> +OPTAB_CD(len_mask_scatter_store_optab, "len_mask_scatter_store$a$b")
>  OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
>  OPTAB_CD(vec_init_optab, "vec_init$a$b")
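The element-wise behaviour that the md.texi hunks document for the two new patterns can be modelled in plain C. The sketch makes several assumptions:

* Parameters follow the documented operand order, except that the offset-signedness flag is dropped because offsets are assumed to be signed `int32_t`.
* Elements are assumed to be `uint32_t`, and each address is formed as base plus offset times scale, in bytes.
* `nunits` is a model-only parameter for the vector length.
* The function names mirror the pattern names but are hypothetical C helpers, not GCC or target APIs.

Lanes that are masked off, or that lie at or beyond `len + bias`, are left untouched. That is one permissible choice for "undefined" result elements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Model of len_mask_gather_load: operand 0 = DEST, 1 = BASE, 2 = OFFSET,
   4 = SCALE, 5 = LEN, 6 = BIAS, 7 = MASK.  Operand 3 (offset signedness)
   is dropped because offsets are assumed signed.  NUNITS is the model's
   vector length.  */
static void
len_mask_gather_load (uint32_t *dest, const unsigned char *base,
                      const int32_t *offset, int64_t scale, int nunits,
                      int len, int bias, const bool *mask)
{
  int active = len + bias;  /* At most LEN + BIAS elements are loaded.  */
  assert (active >= 0 && active <= nunits);
  for (int i = 0; i < active; ++i)
    if (mask[i])
      memcpy (&dest[i], base + (int64_t) offset[i] * scale, sizeof dest[i]);
  /* Lanes with MASK[i] clear or i >= LEN + BIAS are undefined per the
     documentation; this model leaves DEST[i] unchanged.  */
}

/* Model of len_mask_scatter_store: operand 0 = BASE, 1 = OFFSET,
   3 = SCALE, 4 = SRC (the vector to store), 5 = LEN, 6 = BIAS, 7 = MASK.
   Operand 2 (offset signedness) is dropped as above.  */
static void
len_mask_scatter_store (unsigned char *base, const int32_t *offset,
                        int64_t scale, const uint32_t *src, int nunits,
                        int len, int bias, const bool *mask)
{
  int active = len + bias;  /* At most LEN + BIAS elements are stored.  */
  assert (active >= 0 && active <= nunits);
  for (int i = 0; i < active; ++i)
    if (mask[i])
      memcpy (base + (int64_t) offset[i] * scale, &src[i], sizeof src[i]);
}
```

For example, with offsets `{3, 0, 2, 1}`, a scale of 4, `len = 3`, `bias = 0` and mask `{1, 0, 1, 1}`, a scatter store writes only lanes 0 and 2. Lane 1 has its mask bit clear. Lane 3 is skipped because it lies beyond `len + bias`, even though its mask bit is set. Passing `len = 4, bias = -1` gives the same result.

At the GIMPLE level, the patch places these operands at fixed call-argument positions:

* `len` is argument 4 (`internal_fn_len_index`).
* `mask` is argument 6 (`internal_fn_mask_index`).
* `bias` sits between them at argument 5, following the {len,bias,mask} order.
* For LEN_MASK_SCATTER_STORE, the stored value is argument 3 (`internal_fn_stored_value_index`).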