On Tue, Dec 12, 2017 at 12:59:33AM +0000, Jeff Law wrote:
> On 11/17/2017 02:36 AM, Richard Sandiford wrote:
> > Richard Sandiford <richard.sandif...@linaro.org> writes:
> >> This patch adds support for vectorising groups of IFN_MASK_LOADs
> >> and IFN_MASK_STOREs using conditional load/store-lanes instructions.
> >> This requires new internal functions to represent the result
> >> (IFN_MASK_{LOAD,STORE}_LANES), as well as associated optabs.
> >>
> >> The normal IFN_{LOAD,STORE}_LANES functions are const operations
> >> that logically just perform the permute: the load or store is
> >> encoded as a MEM operand to the call statement.  In contrast,
> >> the IFN_MASK_{LOAD,STORE}_LANES functions use the same kind of
> >> interface as IFN_MASK_{LOAD,STORE}, since the memory is only
> >> conditionally accessed.
> >>
> >> The AArch64 patterns were added as part of the main LD[234]/ST[234] patch.
> >>
> >> Tested on aarch64-linux-gnu (both with and without SVE), x86_64-linux-gnu
> >> and powerpc64le-linux-gnu.  OK to install?
> >
> > Here's an updated (and much simpler) version that applies on top of the
> > series I just posted to remove vectorizable_mask_load_store.  Tested as
> > before.
> >
> > Thanks,
> > Richard
> >
> >
> > 2017-11-17  Richard Sandiford  <richard.sandif...@linaro.org>
> >             Alan Hayward  <alan.hayw...@arm.com>
> >             David Sherwood  <david.sherw...@arm.com>
> >
> > gcc/
> >         * doc/md.texi (vec_mask_load_lanes@var{m}@var{n}): Document.
> >         (vec_mask_store_lanes@var{m}@var{n}): Likewise.
> >         * optabs.def (vec_mask_load_lanes_optab): New optab.
> >         (vec_mask_store_lanes_optab): Likewise.
> >         * internal-fn.def (MASK_LOAD_LANES): New internal function.
> >         (MASK_STORE_LANES): Likewise.
> >         * internal-fn.c (mask_load_lanes_direct): New macro.
> >         (mask_store_lanes_direct): Likewise.
> >         (expand_mask_load_optab_fn): Handle masked operations.
> >         (expand_mask_load_lanes_optab_fn): New macro.
> >         (expand_mask_store_optab_fn): Handle masked operations.
> >         (expand_mask_store_lanes_optab_fn): New macro.
> >         (direct_mask_load_lanes_optab_supported_p): Likewise.
> >         (direct_mask_store_lanes_optab_supported_p): Likewise.
> >         * tree-vectorizer.h (vect_store_lanes_supported): Take a masked_p
> >         parameter.
> >         (vect_load_lanes_supported): Likewise.
> >         * tree-vect-data-refs.c (strip_conversion): New function.
> >         (can_group_stmts_p): Likewise.
> >         (vect_analyze_data_ref_accesses): Use it instead of checking
> >         for a pair of assignments.
> >         (vect_store_lanes_supported): Take a masked_p parameter.
> >         (vect_load_lanes_supported): Likewise.
> >         * tree-vect-loop.c (vect_analyze_loop_2): Update calls to
> >         vect_store_lanes_supported and vect_load_lanes_supported.
> >         * tree-vect-slp.c (vect_analyze_slp_instance): Likewise.
> >         * tree-vect-stmts.c (get_group_load_store_type): Take a masked_p
> >         parameter.  Don't allow gaps for masked accesses.
> >         Use vect_get_store_rhs.  Update calls to vect_store_lanes_supported
> >         and vect_load_lanes_supported.
> >         (get_load_store_type): Take a masked_p parameter and update
> >         call to get_group_load_store_type.
> >         (vectorizable_store): Update call to get_load_store_type.
> >         Handle IFN_MASK_STORE_LANES.
> >         (vectorizable_load): Update call to get_load_store_type.
> >         Handle IFN_MASK_LOAD_LANES.
> >
> > gcc/testsuite/
> >         * gcc.dg/vect/vect-ooo-group-1.c: New test.
> >         * gcc.target/aarch64/sve_mask_struct_load_1.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_load_1_run.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_load_2.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_load_2_run.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_load_3.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_load_3_run.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_load_4.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_load_5.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_load_6.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_load_7.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_load_8.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_store_1.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_store_1_run.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_store_2.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_store_2_run.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_store_3.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_store_3_run.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_store_4.c: Likewise.
> >
> > Index: gcc/doc/md.texi
> > ===================================================================
> > --- gcc/doc/md.texi	2017-11-17 09:06:19.783260344 +0000
> > +++ gcc/doc/md.texi	2017-11-17 09:35:23.400133274 +0000
> > @@ -4855,6 +4855,23 @@ loads for vectors of mode @var{n}.
> >
> >  This pattern is not allowed to @code{FAIL}.
> >
> > +@cindex @code{vec_mask_load_lanes@var{m}@var{n}} instruction pattern
> > +@item @samp{vec_mask_load_lanes@var{m}@var{n}}
> > +Like @samp{vec_load_lanes@var{m}@var{n}}, but takes an additional
> > +mask operand (operand 2) that specifies which elements of the destination
> > +vectors should be loaded.  Other elements of the destination
> > +vectors are set to zero.  The operation is equivalent to:
> > +
> > +@smallexample
> > +int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
> > +for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
> > +  if (operand2[j])
> > +    for (i = 0; i < c; i++)
> > +      operand0[i][j] = operand1[j * c + i];
> > +@end smallexample
> Don't you need to set operand0[i][j] to zero if operand2[j] is zero for
> this to be correct?  And if that's the case, don't you need to expose
> the set to zero as a side effect?
>
>
> > +@cindex @code{vec_mask_store_lanes@var{m}@var{n}} instruction pattern
> > +@item @samp{vec_mask_store_lanes@var{m}@var{n}}
> > +Like @samp{vec_store_lanes@var{m}@var{n}}, but takes an additional
> > +mask operand (operand 2) that specifies which elements of the source
> > +vectors should be stored.  The operation is equivalent to:
> > +
> > +@smallexample
> > +int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
> > +for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
> > +  if (operand2[j])
> > +    for (i = 0; i < c; i++)
> > +      operand0[j * c + i] = operand1[i][j];
> > +@end smallexample
> > +
> > +This pattern is not allowed to @code{FAIL}.
> Is the asymmetry between loads and stores intentional?  In particular,
> for loads the documentation says "Other elements of the destination
> vectors are set to zero".
>
>
> > Index: gcc/tree-vect-data-refs.c
> > ===================================================================
> > --- gcc/tree-vect-data-refs.c	2017-11-17 09:35:23.085133247 +0000
> > +++ gcc/tree-vect-data-refs.c	2017-11-17 09:35:23.404633274 +0000
> > @@ -2791,6 +2791,62 @@ dr_group_sort_cmp (const void *dra_, con
> >    return cmp;
> >  }
> >
> > +/* If OP is the result of a conversion, return the unconverted value,
> > +   otherwise return null.  */
> > +
> > +static tree
> > +strip_conversion (tree op)
> > +{
> > +  if (TREE_CODE (op) != SSA_NAME)
> > +    return NULL_TREE;
> > +  gimple *stmt = SSA_NAME_DEF_STMT (op);
> > +  if (!is_gimple_assign (stmt)
> > +      || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt)))
> > +    return NULL_TREE;
> > +  return gimple_assign_rhs1 (stmt);
> > +}
> Do you have any desire to walk back through multiple conversions?
> They're only used for masks when comparing whether masks are the same,
> so it probably doesn't matter in practice whether we handle multiple
> conversions, I guess.
>
> Somehow I know we've got to have an equivalent of this routine lying
> around somewhere :-)  Though I don't think it's worth the time to find it.
>
> Not an ACK or NAK.  I'm a bit hung up on the doc issue and how it
> potentially impacts how we support this capability.
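As an aside on the multiple-conversion question: a chain-walking variant
built from the same primitives might look roughly like the sketch below.
It is not part of the patch, the name strip_conversions is invented here,
and unlike strip_conversion it returns OP itself when there is nothing
left to strip, which may be more convenient for callers that only compare
the stripped values.

/* Sketch only (not in the patch): follow a chain of conversions and
   return the innermost unconverted value, or OP itself if it is not
   the result of a conversion.  */

static tree
strip_conversions (tree op)
{
  while (TREE_CODE (op) == SSA_NAME)
    {
      gimple *stmt = SSA_NAME_DEF_STMT (op);
      if (!is_gimple_assign (stmt)
	  || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt)))
	break;
      op = gimple_assign_rhs1 (stmt);
    }
  return op;
}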
I have no comment on the midend parts, but when you get the OK, the
AArch64 tests in this patch are OK.

Thanks,
James
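For context on the kind of code this enables, the loop below is only an
illustrative sketch (it is not taken from the new tests): under the
condition, the two loads from src[] form a group of masked loads, which
the patch allows to be vectorised as a single masked load-lanes access
(e.g. an SVE LD2), while the conditional store to dest[] remains an
ordinary masked store.

/* Illustrative only: a conditional access to interleaved data.  */
void
f (int *__restrict dest, int *__restrict src, int *__restrict cond, int n)
{
  for (int i = 0; i < n; ++i)
    if (cond[i])
      dest[i] = src[i * 2] + src[i * 2 + 1];
}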