On Tue, Dec 12, 2017 at 12:59:33AM +0000, Jeff Law wrote:
> On 11/17/2017 02:36 AM, Richard Sandiford wrote:
> > Richard Sandiford <richard.sandif...@linaro.org> writes:
> >> This patch adds support for vectorising groups of IFN_MASK_LOADs
> >> and IFN_MASK_STOREs using conditional load/store-lanes instructions.
> >> This requires new internal functions to represent the result
> >> (IFN_MASK_{LOAD,STORE}_LANES), as well as associated optabs.
> >>
> >> The normal IFN_{LOAD,STORE}_LANES functions are const operations
> >> that logically just perform the permute: the load or store is
> >> encoded as a MEM operand to the call statement.  In contrast,
> >> the IFN_MASK_{LOAD,STORE}_LANES functions use the same kind of
> >> interface as IFN_MASK_{LOAD,STORE}, since the memory is only
> >> conditionally accessed.
> >>
> >> The AArch64 patterns were added as part of the main LD[234]/ST[234] patch.
> >>
> >> Tested on aarch64-linux-gnu (both with and without SVE), x86_64-linux-gnu
> >> and powerpc64le-linux-gnu.  OK to install?
> >
> > Here's an updated (and much simpler) version that applies on top of the
> > series I just posted to remove vectorizable_mask_load_store.  Tested as
> > before.
> >
> > Thanks,
> > Richard
> >
> >
> > 2017-11-17  Richard Sandiford  <richard.sandif...@linaro.org>
> >             Alan Hayward  <alan.hayw...@arm.com>
> >             David Sherwood  <david.sherw...@arm.com>
> >
> > gcc/
> >         * doc/md.texi (vec_mask_load_lanes@var{m}@var{n}): Document.
> >         (vec_mask_store_lanes@var{m}@var{n}): Likewise.
> >         * optabs.def (vec_mask_load_lanes_optab): New optab.
> >         (vec_mask_store_lanes_optab): Likewise.
> >         * internal-fn.def (MASK_LOAD_LANES): New internal function.
> >         (MASK_STORE_LANES): Likewise.
> >         * internal-fn.c (mask_load_lanes_direct): New macro.
> >         (mask_store_lanes_direct): Likewise.
> >         (expand_mask_load_optab_fn): Handle masked operations.
> >         (expand_mask_load_lanes_optab_fn): New macro.
> >         (expand_mask_store_optab_fn): Handle masked operations.
> >         (expand_mask_store_lanes_optab_fn): New macro.
> >         (direct_mask_load_lanes_optab_supported_p): Likewise.
> >         (direct_mask_store_lanes_optab_supported_p): Likewise.
> >         * tree-vectorizer.h (vect_store_lanes_supported): Take a masked_p
> >         parameter.
> >         (vect_load_lanes_supported): Likewise.
> >         * tree-vect-data-refs.c (strip_conversion): New function.
> >         (can_group_stmts_p): Likewise.
> >         (vect_analyze_data_ref_accesses): Use it instead of checking
> >         for a pair of assignments.
> >         (vect_store_lanes_supported): Take a masked_p parameter.
> >         (vect_load_lanes_supported): Likewise.
> >         * tree-vect-loop.c (vect_analyze_loop_2): Update calls to
> >         vect_store_lanes_supported and vect_load_lanes_supported.
> >         * tree-vect-slp.c (vect_analyze_slp_instance): Likewise.
> >         * tree-vect-stmts.c (get_group_load_store_type): Take a masked_p
> >         parameter.  Don't allow gaps for masked accesses.
> >         Use vect_get_store_rhs.  Update calls to vect_store_lanes_supported
> >         and vect_load_lanes_supported.
> >         (get_load_store_type): Take a masked_p parameter and update
> >         call to get_group_load_store_type.
> >         (vectorizable_store): Update call to get_load_store_type.
> >         Handle IFN_MASK_STORE_LANES.
> >         (vectorizable_load): Update call to get_load_store_type.
> >         Handle IFN_MASK_LOAD_LANES.
> >
> > gcc/testsuite/
> >         * gcc.dg/vect/vect-ooo-group-1.c: New test.
> >         * gcc.target/aarch64/sve_mask_struct_load_1.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_load_1_run.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_load_2.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_load_2_run.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_load_3.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_load_3_run.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_load_4.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_load_5.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_load_6.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_load_7.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_load_8.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_store_1.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_store_1_run.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_store_2.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_store_2_run.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_store_3.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_store_3_run.c: Likewise.
> >         * gcc.target/aarch64/sve_mask_struct_store_4.c: Likewise.
> >
> > Index: gcc/doc/md.texi
> > ===================================================================
> > --- gcc/doc/md.texi	2017-11-17 09:06:19.783260344 +0000
> > +++ gcc/doc/md.texi	2017-11-17 09:35:23.400133274 +0000
> > @@ -4855,6 +4855,23 @@ loads for vectors of mode @var{n}.
> >
> >  This pattern is not allowed to @code{FAIL}.
> >
> > +@cindex @code{vec_mask_load_lanes@var{m}@var{n}} instruction pattern
> > +@item @samp{vec_mask_load_lanes@var{m}@var{n}}
> > +Like @samp{vec_load_lanes@var{m}@var{n}}, but takes an additional
> > +mask operand (operand 2) that specifies which elements of the destination
> > +vectors should be loaded.  Other elements of the destination
> > +vectors are set to zero.  The operation is equivalent to:
> > +
> > +@smallexample
> > +int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
> > +for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
> > +  if (operand2[j])
> > +    for (i = 0; i < c; i++)
> > +      operand0[i][j] = operand1[j * c + i];
> > +@end smallexample
> Don't you need to set operand0[i][j] to zero if operand2[j] is zero for
> this to be correct?  And if that's the case, don't you need to expose
> the set to zero as a side effect?
>
>
> > +@cindex @code{vec_mask_store_lanes@var{m}@var{n}} instruction pattern
> > +@item @samp{vec_mask_store_lanes@var{m}@var{n}}
> > +Like @samp{vec_store_lanes@var{m}@var{n}}, but takes an additional
> > +mask operand (operand 2) that specifies which elements of the source
> > +vectors should be stored.  The operation is equivalent to:
> > +
> > +@smallexample
> > +int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
> > +for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
> > +  if (operand2[j])
> > +    for (i = 0; i < c; i++)
> > +      operand0[j * c + i] = operand1[i][j];
> > +@end smallexample
> > +
> > +This pattern is not allowed to @code{FAIL}.
> Is the asymmetry between loads and stores intentional?  In particular,
> for loads the documentation says "Other elements of the destination
> vectors are set to zero".
>
>
> > Index: gcc/tree-vect-data-refs.c
> > ===================================================================
> > --- gcc/tree-vect-data-refs.c	2017-11-17 09:35:23.085133247 +0000
> > +++ gcc/tree-vect-data-refs.c	2017-11-17 09:35:23.404633274 +0000
> > @@ -2791,6 +2791,62 @@ dr_group_sort_cmp (const void *dra_, con
> >    return cmp;
> >  }
> >
> > +/* If OP is the result of a conversion, return the unconverted value,
> > +   otherwise return null.  */
> > +
> > +static tree
> > +strip_conversion (tree op)
> > +{
> > +  if (TREE_CODE (op) != SSA_NAME)
> > +    return NULL_TREE;
> > +  gimple *stmt = SSA_NAME_DEF_STMT (op);
> > +  if (!is_gimple_assign (stmt)
> > +      || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt)))
> > +    return NULL_TREE;
> > +  return gimple_assign_rhs1 (stmt);
> > +}
> Do you have any desire to walk back through multiple conversions?
> They're only used for masks when comparing whether masks are the same,
> so it probably doesn't matter in practice whether we handle multiple
> conversions, I guess.
>
> Somehow I know we've got to have an equivalent of this routine lying
> around somewhere :-)  Though I don't think it's worth the time to find it.
>
> Not an ACK or NAK.  I'm a bit hung up on the doc issue and how it
> potentially impacts how we support this capability.
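As an aside on the multiple-conversion question: a chain-walking variant
built from the same primitives might look roughly like the sketch below.
It is not part of the patch, the name strip_conversions is invented here,
and unlike strip_conversion it returns OP itself when there is nothing
left to strip, which may be more convenient for callers that only compare
the stripped values.

/* Sketch only (not in the patch): follow a chain of conversions and
   return the innermost unconverted value, or OP itself if it is not
   the result of a conversion.  */

static tree
strip_conversions (tree op)
{
  while (TREE_CODE (op) == SSA_NAME)
    {
      gimple *stmt = SSA_NAME_DEF_STMT (op);
      if (!is_gimple_assign (stmt)
	  || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt)))
	break;
      op = gimple_assign_rhs1 (stmt);
    }
  return op;
}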
I have no comment on the midend parts, but when you get the OK, the
AArch64 tests in this patch are OK.

Thanks,
James
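For context on the kind of code this enables, the loop below is only an
illustrative sketch (it is not taken from the new tests): under the
condition, the two loads from src[] form a group of masked loads, which
the patch allows to be vectorised as a single masked load-lanes access
(e.g. an SVE LD2), while the conditional store to dest[] remains an
ordinary masked store.

/* Illustrative only: a conditional access to interleaved data.  */
void
f (int *__restrict dest, int *__restrict src, int *__restrict cond, int n)
{
  for (int i = 0; i < n; ++i)
    if (cond[i])
      dest[i] = src[i * 2] + src[i * 2 + 1];
}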