On Thu, 2 Nov 2023, juzhe.zh...@rivai.ai wrote:

> Thanks Richi.
> 
> The following is the V2 patch:
> Testing on X86 and aarch64 are running.
> 
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 43d742e3c92..e7f7f976f11 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -760,7 +760,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
> char swap,
>                || dt == vect_external_def)
>               && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
>               && (TREE_CODE (type) == BOOLEAN_TYPE
> -                 || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
> +                 && !can_duplicate_and_interleave_p (vinfo, stmts.length (),
>                                                       type)))

That's not what I wrote.  I wrote to let == BOOLEAN_TYPE pass without
check here, thus

 -             && (TREE_CODE (type) == BOOLEAN_TYPE
 +             && TREE_CODE (type) != BOOLEAN_TYPE
               && !can_duplicate...

>             {
>               if (dump_enabled_p ())
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 6ce4868d3e1..6c47121e158 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -9859,10 +9859,16 @@ vectorizable_load (vec_info *vinfo,
>        mask_index = internal_fn_mask_index (ifn);
>        if (mask_index >= 0 && slp_node)
>         mask_index = vect_slp_child_index_for_operand (call, mask_index);
> +      slp_tree slp_op = NULL;
>        if (mask_index >= 0
>           && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
> -                                     &mask, NULL, &mask_dt, &mask_vectype))
> +                                     &mask, &slp_op, &mask_dt, 
> &mask_vectype))
>         return false;
> +      /* MASK_LEN_GATHER_LOAD dummy mask -1 should always match the
> +        MASK_VECTYPE.  */
> +      if (mask_index >= 0 && slp_node && mask_dt == vect_constant_def
> +         && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype))
> +       gcc_unreachable ();

You shouldn't do this here.  Theres code in if (costing_p) that
would need to be updated if you (correctly) want to track slp_op here.

>      }
> 
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-11-02 19:11
> To: Juzhe-Zhong
> CC: gcc-patches; richard.sandiford
> Subject: Re: [tree-optimization/111721] VECT: Support SLP for 
> MASK_LEN_GATHER_LOAD with dummy mask
> On Thu, 2 Nov 2023, Juzhe-Zhong wrote:
>  
> > This patch fixes following FAILs for RVV:
> > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump 
> > vect "Loop contains only SLP stmts"
> > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only 
> > SLP stmts"
> > 
> > Bootstrap on X86 and regtest passed.
> > 
> > Tested on aarch64 passed.
> > 
> > Ok for trunk ?
> > 
> >         PR tree-optimization/111721
> > 
> > gcc/ChangeLog:
> > 
> >         * tree-vect-slp.cc (vect_get_and_check_slp_defs): Support SLP for 
> > dummy mask -1.
> >         * tree-vect-stmts.cc (vectorizable_load): Ditto.
> > 
> > ---
> >  gcc/tree-vect-slp.cc   | 14 ++++++++++++--
> >  gcc/tree-vect-stmts.cc |  8 +++++++-
> >  2 files changed, 19 insertions(+), 3 deletions(-)
> > 
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index 43d742e3c92..23ca0318e31 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -756,8 +756,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
> > char swap,
> >  {
> >    tree type = TREE_TYPE (oprnd);
> >    dt = dts[i];
> > -   if ((dt == vect_constant_def
> > -        || dt == vect_external_def)
> > +   if (dt == vect_external_def
> >        && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> >        && (TREE_CODE (type) == BOOLEAN_TYPE
> >    || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
> > @@ -769,6 +768,17 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
> > char swap,
> >  "for variable-length SLP %T\n", oprnd);
> >        return -1;
> >      }
> > +   if (dt == vect_constant_def
> > +       && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> > +       && !can_duplicate_and_interleave_p (vinfo, stmts.length (), type))
> > +     {
> > +       if (dump_enabled_p ())
> > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > + "Build SLP failed: invalid type of def "
> > + "for variable-length SLP %T\n",
> > + oprnd);
> > +       return -1;
> > +     }
>  
> I don't think that's quite correct.  can_duplicate_and_interleave_p
> doesn't get enough info here and IIRC even materializing arbitrary
> constants isn't possible with VLA vectors.  The very first thing
> the function does is
>  
>   tree base_vector_type = get_vectype_for_scalar_type (vinfo, elt_type, 
> count);
>   if (!base_vector_type || !VECTOR_MODE_P (TYPE_MODE (base_vector_type)))
>     return false;
>  
> but for masks that's not going to get us the correct vector type.
> While I don't understand why we have that 'BOOLEAN_TYPE' special
> case (maybe the intent was to identify 'mask' operands that way?),
> we might want to require that we can materialize both all-zero
> and all-ones constant 'mask's.  But then 'mask' operands should
> be properly identified here.
>  
> Maybe we can also simply delay the check to the point we know
> whether we're facing an uniform constant or not (note for 'first',
> we cannot really special-case vect_constant_def as the second
> SLP lane might demote that to vect_external_def).  It's always
> a balance of whether to reject sth at SLP build time (possibly
> allowing operand swapping to do magic) or to delay checks
> to stmt analysis time.  That might also explain that you
> do not see fallout of the "wrong" change (the later checking
> will catch it anyway).
>  
> There's probably testsuite coverage for SVE here.
>  
> That said, a "correct" patch might be to simply change
>  
>               && (TREE_CODE (type) == BOOLEAN_TYPE
>                   || !can_duplicate_and_interleave_p (vinfo, stmts.length 
> (),
>                                                       type)))
>  
> to
>  
>        && TREE_CODE (type) != BOOLEAN_TYPE
>        && !can_duplicate_and_interleave_p (vinfo, stmts.length 
> (),       
>                                                       type)
>  
> thus delay 'mask' operand validation here.
>  
> Note I still think we should improve TREE_CODE (type) == BOOLEAN_TYPE
> to identify internal function mask operands only.
>  
> Richard.
>  
>  
> >  
> >    /* For the swapping logic below force vect_reduction_def
> >       for the reduction op in a SLP reduction group.  */
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index 6ce4868d3e1..6c47121e158 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -9859,10 +9859,16 @@ vectorizable_load (vec_info *vinfo,
> >        mask_index = internal_fn_mask_index (ifn);
> >        if (mask_index >= 0 && slp_node)
> >  mask_index = vect_slp_child_index_for_operand (call, mask_index);
> > +      slp_tree slp_op = NULL;
> >        if (mask_index >= 0
> >    && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
> > -       &mask, NULL, &mask_dt, &mask_vectype))
> > +       &mask, &slp_op, &mask_dt, &mask_vectype))
> >  return false;
> > +      /* MASK_LEN_GATHER_LOAD dummy mask -1 should always match the
> > + MASK_VECTYPE.  */
> > +      if (mask_index >= 0 && slp_node && mask_dt == vect_constant_def
> > +   && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype))
> > + gcc_unreachable ();
> >      }
> >  
> >    tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> > 
>  
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to