On Fri, 25 Mar 2022, Hongtao Liu wrote:

> On Fri, Mar 25, 2022 at 8:11 PM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Since we're now vectorizing by default at -O2, issues like PR101908
> > become more important, where we apply basic-block vectorization to
> > parts of the function covering loads from function parameters passed
> > on the stack.  We have no good idea how the stack pushing was
> > performed, but we do know it happened recently when we are at the
> > start of the function, so an STLF (store-to-load forwarding) failure
> > is inevitable unless the argument passing code used the same or
> > larger vector stores, aligned the same way.
> >
> > Until there's a robust IPA-based solution, the following implements
> > a target-independent heuristic in the vectorizer to retain scalar
> > loads for loads from parameters likely passed in memory (I use
> > a BLKmode DECL_MODE check for this rather than firing up
> > CUMULATIVE_ARGS).  I've also restricted this to loads from the
> > first "block" (which can be less than the first basic block if
> > there's a call, for example), since that covers the testcase.
I prefer this patch to the md-reorg way.

I was mostly posting this variant because it is target-agnostic.  In
general I agree that mitigation is best done at the RTL level after
scheduling/RA, so that the instruction sequence from function entry
is visible.

Did you have any success in coding this up yet?  Do you think it
can be done in a way to be re-used by multiple targets?

> Can the vectorizer similarly handle by-reference passed arguments if
> the corresponding actual arguments are local variables of the caller?
> I.e., we could also prevent vectorization for the case below.
> 
> struct vec3 {
>   double x, y, z;
> };
> 
> struct ray {
>  struct vec3 orig, dir;
> };
> 
> void
> __attribute__((noinline))
> ray_sphere (struct ray* __restrict ray, double *__restrict res)
> {
>   res[0] = ray->orig.y * ray->dir.x;
>   res[1] = ray->orig.z * ray->dir.y;
> }
> 
> extern struct ray g;
> void bar (double* res)
> {
>     struct ray tmp = g;
>     ray_sphere (&tmp, res);
> }

Without IPA analysis this will be hard, if not impossible.

Richard.

> >
> > Note that for the testcase (but not c-ray from the bug report) there's
> > an x86 peephole2 that vectorizes things back, so the patch is
> > not effective there.
> >
> > Any comments?  I know we're also looking at x86-port-specific
> > mitigations, but I think the issue will hit arm and power/z as well.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > Thanks,
> > Richard.
> >
> > 2022-03-25  Richard Biener  <rguent...@suse.de>
> >
> >         PR tree-optimization/101908
> >         * tree-vect-stmts.cc (get_group_load_store_type): Add
> >         heuristic to limit BB vectorization of function parameters.
> >
> >         * gcc.dg/vect/bb-slp-pr101908.c: New testcase.
> > ---
> >  gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c | 18 ++++++++++++++
> >  gcc/tree-vect-stmts.cc                      | 27 ++++++++++++++++++++-
> >  2 files changed, 44 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
> > new file mode 100644
> > index 00000000000..b7534a18f0e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_double } */
> > +
> > +struct vec3 {
> > +  double x, y, z;
> > +};
> > +
> > +struct ray {
> > +  struct vec3 orig, dir;
> > +};
> > +
> > +void ray_sphere (struct ray ray, double *res)
> > +{
> > +  res[0] = ray.orig.y * ray.dir.x;
> > +  res[1] = ray.orig.z * ray.dir.y;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "STLF fail" "slp2" } } */
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index f7449a79d1c..1e37e9678b6 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -2197,7 +2197,32 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
> >    /* Stores can't yet have gaps.  */
> >    gcc_assert (slp_node || vls_type == VLS_LOAD || gap == 0);
> >
> > -  if (slp_node)
> > +  tree parm;
> > +  if (!loop_vinfo
> > +      && vls_type == VLS_LOAD
> > +      /* The access is based on a PARM_DECL.  */
> > +      && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
> > +      && ((parm = TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 0)), true)
> > +      && TREE_CODE (parm) == PARM_DECL
> > +      /* Likely passed on the stack.  */
> > +      && DECL_MODE (parm) == BLKmode
> > +      /* The access is in the first group.  */
> > +      && first_dr_info->group == 0)
> > +    {
> > +      /* When doing BB vectorizing force early loads from function parameters
> > +        passed on the stack and thus stored recently to be done elementwise
> > +        to avoid store-to-load forwarding penalties.
> > +        Note this will cause vectorization to fail for the load because of
> > +        the fear of underestimating the cost of elementwise accesses,
> > +        see the end of get_load_store_type.  We are then going to effectively
> > +        do the same via handling the loads as external input to the SLP.  */
> > +      if (dump_enabled_p ())
> > +       dump_printf_loc (MSG_NOTE, vect_location,
> > +                        "Not using vector loads from function parameter "
> > +                        "for the fear of causing a STLF fail\n");
> > +      *memory_access_type = VMAT_ELEMENTWISE;
> > +    }
> > +  else if (slp_node)
> >      {
> >        /* For SLP vectorization we directly vectorize a subchain
> >          without permutation.  */
> > --
> > 2.34.1

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
