Re: [PATCH][RFC] tree-optimization/101908 - avoid STLF fails when vectorizing

Hongtao Liu via Gcc-patches Fri, 25 Mar 2022 07:09:58 -0700

On Fri, Mar 25, 2022 at 9:42 PM Richard Biener <rguent...@suse.de> wrote:
>
> On Fri, 25 Mar 2022, Hongtao Liu wrote:
>
> > On Fri, Mar 25, 2022 at 8:11 PM Richard Biener via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > Since we're now vectorizing by default at -O2 issues like PR101908
> > > become more important where we apply basic-block vectorization to
> > > parts of the function covering loads from function parameters passed
> > > on the stack.  Since we have no good idea how the stack pushing
> > > was performed but we do have a good idea that it was done recent
> > > when at the start of the function a STLF failure is inevitable
> > > if the argument passing code didn't use the same or larger vector
> > > stores and also has them aligned the same way.
> > >
> > > Until there's a robust IPA based solution the following implements
> > > target independent heuristics in the vectorizer to retain scalar
> > > loads for loads from parameters likely passed in memory (I use
> > > a BLKmode DECL_MODE check for this rather than firing up
> > > cummulative-args).  I've restricted this also to loads from the
> > > first "block" (that can be less than the first basic block if there's
> > > a call for example), since that covers the testcase.
> > I  prefer this patch to the md-reorg way.
>
> I was mostly posting the variant because it is target agnostic.  In
> general I agree that mitigation is best done at the RTL level after
> scheduling/RA so that the instruction sequence from function entry
> is visible.
>
> Did you have any success in coding this up yet?  Do you think it
Not yet.
> can be done in a way to be re-used by multiple targets?
>
> > Can vectorizer similarly handle by-reference passed arguments if their
> > corresponding real parameters are local variables of the caller?
> > .i.e. we can also prevent vectorization for the below case.
> >
> > struct vec3 {
> >   double x, y, z;
> > };
> >
> > struct ray {
> >  struct vec3 orig, dir;
> > };
> >
> > void
> > __attribute__((noinline))
> > ray_sphere (struct ray* __restrict ray, double *__restrict res)
> > {
> >   res[0] = ray->orig.y * ray->dir.x;
> >   res[1] = ray->orig.z * ray->dir.y;
> > }
> >
> > extern struct ray g;
> > void bar (double* res)
> > {
> >     struct ray tmp = g;
> >     ray_sphere (&tmp, res);
> > }
>
> Without IPA analysis this will be hard, if not impossible.
>
> Richard.
>
> > >
> > > Note that for the testcase (but not c-ray from the bugreport) there's
> > > a x86 peephole2 that vectorizes things back, so the patch is
> > > not effective there.
> > >
> > > Any comments?  I know we're also looking at x86 port specific
> > > mitigations but the issue will hit arm and power/z as well I think.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > >
> > > Thanks,
> > > Richard.
> > >
> > > 2022-03-25  Richard Biener  <rguent...@suse.de>
> > >
> > >         PR tree-optimization/101908
> > >         * tree-vect-stmts.cc (get_group_load_store_type): Add
> > >         heuristic to limit BB vectorization of function parameters.
> > >
> > >         * gcc.dg/vect/bb-slp-pr101908.c: New testcase.
> > > ---
> > >  gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c | 18 ++++++++++++++
> > >  gcc/tree-vect-stmts.cc                      | 27 ++++++++++++++++++++-
> > >  2 files changed, 44 insertions(+), 1 deletion(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c 
> > > b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
> > > new file mode 100644
> > > index 00000000000..b7534a18f0e
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
> > > @@ -0,0 +1,18 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_double } */
> > > +
> > > +struct vec3 {
> > > +  double x, y, z;
> > > +};
> > > +
> > > +struct ray {
> > > +  struct vec3 orig, dir;
> > > +};
> > > +
> > > +void ray_sphere (struct ray ray, double *res)
> > > +{
> > > +  res[0] = ray.orig.y * ray.dir.x;
> > > +  res[1] = ray.orig.z * ray.dir.y;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "STLF fail" "slp2" } } */
> > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > index f7449a79d1c..1e37e9678b6 100644
> > > --- a/gcc/tree-vect-stmts.cc
> > > +++ b/gcc/tree-vect-stmts.cc
> > > @@ -2197,7 +2197,32 @@ get_group_load_store_type (vec_info *vinfo, 
> > > stmt_vec_info stmt_info,
> > >    /* Stores can't yet have gaps.  */
> > >    gcc_assert (slp_node || vls_type == VLS_LOAD || gap == 0);
> > >
> > > -  if (slp_node)
> > > +  tree parm;
> > > +  if (!loop_vinfo
> > > +      && vls_type == VLS_LOAD
> > > +      /* The access is based on a PARM_DECL.  */
> > > +      && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
> > > +      && ((parm = TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 
> > > 0)), true)
> > > +      && TREE_CODE (parm) == PARM_DECL
> > > +      /* Likely passed on the stack.  */
> > > +      && DECL_MODE (parm) == BLKmode
> > > +      /* The access is in the first group.  */
> > > +      && first_dr_info->group == 0)
> > > +    {
> > > +      /* When doing BB vectorizing force early loads from function 
> > > parameters
> > > +        passed on the stack and thus stored recently to be done 
> > > elementwise
> > > +        to avoid store-to-load forwarding penalties.
> > > +        Note this will cause vectorization to fail for the load because 
> > > of
> > > +        the fear of underestimating the cost of elementwise accesses,
> > > +        see the end of get_load_store_type.  We are then going to 
> > > effectively
> > > +        do the same via handling the loads as external input to the SLP. 
> > >  */
> > > +      if (dump_enabled_p ())
> > > +       dump_printf_loc (MSG_NOTE, vect_location,
> > > +                        "Not using vector loads from function parameter "
> > > +                        "for the fear of causing a STLF fail\n");
> > > +      *memory_access_type = VMAT_ELEMENTWISE;
> > > +    }
> > > +  else if (slp_node)
> > >      {
> > >        /* For SLP vectorization we directly vectorize a subchain
> > >          without permutation.  */
> > > --
> > > 2.34.1
> >
> >
> >
> >
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)




--
BR,
Hongtao

Re: [PATCH][RFC] tree-optimization/101908 - avoid STLF fails when vectorizing

Reply via email to