On Fri, Mar 25, 2022 at 9:42 PM Richard Biener <rguent...@suse.de> wrote:
>
> On Fri, 25 Mar 2022, Hongtao Liu wrote:
>
> > On Fri, Mar 25, 2022 at 8:11 PM Richard Biener via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > Since we're now vectorizing by default at -O2, issues like PR101908
> > > become more important, where we apply basic-block vectorization to
> > > parts of the function covering loads from function parameters passed
> > > on the stack.  We have no good idea how the stack pushing was
> > > performed, but we do have a good idea that it was done recently, so
> > > at the start of the function an STLF failure is inevitable unless the
> > > argument passing code used the same or larger vector stores, aligned
> > > the same way.
> > >
> > > Until there's a robust IPA-based solution, the following implements a
> > > target-independent heuristic in the vectorizer to retain scalar loads
> > > for loads from parameters likely passed in memory (I use a BLKmode
> > > DECL_MODE check for this rather than firing up cumulative-args).
> > > I've also restricted this to loads from the first "block" (which can
> > > be less than the first basic block if there's a call, for example),
> > > since that covers the testcase.
> > I prefer this patch to the md-reorg way.
>
> I was mostly posting the variant because it is target agnostic.  In
> general I agree that mitigation is best done at the RTL level after
> scheduling/RA so that the instruction sequence from function entry
> is visible.
>
> Did you have any success in coding this up yet?  Do you think it
> can be done in a way to be re-used by multiple targets?
Not yet.
>
> > Can the vectorizer similarly handle by-reference passed arguments if
> > their corresponding real parameters are local variables of the caller?
> > I.e., we can also prevent vectorization for the below case.
> >
> > struct vec3 {
> >     double x, y, z;
> > };
> >
> > struct ray {
> >     struct vec3 orig, dir;
> > };
> >
> > void
> > __attribute__((noinline))
> > ray_sphere (struct ray* __restrict ray, double *__restrict res)
> > {
> >   res[0] = ray->orig.y * ray->dir.x;
> >   res[1] = ray->orig.z * ray->dir.y;
> > }
> >
> > extern struct ray g;
> > void bar (double* res)
> > {
> >   struct ray tmp = g;
> >   ray_sphere (&tmp, res);
> > }
>
> Without IPA analysis this will be hard, if not impossible.
>
> Richard.
>
> > >
> > > Note that for the testcase (but not c-ray from the bug report) there's
> > > an x86 peephole2 that vectorizes things back, so the patch is not
> > > effective there.
> > >
> > > Any comments?  I know we're also looking at x86 port-specific
> > > mitigations, but the issue will hit arm and power/z as well, I think.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > >
> > > Thanks,
> > > Richard.
> > >
> > > 2022-03-25  Richard Biener  <rguent...@suse.de>
> > >
> > >         PR tree-optimization/101908
> > >         * tree-vect-stmts.cc (get_group_load_store_type): Add
> > >         heuristic to limit BB vectorization of function parameters.
> > >
> > >         * gcc.dg/vect/bb-slp-pr101908.c: New testcase.
> > > ---
> > >  gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c | 18 ++++++++++++++
> > >  gcc/tree-vect-stmts.cc                      | 27 ++++++++++++++++++++-
> > >  2 files changed, 44 insertions(+), 1 deletion(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
> > > new file mode 100644
> > > index 00000000000..b7534a18f0e
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
> > > @@ -0,0 +1,18 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_double } */
> > > +
> > > +struct vec3 {
> > > +  double x, y, z;
> > > +};
> > > +
> > > +struct ray {
> > > +  struct vec3 orig, dir;
> > > +};
> > > +
> > > +void ray_sphere (struct ray ray, double *res)
> > > +{
> > > +  res[0] = ray.orig.y * ray.dir.x;
> > > +  res[1] = ray.orig.z * ray.dir.y;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "STLF fail" "slp2" } } */
> > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > index f7449a79d1c..1e37e9678b6 100644
> > > --- a/gcc/tree-vect-stmts.cc
> > > +++ b/gcc/tree-vect-stmts.cc
> > > @@ -2197,7 +2197,32 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
> > >    /* Stores can't yet have gaps.  */
> > >    gcc_assert (slp_node || vls_type == VLS_LOAD || gap == 0);
> > >
> > > -  if (slp_node)
> > > +  tree parm;
> > > +  if (!loop_vinfo
> > > +      && vls_type == VLS_LOAD
> > > +      /* The access is based on a PARM_DECL.  */
> > > +      && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
> > > +      && ((parm = TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 0)), true)
> > > +      && TREE_CODE (parm) == PARM_DECL
> > > +      /* Likely passed on the stack.  */
> > > +      && DECL_MODE (parm) == BLKmode
> > > +      /* The access is in the first group.  */
> > > +      && first_dr_info->group == 0)
> > > +    {
> > > +      /* When doing BB vectorizing force early loads from function parameters
> > > +         passed on the stack and thus stored recently to be done elementwise
> > > +         to avoid store-to-load forwarding penalties.
> > > +         Note this will cause vectorization to fail for the load because of
> > > +         the fear of underestimating the cost of elementwise accesses,
> > > +         see the end of get_load_store_type.  We are then going to effectively
> > > +         do the same via handling the loads as external input to the SLP.  */
> > > +      if (dump_enabled_p ())
> > > +        dump_printf_loc (MSG_NOTE, vect_location,
> > > +                         "Not using vector loads from function parameter "
> > > +                         "for the fear of causing a STLF fail\n");
> > > +      *memory_access_type = VMAT_ELEMENTWISE;
> > > +    }
> > > +  else if (slp_node)
> > >      {
> > >        /* For SLP vectorization we directly vectorize a subchain
> > >           without permutation.  */
> > > --
> > > 2.34.1
> > >
> > >
> >
> >
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
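As a side note, below is a minimal sketch of the caller side that sets up
the store-to-load forwarding hazard the patch is about.  It is not part of
the patch, and use_ray is a made-up name: passing struct ray by value makes
the caller store the six doubles into the outgoing argument slot, typically
with scalar stores, and a basic-block vectorized callee would immediately
read them back with wider vector loads.

/* Hypothetical caller for the testcase above (use_ray is a made-up name).
   Copying 'struct ray' into the by-value argument typically happens with
   scalar stores; if ray_sphere is then basic-block vectorized, its first
   vector load spans two of those freshly stored doubles and STLF fails.  */

struct vec3 {
  double x, y, z;
};

struct ray {
  struct vec3 orig, dir;
};

extern void ray_sphere (struct ray ray, double *res);

void
use_ray (struct ray *r, double *res)
{
  ray_sphere (*r, res);  /* argument block is stored to the stack here */
}

With the patch the loads in ray_sphere are kept elementwise
(VMAT_ELEMENTWISE), so each load matches one of the earlier scalar stores
and can be forwarded.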
--
BR,
Hongtao