On Fri, Mar 25, 2022 at 8:11 PM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Since we're now vectorizing by default at -O2 issues like PR101908
> become more important where we apply basic-block vectorization to
> parts of the function covering loads from function parameters passed
> on the stack.  Since we have no good idea how the stack pushing
> was performed but we do have a good idea that it was done recent
> when at the start of the function a STLF failure is inevitable
> if the argument passing code didn't use the same or larger vector
> stores and also has them aligned the same way.
>
> Until there's a robust IPA based solution the following implements
> target independent heuristics in the vectorizer to retain scalar
> loads for loads from parameters likely passed in memory (I use
> a BLKmode DECL_MODE check for this rather than firing up
> cummulative-args).  I've restricted this also to loads from the
> first "block" (that can be less than the first basic block if there's
> a call for example), since that covers the testcase.
I  prefer this patch to the md-reorg way.
Can vectorizer similarly handle by-reference passed arguments if their
corresponding real parameters are local variables of the caller?
.i.e. we can also prevent vectorization for the below case.

struct vec3 {
  double x, y, z;
};

struct ray {
 struct vec3 orig, dir;
};

void
__attribute__((noinline))
ray_sphere (struct ray* __restrict ray, double *__restrict res)
{
  res[0] = ray->orig.y * ray->dir.x;
  res[1] = ray->orig.z * ray->dir.y;
}

extern struct ray g;
void bar (double* res)
{
    struct ray tmp = g;
    ray_sphere (&tmp, res);
}

>
> Note that for the testcase (but not c-ray from the bugreport) there's
> a x86 peephole2 that vectorizes things back, so the patch is
> not effective there.
>
> Any comments?  I know we're also looking at x86 port specific
> mitigations but the issue will hit arm and power/z as well I think.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> Thanks,
> Richard.
>
> 2022-03-25  Richard Biener  <rguent...@suse.de>
>
>         PR tree-optimization/101908
>         * tree-vect-stmts.cc (get_group_load_store_type): Add
>         heuristic to limit BB vectorization of function parameters.
>
>         * gcc.dg/vect/bb-slp-pr101908.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c | 18 ++++++++++++++
>  gcc/tree-vect-stmts.cc                      | 27 ++++++++++++++++++++-
>  2 files changed, 44 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c 
> b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
> new file mode 100644
> index 00000000000..b7534a18f0e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_double } */
> +
> +struct vec3 {
> +  double x, y, z;
> +};
> +
> +struct ray {
> +  struct vec3 orig, dir;
> +};
> +
> +void ray_sphere (struct ray ray, double *res)
> +{
> +  res[0] = ray.orig.y * ray.dir.x;
> +  res[1] = ray.orig.z * ray.dir.y;
> +}
> +
> +/* { dg-final { scan-tree-dump "STLF fail" "slp2" } } */
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index f7449a79d1c..1e37e9678b6 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -2197,7 +2197,32 @@ get_group_load_store_type (vec_info *vinfo, 
> stmt_vec_info stmt_info,
>    /* Stores can't yet have gaps.  */
>    gcc_assert (slp_node || vls_type == VLS_LOAD || gap == 0);
>
> -  if (slp_node)
> +  tree parm;
> +  if (!loop_vinfo
> +      && vls_type == VLS_LOAD
> +      /* The access is based on a PARM_DECL.  */
> +      && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
> +      && ((parm = TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 0)), 
> true)
> +      && TREE_CODE (parm) == PARM_DECL
> +      /* Likely passed on the stack.  */
> +      && DECL_MODE (parm) == BLKmode
> +      /* The access is in the first group.  */
> +      && first_dr_info->group == 0)
> +    {
> +      /* When doing BB vectorizing force early loads from function parameters
> +        passed on the stack and thus stored recently to be done elementwise
> +        to avoid store-to-load forwarding penalties.
> +        Note this will cause vectorization to fail for the load because of
> +        the fear of underestimating the cost of elementwise accesses,
> +        see the end of get_load_store_type.  We are then going to effectively
> +        do the same via handling the loads as external input to the SLP.  */
> +      if (dump_enabled_p ())
> +       dump_printf_loc (MSG_NOTE, vect_location,
> +                        "Not using vector loads from function parameter "
> +                        "for the fear of causing a STLF fail\n");
> +      *memory_access_type = VMAT_ELEMENTWISE;
> +    }
> +  else if (slp_node)
>      {
>        /* For SLP vectorization we directly vectorize a subchain
>          without permutation.  */
> --
> 2.34.1



-- 
BR,
Hongtao

Reply via email to