On Thu, Nov 19, 2020 at 1:04 PM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > On Mon, Nov 16, 2020 at 10:58 AM Richard Sandiford
> > <richard.sandif...@arm.com> wrote:
> >> > Does the patch also vectorize with SVE loops that have
> >> > unknown loop bound?  The documentation isn't entirely
> >> > conclusive there.
> >>
> >> Yeah, for SVE it vectorises.  How about changing:
> >>
> >>   For example, if each iteration of a vectorized loop would handle
> >>   exactly four iterations, …
> >>
> >> to:
> >>
> >>   For example, if each iteration of a vectorized loop could only
> >>   handle exactly four iterations of the original scalar loop, …
> >>
> >> ?
> >
> > Yeah, guess that's better.
> >
> >>
> >> > Iff the iteration count is a multiple of two and the target can
> >> > vectorize the loop with both VF 2 and VF 4 but VF 4 would be better if
> >> > we'd use the 'cheap' cost model, does 'very-cheap' not vectorize the
> >> > loop or does it choose VF 2?
> >>
> >> It would choose VF 2, if that's still a win over scalar code.
> >
> > OK, that's what I expected.  The VF iteration is one source of
> > compile-time that we might want to avoid somehow ... on
> > x86_64 knowing the precise number of constant iterations
> > should allow to only pick a subset of vector modes based on
> > largest_pow2_factor or so?  Or maybe just use the preferred
> > SIMD mode for cheap/very-cheap?  (maybe pass down
> > the cost model kind to the target hook so targets can decide
> > for themselves here)
>
> On the preferred simd mode thing: TBH, I'd prefer to get rid
> of that hook one day and just rely on autovectorize_vector_modes.
>
> The difficulty with adding an early check is that we don't know ahead
> of time which types of scalar element a loop operates on: we only find
> that out on the fly during the first analysis of the loop.  The check
> would also depend on SLP grouping: we can use a vector of 4 ints to
> handle 2 iterations of the scalar loop if the ints are in an SLP
> group of size 2.
>
> So I agree it would be nice to have early-outs, but I think we'd have
> to restructure things first.  E.g. maybe we could do some “cheap” initial
> analysis that checks for basic vectorisability, records which scalar
> elements are used by the loop, and records how big the containing SLP
> groups might be (based on optimistic assumptions).  Then we can use
> that to prefilter the modes we try (perhaps all the way down to no modes).
> I guess that's conceptually similar to building an SLP graph though.
>
> Does the attached look OK?  I've included a version of the updated
> wording above.  I also changed this condition to use “>” rather
> than “>=”:

Yes, OK.

Thanks,
Richard.

>
>   /* If the vector loop needs multiple iterations to be beneficial then
>      things are probably too close to call, and the conservative thing
>      would be to stick with the scalar code.  */
>   if (flag_vect_cost_model == VECT_COST_MODEL_VERY_CHEAP
>       && min_profitable_estimate > (int) vect_vf_for_cost (loop_vinfo))
>
> since when min_profitable_estimate == min_profitable_iters
> we'll have done:
>
>   if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
>       && min_profitable_iters < (assumed_vf + peel_iters_prologue))
>     /* We want the vectorized loop to execute at least once.  */
>     min_profitable_iters = assumed_vf + peel_iters_prologue;
>
> I also tried to make vect-cost-model-4.c more resilient on targets
> that require alignment.
>
> Thanks,
> Richard
>
>
> gcc/
>         * doc/invoke.texi (-fvect-cost-model): Add a very-cheap model.
>         * common.opt (fvect-cost-model=): Add very-cheap as a possible option.
>         (fsimd-cost-model=): Likewise.
>         (vect_cost_model): Add very-cheap.
>         * flag-types.h (vect_cost_model): Add VECT_COST_MODEL_VERY_CHEAP.
>         Put the values in order of increasing aggressiveness.
>         * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Use
>         range checks when comparing against VECT_COST_MODEL_CHEAP.
>         (vect_prune_runtime_alias_test_list): Do not allow any alias
>         checks for the very-cheap cost model.
>         * tree-vect-loop.c (vect_analyze_loop_costing): Do not allow
>         any peeling for the very-cheap cost model.  Also require one
>         iteration of the vector loop to pay for itself.
>
> gcc/testsuite/
>         * gcc.dg/vect/vect-cost-model-1.c: New test.
>         * gcc.dg/vect/vect-cost-model-2.c: Likewise.
>         * gcc.dg/vect/vect-cost-model-3.c: Likewise.
>         * gcc.dg/vect/vect-cost-model-4.c: Likewise.
>         * gcc.dg/vect/vect-cost-model-5.c: Likewise.
>         * gcc.dg/vect/vect-cost-model-6.c: Likewise.
> ---
>  gcc/common.opt                                |  7 +++--
>  gcc/doc/invoke.texi                           | 12 +++++++--
>  gcc/flag-types.h                              | 10 ++++---
>  gcc/testsuite/gcc.dg/vect/vect-cost-model-1.c | 11 ++++++++
>  gcc/testsuite/gcc.dg/vect/vect-cost-model-2.c | 11 ++++++++
>  gcc/testsuite/gcc.dg/vect/vect-cost-model-3.c | 11 ++++++++
>  gcc/testsuite/gcc.dg/vect/vect-cost-model-4.c | 13 +++++++++
>  gcc/testsuite/gcc.dg/vect/vect-cost-model-5.c | 11 ++++++++
>  gcc/testsuite/gcc.dg/vect/vect-cost-model-6.c | 12 +++++++++
>  gcc/tree-vect-data-refs.c                     |  8 ++++--
>  gcc/tree-vect-loop.c                          | 27 +++++++++++++++++++
>  11 files changed, 123 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cost-model-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cost-model-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cost-model-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cost-model-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cost-model-5.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cost-model-6.c
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index fe39b3dee9f..ca8a2690799 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -3020,11 +3020,11 @@ Enable basic block vectorization (SLP) on trees.
>
>  fvect-cost-model=
>  Common Joined RejectNegative Enum(vect_cost_model) Var(flag_vect_cost_model) 
> Init(VECT_COST_MODEL_DEFAULT) Optimization
> --fvect-cost-model=[unlimited|dynamic|cheap]    Specifies the cost model for 
> vectorization.
> +-fvect-cost-model=[unlimited|dynamic|cheap|very-cheap] Specifies the cost 
> model for vectorization.
>
>  fsimd-cost-model=
>  Common Joined RejectNegative Enum(vect_cost_model) Var(flag_simd_cost_model) 
> Init(VECT_COST_MODEL_UNLIMITED) Optimization
> --fsimd-cost-model=[unlimited|dynamic|cheap]    Specifies the vectorization 
> cost model for code marked with a simd directive.
> +-fsimd-cost-model=[unlimited|dynamic|cheap|very-cheap] Specifies the 
> vectorization cost model for code marked with a simd directive.
>
>  Enum
>  Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown 
> vectorizer cost model %qs)
> @@ -3038,6 +3038,9 @@ Enum(vect_cost_model) String(dynamic) 
> Value(VECT_COST_MODEL_DYNAMIC)
>  EnumValue
>  Enum(vect_cost_model) String(cheap) Value(VECT_COST_MODEL_CHEAP)
>
> +EnumValue
> +Enum(vect_cost_model) String(very-cheap) Value(VECT_COST_MODEL_VERY_CHEAP)
> +
>  fvect-cost-model
>  Common Alias(fvect-cost-model=,dynamic,unlimited)
>  Enables the dynamic vectorizer cost model.  Preserved for backward 
> compatibility.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 3510a54c6c4..07232c6b33d 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -11440,7 +11440,8 @@ and @option{-fauto-profile}.
>  @item -fvect-cost-model=@var{model}
>  @opindex fvect-cost-model
>  Alter the cost model used for vectorization.  The @var{model} argument
> -should be one of @samp{unlimited}, @samp{dynamic} or @samp{cheap}.
> +should be one of @samp{unlimited}, @samp{dynamic}, @samp{cheap} or
> +@samp{very-cheap}.
>  With the @samp{unlimited} model the vectorized code-path is assumed
>  to be profitable while with the @samp{dynamic} model a runtime check
>  guards the vectorized code-path to enable it only for iteration
> @@ -11448,7 +11449,14 @@ counts that will likely execute faster than when 
> executing the original
>  scalar loop.  The @samp{cheap} model disables vectorization of
>  loops where doing so would be cost prohibitive for example due to
>  required runtime checks for data dependence or alignment but otherwise
> -is equal to the @samp{dynamic} model.
> +is equal to the @samp{dynamic} model.  The @samp{very-cheap} model only
> +allows vectorization if the vector code would entirely replace the
> +scalar code that is being vectorized.  For example, if each iteration
> +of a vectorized loop would only be able to handle exactly four iterations
> +of the scalar loop, the @samp{very-cheap} model would only allow
> +vectorization if the scalar iteration count is known to be a multiple
> +of four.
> +
>  The default cost model depends on other optimization flags and is
>  either @samp{dynamic} or @samp{cheap}.
>
> diff --git a/gcc/flag-types.h b/gcc/flag-types.h
> index 648ed096e30..0dbab19943c 100644
> --- a/gcc/flag-types.h
> +++ b/gcc/flag-types.h
> @@ -232,12 +232,14 @@ enum scalar_storage_order_kind {
>    SSO_LITTLE_ENDIAN
>  };
>
> -/* Vectorizer cost-model.  */
> +/* Vectorizer cost-model.  Except for DEFAULT, the values are ordered from
> +   the most conservative to the least conservative.  */
>  enum vect_cost_model {
> +  VECT_COST_MODEL_VERY_CHEAP = -3,
> +  VECT_COST_MODEL_CHEAP = -2,
> +  VECT_COST_MODEL_DYNAMIC = -1,
>    VECT_COST_MODEL_UNLIMITED = 0,
> -  VECT_COST_MODEL_CHEAP = 1,
> -  VECT_COST_MODEL_DYNAMIC = 2,
> -  VECT_COST_MODEL_DEFAULT = 3
> +  VECT_COST_MODEL_DEFAULT = 1
>  };
>
>  /* Different instrumentation modes.  */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cost-model-1.c 
> b/gcc/testsuite/gcc.dg/vect/vect-cost-model-1.c
> new file mode 100644
> index 00000000000..0737da5d671
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cost-model-1.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2 -ftree-vectorize -fvect-cost-model=cheap" } 
> */
> +
> +void
> +f (int *x, int *y)
> +{
> +  for (unsigned int i = 0; i < 1024; ++i)
> +    x[i] += y[i];
> +}
> +
> +/* { dg-final { scan-tree-dump {LOOP VECTORIZED} vect { target vect_int } } 
> } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cost-model-2.c 
> b/gcc/testsuite/gcc.dg/vect/vect-cost-model-2.c
> new file mode 100644
> index 00000000000..fa9bdb607b2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cost-model-2.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2 -ftree-vectorize 
> -fvect-cost-model=very-cheap" } */
> +
> +void
> +f (int *x, int *y)
> +{
> +  for (unsigned int i = 0; i < 1024; ++i)
> +    x[i] += y[i];
> +}
> +
> +/* { dg-final { scan-tree-dump-not {LOOP VECTORIZED} vect { target vect_int 
> } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cost-model-3.c 
> b/gcc/testsuite/gcc.dg/vect/vect-cost-model-3.c
> new file mode 100644
> index 00000000000..d7c6cfd2049
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cost-model-3.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2 -ftree-vectorize -fvect-cost-model=cheap" } 
> */
> +
> +void
> +f (int *restrict x, int *restrict y)
> +{
> +  for (unsigned int i = 0; i < 1024; ++i)
> +    x[i] += y[i];
> +}
> +
> +/* { dg-final { scan-tree-dump {LOOP VECTORIZED} vect { target vect_int } } 
> } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cost-model-4.c 
> b/gcc/testsuite/gcc.dg/vect/vect-cost-model-4.c
> new file mode 100644
> index 00000000000..bb018ad99fe
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cost-model-4.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2 -ftree-vectorize 
> -fvect-cost-model=very-cheap" } */
> +
> +int x[1024], y[1024];
> +
> +void
> +f (void)
> +{
> +  for (unsigned int i = 0; i < 1024; ++i)
> +    x[i] += y[i];
> +}
> +
> +/* { dg-final { scan-tree-dump {LOOP VECTORIZED} vect { target vect_int } } 
> } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cost-model-5.c 
> b/gcc/testsuite/gcc.dg/vect/vect-cost-model-5.c
> new file mode 100644
> index 00000000000..536ec0a3cda
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cost-model-5.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2 -ftree-vectorize -fvect-cost-model=cheap" } 
> */
> +
> +void
> +f (int *restrict x, int *restrict y)
> +{
> +  for (unsigned int i = 0; i < 1023; ++i)
> +    x[i] += y[i];
> +}
> +
> +/* { dg-final { scan-tree-dump {LOOP VECTORIZED} vect { target vect_int } } 
> } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cost-model-6.c 
> b/gcc/testsuite/gcc.dg/vect/vect-cost-model-6.c
> new file mode 100644
> index 00000000000..552febb5fee
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cost-model-6.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2 -ftree-vectorize 
> -fvect-cost-model=very-cheap" } */
> +
> +void
> +f (int *restrict x, int *restrict y)
> +{
> +  for (unsigned int i = 0; i < 1023; ++i)
> +    x[i] += y[i];
> +}
> +
> +/* { dg-final { scan-tree-dump {LOOP VECTORIZED} vect { target { vect_int && 
> vect_partial_vectors_usage_2 } } } } */
> +/* { dg-final { scan-tree-dump-not {LOOP VECTORIZED} vect { target { 
> vect_int && { ! vect_partial_vectors_usage_2 } } } } } */
> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> index 0efab495407..18e36c89d14 100644
> --- a/gcc/tree-vect-data-refs.c
> +++ b/gcc/tree-vect-data-refs.c
> @@ -2161,7 +2161,7 @@ vect_enhance_data_refs_alignment (loop_vec_info 
> loop_vinfo)
>          {
>            unsigned max_allowed_peel
>             = param_vect_max_peeling_for_alignment;
> -         if (flag_vect_cost_model == VECT_COST_MODEL_CHEAP)
> +         if (flag_vect_cost_model <= VECT_COST_MODEL_CHEAP)
>             max_allowed_peel = 0;
>            if (max_allowed_peel != (unsigned)-1)
>              {
> @@ -2259,7 +2259,7 @@ vect_enhance_data_refs_alignment (loop_vec_info 
> loop_vinfo)
>    do_versioning
>      = (optimize_loop_nest_for_speed_p (loop)
>         && !loop->inner /* FORNOW */
> -       && flag_vect_cost_model != VECT_COST_MODEL_CHEAP);
> +       && flag_vect_cost_model > VECT_COST_MODEL_CHEAP);
>
>    if (do_versioning)
>      {
> @@ -3682,6 +3682,10 @@ vect_prune_runtime_alias_test_list (loop_vec_info 
> loop_vinfo)
>    unsigned int count = (comp_alias_ddrs.length ()
>                         + check_unequal_addrs.length ());
>
> +  if (count && flag_vect_cost_model == VECT_COST_MODEL_VERY_CHEAP)
> +    return opt_result::failure_at
> +      (vect_location, "would need a runtime alias check\n");
> +
>    if (dump_enabled_p ())
>      dump_printf_loc (MSG_NOTE, vect_location,
>                      "improved number of alias checks from %d to %d\n",
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 856bbfebf7c..48dfb4df00e 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -1827,6 +1827,19 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo)
>         }
>      }
>
> +  /* If using the "very cheap" model. reject cases in which we'd keep
> +     a copy of the scalar code (even if we might be able to vectorize it).  
> */
> +  if (flag_vect_cost_model == VECT_COST_MODEL_VERY_CHEAP
> +      && (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
> +         || LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> +         || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)))
> +    {
> +      if (dump_enabled_p ())
> +       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                        "some scalar iterations would need to be peeled\n");
> +      return 0;
> +    }
> +
>    int min_profitable_iters, min_profitable_estimate;
>    vect_estimate_min_profitable_iters (loop_vinfo, &min_profitable_iters,
>                                       &min_profitable_estimate);
> @@ -1885,6 +1898,20 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo)
>        min_profitable_estimate = min_profitable_iters;
>      }
>
> +  /* If the vector loop needs multiple iterations to be beneficial then
> +     things are probably too close to call, and the conservative thing
> +     would be to stick with the scalar code.  */
> +  if (flag_vect_cost_model == VECT_COST_MODEL_VERY_CHEAP
> +      && min_profitable_estimate > (int) vect_vf_for_cost (loop_vinfo))
> +    {
> +      if (dump_enabled_p ())
> +       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                        "one iteration of the vector loop would be"
> +                        " more expensive than the equivalent number of"
> +                        " iterations of the scalar loop\n");
> +      return 0;
> +    }
> +
>    HOST_WIDE_INT estimated_niter;
>
>    /* If we are vectorizing an epilogue then we know the maximum number of

Reply via email to