On Thu, Nov 19, 2020 at 1:04 PM Richard Sandiford <richard.sandif...@arm.com> wrote: > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > > On Mon, Nov 16, 2020 at 10:58 AM Richard Sandiford > > <richard.sandif...@arm.com> wrote: > >> > Does the patch also vectorize with SVE loops that have > >> > unknown loop bound? The documentation isn't entirely > >> > conclusive there. > >> > >> Yeah, for SVE it vectorises. How about changing: > >> > >> For example, if each iteration of a vectorized loop would handle > >> exactly four iterations, … > >> > >> to: > >> > >> For example, if each iteration of a vectorized loop could only > >> handle exactly four iterations of the original scalar loop, … > >> > >> ? > > > > Yeah, guess that's better. > > > >> > >> > Iff the iteration count is a multiple of two and the target can > >> > vectorize the loop with both VF 2 and VF 4 but VF 4 would be better if > >> > we'd use the 'cheap' cost model, does 'very-cheap' not vectorize the > >> > loop or does it choose VF 2? > >> > >> It would choose VF 2, if that's still a win over scalar code. > > > > OK, that's what I expected. The VF iteration is one source of > > compile-time that we might want to avoid somehow ... on > > x86_64 knowing the precise number of constant iterations > > should allow to only pick a subset of vector modes based on > > largest_pow2_factor or so? Or maybe just use the preferred > > SIMD mode for cheap/very-cheap? (maybe pass down > > the cost model kind to the target hook so targets can decide > > for themselves here) > > On the preferred simd mode thing: TBH, I'd prefer to get rid > of that hook one day and just rely on autovectorize_vector_modes. > > The difficulty with adding an early check is that we don't know ahead > of time which types of scalar element a loop operates on: we only find > that out on the fly during the first analysis of the loop. The check > would also depend on SLP grouping: we can use a vector of 4 ints to > handle 2 iterations of the scalar loop if the ints are in an SLP > group of size 2. > > So I agree it would be nice to have early-outs, but I think we'd have > to restructure things first. E.g. maybe we could do some “cheap” initial > analysis that checks for basic vectorisability, records which scalar > elements are used by the loop, and records how big the containing SLP > groups might be (based on optimistic assumptions). Then we can use > that to prefilter the modes we try (perhaps all the way down to no modes). > I guess that's conceptually similar to building an SLP graph though. > > Does the attached look OK? I've included a version of the updated > wording above. I also changed this condition to use “>” rather > than “>=”:
Yes, OK. Thanks, Richard. > > /* If the vector loop needs multiple iterations to be beneficial then > things are probably too close to call, and the conservative thing > would be to stick with the scalar code. */ > if (flag_vect_cost_model == VECT_COST_MODEL_VERY_CHEAP > && min_profitable_estimate > (int) vect_vf_for_cost (loop_vinfo)) > > since when min_profitable_estimate == min_profitable_iters > we'll have done: > > if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) > && min_profitable_iters < (assumed_vf + peel_iters_prologue)) > /* We want the vectorized loop to execute at least once. */ > min_profitable_iters = assumed_vf + peel_iters_prologue; > > I also tried to make vect-cost-model-4.c more resilient on targets > that require alignment. > > Thanks, > Richard > > > gcc/ > * doc/invoke.texi (-fvect-cost-model): Add a very-cheap model. > * common.opt (fvect-cost-model=): Add very-cheap as a possible option. > (fsimd-cost-model=): Likewise. > (vect_cost_model): Add very-cheap. > * flag-types.h (vect_cost_model): Add VECT_COST_MODEL_VERY_CHEAP. > Put the values in order of increasing aggressiveness. > * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Use > range checks when comparing against VECT_COST_MODEL_CHEAP. > (vect_prune_runtime_alias_test_list): Do not allow any alias > checks for the very-cheap cost model. > * tree-vect-loop.c (vect_analyze_loop_costing): Do not allow > any peeling for the very-cheap cost model. Also require one > iteration of the vector loop to pay for itself. > > gcc/testsuite/ > * gcc.dg/vect/vect-cost-model-1.c: New test. > * gcc.dg/vect/vect-cost-model-2.c: Likewise. > * gcc.dg/vect/vect-cost-model-3.c: Likewise. > * gcc.dg/vect/vect-cost-model-4.c: Likewise. > * gcc.dg/vect/vect-cost-model-5.c: Likewise. > * gcc.dg/vect/vect-cost-model-6.c: Likewise. > --- > gcc/common.opt | 7 +++-- > gcc/doc/invoke.texi | 12 +++++++-- > gcc/flag-types.h | 10 ++++--- > gcc/testsuite/gcc.dg/vect/vect-cost-model-1.c | 11 ++++++++ > gcc/testsuite/gcc.dg/vect/vect-cost-model-2.c | 11 ++++++++ > gcc/testsuite/gcc.dg/vect/vect-cost-model-3.c | 11 ++++++++ > gcc/testsuite/gcc.dg/vect/vect-cost-model-4.c | 13 +++++++++ > gcc/testsuite/gcc.dg/vect/vect-cost-model-5.c | 11 ++++++++ > gcc/testsuite/gcc.dg/vect/vect-cost-model-6.c | 12 +++++++++ > gcc/tree-vect-data-refs.c | 8 ++++-- > gcc/tree-vect-loop.c | 27 +++++++++++++++++++ > 11 files changed, 123 insertions(+), 10 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cost-model-1.c > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cost-model-2.c > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cost-model-3.c > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cost-model-4.c > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cost-model-5.c > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cost-model-6.c > > diff --git a/gcc/common.opt b/gcc/common.opt > index fe39b3dee9f..ca8a2690799 100644 > --- a/gcc/common.opt > +++ b/gcc/common.opt > @@ -3020,11 +3020,11 @@ Enable basic block vectorization (SLP) on trees. > > fvect-cost-model= > Common Joined RejectNegative Enum(vect_cost_model) Var(flag_vect_cost_model) > Init(VECT_COST_MODEL_DEFAULT) Optimization > --fvect-cost-model=[unlimited|dynamic|cheap] Specifies the cost model for > vectorization. > +-fvect-cost-model=[unlimited|dynamic|cheap|very-cheap] Specifies the cost > model for vectorization. > > fsimd-cost-model= > Common Joined RejectNegative Enum(vect_cost_model) Var(flag_simd_cost_model) > Init(VECT_COST_MODEL_UNLIMITED) Optimization > --fsimd-cost-model=[unlimited|dynamic|cheap] Specifies the vectorization > cost model for code marked with a simd directive. > +-fsimd-cost-model=[unlimited|dynamic|cheap|very-cheap] Specifies the > vectorization cost model for code marked with a simd directive. > > Enum > Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown > vectorizer cost model %qs) > @@ -3038,6 +3038,9 @@ Enum(vect_cost_model) String(dynamic) > Value(VECT_COST_MODEL_DYNAMIC) > EnumValue > Enum(vect_cost_model) String(cheap) Value(VECT_COST_MODEL_CHEAP) > > +EnumValue > +Enum(vect_cost_model) String(very-cheap) Value(VECT_COST_MODEL_VERY_CHEAP) > + > fvect-cost-model > Common Alias(fvect-cost-model=,dynamic,unlimited) > Enables the dynamic vectorizer cost model. Preserved for backward > compatibility. > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 3510a54c6c4..07232c6b33d 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -11440,7 +11440,8 @@ and @option{-fauto-profile}. > @item -fvect-cost-model=@var{model} > @opindex fvect-cost-model > Alter the cost model used for vectorization. The @var{model} argument > -should be one of @samp{unlimited}, @samp{dynamic} or @samp{cheap}. > +should be one of @samp{unlimited}, @samp{dynamic}, @samp{cheap} or > +@samp{very-cheap}. > With the @samp{unlimited} model the vectorized code-path is assumed > to be profitable while with the @samp{dynamic} model a runtime check > guards the vectorized code-path to enable it only for iteration > @@ -11448,7 +11449,14 @@ counts that will likely execute faster than when > executing the original > scalar loop. The @samp{cheap} model disables vectorization of > loops where doing so would be cost prohibitive for example due to > required runtime checks for data dependence or alignment but otherwise > -is equal to the @samp{dynamic} model. > +is equal to the @samp{dynamic} model. The @samp{very-cheap} model only > +allows vectorization if the vector code would entirely replace the > +scalar code that is being vectorized. For example, if each iteration > +of a vectorized loop would only be able to handle exactly four iterations > +of the scalar loop, the @samp{very-cheap} model would only allow > +vectorization if the scalar iteration count is known to be a multiple > +of four. > + > The default cost model depends on other optimization flags and is > either @samp{dynamic} or @samp{cheap}. > > diff --git a/gcc/flag-types.h b/gcc/flag-types.h > index 648ed096e30..0dbab19943c 100644 > --- a/gcc/flag-types.h > +++ b/gcc/flag-types.h > @@ -232,12 +232,14 @@ enum scalar_storage_order_kind { > SSO_LITTLE_ENDIAN > }; > > -/* Vectorizer cost-model. */ > +/* Vectorizer cost-model. Except for DEFAULT, the values are ordered from > + the most conservative to the least conservative. */ > enum vect_cost_model { > + VECT_COST_MODEL_VERY_CHEAP = -3, > + VECT_COST_MODEL_CHEAP = -2, > + VECT_COST_MODEL_DYNAMIC = -1, > VECT_COST_MODEL_UNLIMITED = 0, > - VECT_COST_MODEL_CHEAP = 1, > - VECT_COST_MODEL_DYNAMIC = 2, > - VECT_COST_MODEL_DEFAULT = 3 > + VECT_COST_MODEL_DEFAULT = 1 > }; > > /* Different instrumentation modes. */ > diff --git a/gcc/testsuite/gcc.dg/vect/vect-cost-model-1.c > b/gcc/testsuite/gcc.dg/vect/vect-cost-model-1.c > new file mode 100644 > index 00000000000..0737da5d671 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/vect-cost-model-1.c > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-O2 -ftree-vectorize -fvect-cost-model=cheap" } > */ > + > +void > +f (int *x, int *y) > +{ > + for (unsigned int i = 0; i < 1024; ++i) > + x[i] += y[i]; > +} > + > +/* { dg-final { scan-tree-dump {LOOP VECTORIZED} vect { target vect_int } } > } */ > diff --git a/gcc/testsuite/gcc.dg/vect/vect-cost-model-2.c > b/gcc/testsuite/gcc.dg/vect/vect-cost-model-2.c > new file mode 100644 > index 00000000000..fa9bdb607b2 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/vect-cost-model-2.c > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-O2 -ftree-vectorize > -fvect-cost-model=very-cheap" } */ > + > +void > +f (int *x, int *y) > +{ > + for (unsigned int i = 0; i < 1024; ++i) > + x[i] += y[i]; > +} > + > +/* { dg-final { scan-tree-dump-not {LOOP VECTORIZED} vect { target vect_int > } } } */ > diff --git a/gcc/testsuite/gcc.dg/vect/vect-cost-model-3.c > b/gcc/testsuite/gcc.dg/vect/vect-cost-model-3.c > new file mode 100644 > index 00000000000..d7c6cfd2049 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/vect-cost-model-3.c > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-O2 -ftree-vectorize -fvect-cost-model=cheap" } > */ > + > +void > +f (int *restrict x, int *restrict y) > +{ > + for (unsigned int i = 0; i < 1024; ++i) > + x[i] += y[i]; > +} > + > +/* { dg-final { scan-tree-dump {LOOP VECTORIZED} vect { target vect_int } } > } */ > diff --git a/gcc/testsuite/gcc.dg/vect/vect-cost-model-4.c > b/gcc/testsuite/gcc.dg/vect/vect-cost-model-4.c > new file mode 100644 > index 00000000000..bb018ad99fe > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/vect-cost-model-4.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-O2 -ftree-vectorize > -fvect-cost-model=very-cheap" } */ > + > +int x[1024], y[1024]; > + > +void > +f (void) > +{ > + for (unsigned int i = 0; i < 1024; ++i) > + x[i] += y[i]; > +} > + > +/* { dg-final { scan-tree-dump {LOOP VECTORIZED} vect { target vect_int } } > } */ > diff --git a/gcc/testsuite/gcc.dg/vect/vect-cost-model-5.c > b/gcc/testsuite/gcc.dg/vect/vect-cost-model-5.c > new file mode 100644 > index 00000000000..536ec0a3cda > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/vect-cost-model-5.c > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-O2 -ftree-vectorize -fvect-cost-model=cheap" } > */ > + > +void > +f (int *restrict x, int *restrict y) > +{ > + for (unsigned int i = 0; i < 1023; ++i) > + x[i] += y[i]; > +} > + > +/* { dg-final { scan-tree-dump {LOOP VECTORIZED} vect { target vect_int } } > } */ > diff --git a/gcc/testsuite/gcc.dg/vect/vect-cost-model-6.c > b/gcc/testsuite/gcc.dg/vect/vect-cost-model-6.c > new file mode 100644 > index 00000000000..552febb5fee > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/vect-cost-model-6.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-O2 -ftree-vectorize > -fvect-cost-model=very-cheap" } */ > + > +void > +f (int *restrict x, int *restrict y) > +{ > + for (unsigned int i = 0; i < 1023; ++i) > + x[i] += y[i]; > +} > + > +/* { dg-final { scan-tree-dump {LOOP VECTORIZED} vect { target { vect_int && > vect_partial_vectors_usage_2 } } } } */ > +/* { dg-final { scan-tree-dump-not {LOOP VECTORIZED} vect { target { > vect_int && { ! vect_partial_vectors_usage_2 } } } } } */ > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c > index 0efab495407..18e36c89d14 100644 > --- a/gcc/tree-vect-data-refs.c > +++ b/gcc/tree-vect-data-refs.c > @@ -2161,7 +2161,7 @@ vect_enhance_data_refs_alignment (loop_vec_info > loop_vinfo) > { > unsigned max_allowed_peel > = param_vect_max_peeling_for_alignment; > - if (flag_vect_cost_model == VECT_COST_MODEL_CHEAP) > + if (flag_vect_cost_model <= VECT_COST_MODEL_CHEAP) > max_allowed_peel = 0; > if (max_allowed_peel != (unsigned)-1) > { > @@ -2259,7 +2259,7 @@ vect_enhance_data_refs_alignment (loop_vec_info > loop_vinfo) > do_versioning > = (optimize_loop_nest_for_speed_p (loop) > && !loop->inner /* FORNOW */ > - && flag_vect_cost_model != VECT_COST_MODEL_CHEAP); > + && flag_vect_cost_model > VECT_COST_MODEL_CHEAP); > > if (do_versioning) > { > @@ -3682,6 +3682,10 @@ vect_prune_runtime_alias_test_list (loop_vec_info > loop_vinfo) > unsigned int count = (comp_alias_ddrs.length () > + check_unequal_addrs.length ()); > > + if (count && flag_vect_cost_model == VECT_COST_MODEL_VERY_CHEAP) > + return opt_result::failure_at > + (vect_location, "would need a runtime alias check\n"); > + > if (dump_enabled_p ()) > dump_printf_loc (MSG_NOTE, vect_location, > "improved number of alias checks from %d to %d\n", > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index 856bbfebf7c..48dfb4df00e 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -1827,6 +1827,19 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo) > } > } > > + /* If using the "very cheap" model. reject cases in which we'd keep > + a copy of the scalar code (even if we might be able to vectorize it). > */ > + if (flag_vect_cost_model == VECT_COST_MODEL_VERY_CHEAP > + && (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) > + || LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) > + || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > + "some scalar iterations would need to be peeled\n"); > + return 0; > + } > + > int min_profitable_iters, min_profitable_estimate; > vect_estimate_min_profitable_iters (loop_vinfo, &min_profitable_iters, > &min_profitable_estimate); > @@ -1885,6 +1898,20 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo) > min_profitable_estimate = min_profitable_iters; > } > > + /* If the vector loop needs multiple iterations to be beneficial then > + things are probably too close to call, and the conservative thing > + would be to stick with the scalar code. */ > + if (flag_vect_cost_model == VECT_COST_MODEL_VERY_CHEAP > + && min_profitable_estimate > (int) vect_vf_for_cost (loop_vinfo)) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > + "one iteration of the vector loop would be" > + " more expensive than the equivalent number of" > + " iterations of the scalar loop\n"); > + return 0; > + } > + > HOST_WIDE_INT estimated_niter; > > /* If we are vectorizing an epilogue then we know the maximum number of