On Tue, 24 Aug 2021, Kewen.Lin wrote: > Hi Richi, > > on 2021/8/23 ??10:33, Richard Biener via Gcc-patches wrote: > > This removes --param vect-inner-loop-cost-factor in favor of looking > > at the estimated number of iterations of the inner loop > > when available and otherwise just assumes a single inner > > iteration which is conservative on the side of not vectorizing. > > > > I may miss something, the factor seems to be an amplifier, a single > inner iteration on the side of not vectorizing only relies on that > vector_cost < scalar_cost, if scalar_cost < vector_cost, the direction > will be flipped? ({vector,scalar}_cost is only for inner loop part). > > Since we don't calculate/compare costing for inner loop independently > and early return if scalar_cost < vector_cost for inner loop, I guess > it's possible to have "scalar_cost < vector_cost" case theoretically, > especially when targets can cost something more on vector side.
True. > > The alternative is to retain the --param for exactly that case, > > not sure if the result is better or not. The --param is new on > > head, it was static '50' before. > > > > I think the intention of --param is to offer ports a way to tweak > it (no ports do it for now though :)). Not sure how target costing > is sensitive to this factor, but I also prefer to make its default > value as 50 as Honza suggested to avoid more possible tweakings. > > If targets want more, maybe we can extend it to: > > default_hook: > return estimated or likely_max if either is valid; > return default value; > > target hook: > val = default_hook; // or from scratch > tweak the val as it wishes; > > I guess there is no this need for now. > > > Any strong opinions? > > > > Richard. > > > > 2021-08-23 Richard Biener <rguent...@suse.de> > > > > * doc/invoke.texi (vect-inner-loop-cost-factor): Remove > > documentation. > > * params.opt (--param vect-inner-loop-cost-factor): Remove. > > * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): > > Initialize inner_loop_cost_factor to 1. > > (vect_analyze_loop_form): Initialize inner_loop_cost_factor > > from the estimated number of iterations of the inner loop. > > --- > > gcc/doc/invoke.texi | 5 ----- > > gcc/params.opt | 4 ---- > > gcc/tree-vect-loop.c | 12 +++++++++++- > > 3 files changed, 11 insertions(+), 10 deletions(-) > > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > > index c057cc1e4ae..054950132f6 100644 > > --- a/gcc/doc/invoke.texi > > +++ b/gcc/doc/invoke.texi > > @@ -14385,11 +14385,6 @@ code to iterate. 2 allows partial vector loads > > and stores in all loops. > > The parameter only has an effect on targets that support partial > > vector loads and stores. > > > > -@item vect-inner-loop-cost-factor > > -The factor which the loop vectorizer applies to the cost of statements > > -in an inner loop relative to the loop being vectorized. The default > > -value is 50. > > - > > @item avoid-fma-max-bits > > Maximum number of bits for which we avoid creating FMAs. > > > > diff --git a/gcc/params.opt b/gcc/params.opt > > index f9264887b40..f7b19fa430d 100644 > > --- a/gcc/params.opt > > +++ b/gcc/params.opt > > @@ -1113,8 +1113,4 @@ Bound on number of runtime checks inserted by the > > vectorizer's loop versioning f > > Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2) > > IntegerRange(0, 2) Param Optimization > > Controls how loop vectorizer uses partial vectors. 0 means never, 1 means > > only for loops whose need to iterate can be removed, 2 means for all loops. > > The default value is 2. > > > > --param=vect-inner-loop-cost-factor= > > -Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) > > IntegerRange(1, 999999) Param Optimization > > -The factor which the loop vectorizer applies to the cost of statements in > > an inner loop relative to the loop being vectorized. > > - > > ; This comment is to ensure we retain the blank line above. > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > > index c521b43a47c..cb48717f20e 100644 > > --- a/gcc/tree-vect-loop.c > > +++ b/gcc/tree-vect-loop.c > > @@ -841,7 +841,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, > > vec_info_shared *shared) > > single_scalar_iteration_cost (0), > > vec_outside_cost (0), > > vec_inside_cost (0), > > - inner_loop_cost_factor (param_vect_inner_loop_cost_factor), > > + inner_loop_cost_factor (1), > > vectorizable (false), > > can_use_partial_vectors_p (param_vect_partial_vector_usage != 0), > > using_partial_vectors_p (false), > > @@ -1519,6 +1519,16 @@ vect_analyze_loop_form (class loop *loop, > > vec_info_shared *shared) > > stmt_vec_info inner_loop_cond_info > > = loop_vinfo->lookup_stmt (inner_loop_cond); > > STMT_VINFO_TYPE (inner_loop_cond_info) = > > loop_exit_ctrl_vec_info_type; > > + /* If we have an estimate on the number of iterations of the inner > > + loop use that as the scale for costing, otherwise conservatively > > + assume a single inner iteration. */ > > + widest_int nit; > > + if (get_estimated_loop_iterations (loop->inner, &nit)) > > + LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo) > > + /* Since costing is done on unsigned int cap the scale on > > + some large number consistent with what we'd see in > > + CFG counts. */ > > + = wi::smax (nit, REG_BR_PROB_BASE).to_uhwi (); > > I noticed loop-doloop.c use _int version and likely_max, maybe you want that > here? > > est_niter = get_estimated_loop_iterations_int (loop); > if (est_niter == -1) > est_niter = get_likely_max_loop_iterations_int (loop) I think that are two different things - get_estimated_loop_iterations_int are the average number of iterations while get_likely_max_loop_iterations_int is an upper bound. I'm not sure we want to use an upper bound for costing. Based on feedback from Honza I'm currently testing the variant below which keeps the --param and uses it to cap the estimated number of iterations. That makes the scaling more precise for inner loops that don't iterate much but keeps the --param to avoid overflow and to keep the present behavior when there's no reliable profile info available. Richard. >From d140a56b6ef95555bbfe6d228a25b1d021c7d796 Mon Sep 17 00:00:00 2001 From: Richard Biener <rguent...@suse.de> Date: Mon, 23 Aug 2021 14:15:14 +0200 Subject: [PATCH] Adjust inner loop cost scaling To: gcc-patches@gcc.gnu.org This makes use of the estimated number of iterations of the inner loop to limit --param vect-inner-loop-cost-factor scaling. It also reduces the maximum value of vect-inner-loop-cost-factor to 10000 making it less likely to cause overflow of costs. 2021-08-23 Richard Biener <rguent...@suse.de> * doc/invoke.texi (vect-inner-loop-cost-factor): Adjust. * params.opt (--param vect-inner-loop-cost-factor): Adjust maximum value. * tree-vect-loop.c (vect_analyze_loop_form): Initialize inner_loop_cost_factor to the minimum of the estimated number of iterations of the inner loop and vect-inner-loop-cost-factor. --- gcc/doc/invoke.texi | 7 ++++--- gcc/params.opt | 4 ++-- gcc/tree-vect-loop.c | 7 +++++++ 3 files changed, 13 insertions(+), 5 deletions(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index c057cc1e4ae..a9d56fecf4e 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -14386,9 +14386,10 @@ The parameter only has an effect on targets that support partial vector loads and stores. @item vect-inner-loop-cost-factor -The factor which the loop vectorizer applies to the cost of statements -in an inner loop relative to the loop being vectorized. The default -value is 50. +The maximum factor which the loop vectorizer applies to the cost of statements +in an inner loop relative to the loop being vectorized. The factor applied +is the maximum of the estimated number of iterations of the inner loop and +this parameter. The default value of this parameter is 50. @item avoid-fma-max-bits Maximum number of bits for which we avoid creating FMAs. diff --git a/gcc/params.opt b/gcc/params.opt index f9264887b40..f414dc1a61c 100644 --- a/gcc/params.opt +++ b/gcc/params.opt @@ -1114,7 +1114,7 @@ Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2) IntegerRange Controls how loop vectorizer uses partial vectors. 0 means never, 1 means only for loops whose need to iterate can be removed, 2 means for all loops. The default value is 2. -param=vect-inner-loop-cost-factor= -Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 999999) Param Optimization -The factor which the loop vectorizer applies to the cost of statements in an inner loop relative to the loop being vectorized. +Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 10000) Param Optimization +The maximum factor which the loop vectorizer applies to the cost of statements in an inner loop relative to the loop being vectorized. ; This comment is to ensure we retain the blank line above. diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index c521b43a47c..cbdd5b407da 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -1519,6 +1519,13 @@ vect_analyze_loop_form (class loop *loop, vec_info_shared *shared) stmt_vec_info inner_loop_cond_info = loop_vinfo->lookup_stmt (inner_loop_cond); STMT_VINFO_TYPE (inner_loop_cond_info) = loop_exit_ctrl_vec_info_type; + /* If we have an estimate on the number of iterations of the inner + loop use that to limit the scale for costing, otherwise use + --param vect-inner-loop-cost-factor literally. */ + widest_int nit; + if (get_estimated_loop_iterations (loop->inner, &nit)) + LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo) + = wi::smin (nit, param_vect_inner_loop_cost_factor).to_uhwi (); } gcc_assert (!loop->aux); -- 2.31.1