On Tue, 24 Aug 2021, Richard Biener wrote: > On Tue, 24 Aug 2021, Kewen.Lin wrote: > > > Hi Richi, > > > > on 2021/8/23 ??10:33, Richard Biener via Gcc-patches wrote: > > > This removes --param vect-inner-loop-cost-factor in favor of looking > > > at the estimated number of iterations of the inner loop > > > when available and otherwise just assumes a single inner > > > iteration which is conservative on the side of not vectorizing. > > > > > > > I may miss something, the factor seems to be an amplifier, a single > > inner iteration on the side of not vectorizing only relies on that > > vector_cost < scalar_cost, if scalar_cost < vector_cost, the direction > > will be flipped? ({vector,scalar}_cost is only for inner loop part). > > > > Since we don't calculate/compare costing for inner loop independently > > and early return if scalar_cost < vector_cost for inner loop, I guess > > it's possible to have "scalar_cost < vector_cost" case theoretically, > > especially when targets can cost something more on vector side. > > True. > > > > The alternative is to retain the --param for exactly that case, > > > not sure if the result is better or not. The --param is new on > > > head, it was static '50' before. > > > > > > > I think the intention of --param is to offer ports a way to tweak > > it (no ports do it for now though :)). Not sure how target costing > > is sensitive to this factor, but I also prefer to make its default > > value as 50 as Honza suggested to avoid more possible tweakings. > > > > If targets want more, maybe we can extend it to: > > > > default_hook: > > return estimated or likely_max if either is valid; > > return default value; > > > > target hook: > > val = default_hook; // or from scratch > > tweak the val as it wishes; > > > > I guess there is no this need for now. > > > > > Any strong opinions? > > > > > > Richard. > > > > > > 2021-08-23 Richard Biener <rguent...@suse.de> > > > > > > * doc/invoke.texi (vect-inner-loop-cost-factor): Remove > > > documentation. > > > * params.opt (--param vect-inner-loop-cost-factor): Remove. > > > * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): > > > Initialize inner_loop_cost_factor to 1. > > > (vect_analyze_loop_form): Initialize inner_loop_cost_factor > > > from the estimated number of iterations of the inner loop. > > > --- > > > gcc/doc/invoke.texi | 5 ----- > > > gcc/params.opt | 4 ---- > > > gcc/tree-vect-loop.c | 12 +++++++++++- > > > 3 files changed, 11 insertions(+), 10 deletions(-) > > > > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > > > index c057cc1e4ae..054950132f6 100644 > > > --- a/gcc/doc/invoke.texi > > > +++ b/gcc/doc/invoke.texi > > > @@ -14385,11 +14385,6 @@ code to iterate. 2 allows partial vector loads > > > and stores in all loops. > > > The parameter only has an effect on targets that support partial > > > vector loads and stores. > > > > > > -@item vect-inner-loop-cost-factor > > > -The factor which the loop vectorizer applies to the cost of statements > > > -in an inner loop relative to the loop being vectorized. The default > > > -value is 50. > > > - > > > @item avoid-fma-max-bits > > > Maximum number of bits for which we avoid creating FMAs. > > > > > > diff --git a/gcc/params.opt b/gcc/params.opt > > > index f9264887b40..f7b19fa430d 100644 > > > --- a/gcc/params.opt > > > +++ b/gcc/params.opt > > > @@ -1113,8 +1113,4 @@ Bound on number of runtime checks inserted by the > > > vectorizer's loop versioning f > > > Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2) > > > IntegerRange(0, 2) Param Optimization > > > Controls how loop vectorizer uses partial vectors. 0 means never, 1 > > > means only for loops whose need to iterate can be removed, 2 means for > > > all loops. The default value is 2. > > > > > > --param=vect-inner-loop-cost-factor= > > > -Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) > > > IntegerRange(1, 999999) Param Optimization > > > -The factor which the loop vectorizer applies to the cost of statements > > > in an inner loop relative to the loop being vectorized. > > > - > > > ; This comment is to ensure we retain the blank line above. > > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > > > index c521b43a47c..cb48717f20e 100644 > > > --- a/gcc/tree-vect-loop.c > > > +++ b/gcc/tree-vect-loop.c > > > @@ -841,7 +841,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, > > > vec_info_shared *shared) > > > single_scalar_iteration_cost (0), > > > vec_outside_cost (0), > > > vec_inside_cost (0), > > > - inner_loop_cost_factor (param_vect_inner_loop_cost_factor), > > > + inner_loop_cost_factor (1), > > > vectorizable (false), > > > can_use_partial_vectors_p (param_vect_partial_vector_usage != 0), > > > using_partial_vectors_p (false), > > > @@ -1519,6 +1519,16 @@ vect_analyze_loop_form (class loop *loop, > > > vec_info_shared *shared) > > > stmt_vec_info inner_loop_cond_info > > > = loop_vinfo->lookup_stmt (inner_loop_cond); > > > STMT_VINFO_TYPE (inner_loop_cond_info) = > > > loop_exit_ctrl_vec_info_type; > > > + /* If we have an estimate on the number of iterations of the inner > > > + loop use that as the scale for costing, otherwise conservatively > > > + assume a single inner iteration. */ > > > + widest_int nit; > > > + if (get_estimated_loop_iterations (loop->inner, &nit)) > > > + LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo) > > > + /* Since costing is done on unsigned int cap the scale on > > > + some large number consistent with what we'd see in > > > + CFG counts. */ > > > + = wi::smax (nit, REG_BR_PROB_BASE).to_uhwi (); > > > > I noticed loop-doloop.c use _int version and likely_max, maybe you want > > that here? > > > > est_niter = get_estimated_loop_iterations_int (loop); > > if (est_niter == -1) > > est_niter = get_likely_max_loop_iterations_int (loop) > > I think that are two different things - get_estimated_loop_iterations_int > are the average number of iterations while > get_likely_max_loop_iterations_int is an upper bound. I'm not sure we > want to use an upper bound for costing. > > Based on feedback from Honza I'm currently testing the variant below > which keeps the --param and uses it to cap the estimated number of > iterations. That makes the scaling more precise for inner loops that > don't iterate much but keeps the --param to avoid overflow and to > keep the present behavior when there's no reliable profile info > available.
And Kewen noticed the off-by-one by using get_estimated_loop_iterations so I pushed the correct variant using estimated_stmt_executions. Bootstrapped / tested on x86_64-unknown-linux-gnu. Richard.