On Tue, 24 Aug 2021, Richard Biener wrote:

> On Tue, 24 Aug 2021, Kewen.Lin wrote:
> 
> > Hi Richi,
> > 
> > on 2021/8/23 ??10:33, Richard Biener via Gcc-patches wrote:
> > > This removes --param vect-inner-loop-cost-factor in favor of looking
> > > at the estimated number of iterations of the inner loop
> > > when available and otherwise just assumes a single inner
> > > iteration which is conservative on the side of not vectorizing.
> > > 
> > 
> > I may miss something, the factor seems to be an amplifier, a single
> > inner iteration on the side of not vectorizing only relies on that
> > vector_cost < scalar_cost, if scalar_cost < vector_cost, the direction
> > will be flipped? ({vector,scalar}_cost is only for inner loop part).
> > 
> > Since we don't calculate/compare costing for inner loop independently
> > and early return if scalar_cost < vector_cost for inner loop, I guess
> > it's possible to have "scalar_cost < vector_cost" case theoretically,
> > especially when targets can cost something more on vector side.
> 
> True.
> 
> > > The alternative is to retain the --param for exactly that case,
> > > not sure if the result is better or not.  The --param is new on
> > > head, it was static '50' before.
> > > 
> > 
> > I think the intention of --param is to offer ports a way to tweak
> > it (no ports do it for now though :)).  Not sure how target costing
> > is sensitive to this factor, but I also prefer to make its default
> > value as 50 as Honza suggested to avoid more possible tweakings.
> > 
> > If targets want more, maybe we can extend it to:
> > 
> > default_hook:
> >   return estimated or likely_max if either is valid;
> >   return default value;
> >   
> > target hook:
> >   val = default_hook; // or from scratch
> >   tweak the val as it wishes;  
> > 
> > I guess there is no this need for now.
> >
> > > Any strong opinions?
> > > 
> > > Richard.
> > > 
> > > 2021-08-23  Richard Biener  <rguent...@suse.de>
> > > 
> > >   * doc/invoke.texi (vect-inner-loop-cost-factor): Remove
> > >   documentation.
> > >   * params.opt (--param vect-inner-loop-cost-factor): Remove.
> > >   * tree-vect-loop.c (_loop_vec_info::_loop_vec_info):
> > >   Initialize inner_loop_cost_factor to 1.
> > >   (vect_analyze_loop_form): Initialize inner_loop_cost_factor
> > >   from the estimated number of iterations of the inner loop.
> > > ---
> > >  gcc/doc/invoke.texi  |  5 -----
> > >  gcc/params.opt       |  4 ----
> > >  gcc/tree-vect-loop.c | 12 +++++++++++-
> > >  3 files changed, 11 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > index c057cc1e4ae..054950132f6 100644
> > > --- a/gcc/doc/invoke.texi
> > > +++ b/gcc/doc/invoke.texi
> > > @@ -14385,11 +14385,6 @@ code to iterate.  2 allows partial vector loads 
> > > and stores in all loops.
> > >  The parameter only has an effect on targets that support partial
> > >  vector loads and stores.
> > >  
> > > -@item vect-inner-loop-cost-factor
> > > -The factor which the loop vectorizer applies to the cost of statements
> > > -in an inner loop relative to the loop being vectorized.  The default
> > > -value is 50.
> > > -
> > >  @item avoid-fma-max-bits
> > >  Maximum number of bits for which we avoid creating FMAs.
> > >  
> > > diff --git a/gcc/params.opt b/gcc/params.opt
> > > index f9264887b40..f7b19fa430d 100644
> > > --- a/gcc/params.opt
> > > +++ b/gcc/params.opt
> > > @@ -1113,8 +1113,4 @@ Bound on number of runtime checks inserted by the 
> > > vectorizer's loop versioning f
> > >  Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2) 
> > > IntegerRange(0, 2) Param Optimization
> > >  Controls how loop vectorizer uses partial vectors.  0 means never, 1 
> > > means only for loops whose need to iterate can be removed, 2 means for 
> > > all loops.  The default value is 2.
> > >  
> > > --param=vect-inner-loop-cost-factor=
> > > -Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) 
> > > IntegerRange(1, 999999) Param Optimization
> > > -The factor which the loop vectorizer applies to the cost of statements 
> > > in an inner loop relative to the loop being vectorized.
> > > -
> > >  ; This comment is to ensure we retain the blank line above.
> > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> > > index c521b43a47c..cb48717f20e 100644
> > > --- a/gcc/tree-vect-loop.c
> > > +++ b/gcc/tree-vect-loop.c
> > > @@ -841,7 +841,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
> > > vec_info_shared *shared)
> > >      single_scalar_iteration_cost (0),
> > >      vec_outside_cost (0),
> > >      vec_inside_cost (0),
> > > -    inner_loop_cost_factor (param_vect_inner_loop_cost_factor),
> > > +    inner_loop_cost_factor (1),
> > >      vectorizable (false),
> > >      can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
> > >      using_partial_vectors_p (false),
> > > @@ -1519,6 +1519,16 @@ vect_analyze_loop_form (class loop *loop, 
> > > vec_info_shared *shared)
> > >        stmt_vec_info inner_loop_cond_info
> > >   = loop_vinfo->lookup_stmt (inner_loop_cond);
> > >        STMT_VINFO_TYPE (inner_loop_cond_info) = 
> > > loop_exit_ctrl_vec_info_type;
> > > +      /* If we have an estimate on the number of iterations of the inner
> > > +  loop use that as the scale for costing, otherwise conservatively
> > > +  assume a single inner iteration.  */
> > > +      widest_int nit;
> > > +      if (get_estimated_loop_iterations (loop->inner, &nit))
> > > + LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo)
> > > +   /* Since costing is done on unsigned int cap the scale on
> > > +      some large number consistent with what we'd see in
> > > +      CFG counts.  */
> > > +   = wi::smax (nit, REG_BR_PROB_BASE).to_uhwi ();
> > 
> > I noticed loop-doloop.c use _int version and likely_max, maybe you want 
> > that here?
> >  
> >   est_niter = get_estimated_loop_iterations_int (loop);
> >   if (est_niter == -1)
> >     est_niter = get_likely_max_loop_iterations_int (loop)
> 
> I think that are two different things - get_estimated_loop_iterations_int
> are the average number of iterations while 
> get_likely_max_loop_iterations_int is an upper bound.  I'm not sure we
> want to use an upper bound for costing.
> 
> Based on feedback from Honza I'm currently testing the variant below
> which keeps the --param and uses it to cap the estimated number of
> iterations.  That makes the scaling more precise for inner loops that
> don't iterate much but keeps the --param to avoid overflow and to
> keep the present behavior when there's no reliable profile info
> available.

And Kewen noticed the off-by-one by using get_estimated_loop_iterations
so I pushed the correct variant using estimated_stmt_executions.

Bootstrapped / tested on x86_64-unknown-linux-gnu.

Richard.

Reply via email to