On Tue, 24 Aug 2021, Kewen.Lin wrote:

> Hi Richi,
> 
> on 2021/8/23 ??10:33, Richard Biener via Gcc-patches wrote:
> > This removes --param vect-inner-loop-cost-factor in favor of looking
> > at the estimated number of iterations of the inner loop
> > when available and otherwise just assumes a single inner
> > iteration which is conservative on the side of not vectorizing.
> > 
> 
> I may miss something, the factor seems to be an amplifier, a single
> inner iteration on the side of not vectorizing only relies on that
> vector_cost < scalar_cost, if scalar_cost < vector_cost, the direction
> will be flipped? ({vector,scalar}_cost is only for inner loop part).
> 
> Since we don't calculate/compare costing for inner loop independently
> and early return if scalar_cost < vector_cost for inner loop, I guess
> it's possible to have "scalar_cost < vector_cost" case theoretically,
> especially when targets can cost something more on vector side.

True.

> > The alternative is to retain the --param for exactly that case,
> > not sure if the result is better or not.  The --param is new on
> > head, it was static '50' before.
> > 
> 
> I think the intention of --param is to offer ports a way to tweak
> it (no ports do it for now though :)).  Not sure how target costing
> is sensitive to this factor, but I also prefer to make its default
> value as 50 as Honza suggested to avoid more possible tweakings.
> 
> If targets want more, maybe we can extend it to:
> 
> default_hook:
>   return estimated or likely_max if either is valid;
>   return default value;
>   
> target hook:
>   val = default_hook; // or from scratch
>   tweak the val as it wishes;  
> 
> I guess there is no this need for now.
>
> > Any strong opinions?
> > 
> > Richard.
> > 
> > 2021-08-23  Richard Biener  <rguent...@suse.de>
> > 
> >     * doc/invoke.texi (vect-inner-loop-cost-factor): Remove
> >     documentation.
> >     * params.opt (--param vect-inner-loop-cost-factor): Remove.
> >     * tree-vect-loop.c (_loop_vec_info::_loop_vec_info):
> >     Initialize inner_loop_cost_factor to 1.
> >     (vect_analyze_loop_form): Initialize inner_loop_cost_factor
> >     from the estimated number of iterations of the inner loop.
> > ---
> >  gcc/doc/invoke.texi  |  5 -----
> >  gcc/params.opt       |  4 ----
> >  gcc/tree-vect-loop.c | 12 +++++++++++-
> >  3 files changed, 11 insertions(+), 10 deletions(-)
> > 
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index c057cc1e4ae..054950132f6 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -14385,11 +14385,6 @@ code to iterate.  2 allows partial vector loads 
> > and stores in all loops.
> >  The parameter only has an effect on targets that support partial
> >  vector loads and stores.
> >  
> > -@item vect-inner-loop-cost-factor
> > -The factor which the loop vectorizer applies to the cost of statements
> > -in an inner loop relative to the loop being vectorized.  The default
> > -value is 50.
> > -
> >  @item avoid-fma-max-bits
> >  Maximum number of bits for which we avoid creating FMAs.
> >  
> > diff --git a/gcc/params.opt b/gcc/params.opt
> > index f9264887b40..f7b19fa430d 100644
> > --- a/gcc/params.opt
> > +++ b/gcc/params.opt
> > @@ -1113,8 +1113,4 @@ Bound on number of runtime checks inserted by the 
> > vectorizer's loop versioning f
> >  Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2) 
> > IntegerRange(0, 2) Param Optimization
> >  Controls how loop vectorizer uses partial vectors.  0 means never, 1 means 
> > only for loops whose need to iterate can be removed, 2 means for all loops. 
> >  The default value is 2.
> >  
> > --param=vect-inner-loop-cost-factor=
> > -Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) 
> > IntegerRange(1, 999999) Param Optimization
> > -The factor which the loop vectorizer applies to the cost of statements in 
> > an inner loop relative to the loop being vectorized.
> > -
> >  ; This comment is to ensure we retain the blank line above.
> > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> > index c521b43a47c..cb48717f20e 100644
> > --- a/gcc/tree-vect-loop.c
> > +++ b/gcc/tree-vect-loop.c
> > @@ -841,7 +841,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
> > vec_info_shared *shared)
> >      single_scalar_iteration_cost (0),
> >      vec_outside_cost (0),
> >      vec_inside_cost (0),
> > -    inner_loop_cost_factor (param_vect_inner_loop_cost_factor),
> > +    inner_loop_cost_factor (1),
> >      vectorizable (false),
> >      can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
> >      using_partial_vectors_p (false),
> > @@ -1519,6 +1519,16 @@ vect_analyze_loop_form (class loop *loop, 
> > vec_info_shared *shared)
> >        stmt_vec_info inner_loop_cond_info
> >     = loop_vinfo->lookup_stmt (inner_loop_cond);
> >        STMT_VINFO_TYPE (inner_loop_cond_info) = 
> > loop_exit_ctrl_vec_info_type;
> > +      /* If we have an estimate on the number of iterations of the inner
> > +    loop use that as the scale for costing, otherwise conservatively
> > +    assume a single inner iteration.  */
> > +      widest_int nit;
> > +      if (get_estimated_loop_iterations (loop->inner, &nit))
> > +   LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo)
> > +     /* Since costing is done on unsigned int cap the scale on
> > +        some large number consistent with what we'd see in
> > +        CFG counts.  */
> > +     = wi::smax (nit, REG_BR_PROB_BASE).to_uhwi ();
> 
> I noticed loop-doloop.c use _int version and likely_max, maybe you want that 
> here?
>  
>   est_niter = get_estimated_loop_iterations_int (loop);
>   if (est_niter == -1)
>     est_niter = get_likely_max_loop_iterations_int (loop)

I think that are two different things - get_estimated_loop_iterations_int
are the average number of iterations while 
get_likely_max_loop_iterations_int is an upper bound.  I'm not sure we
want to use an upper bound for costing.

Based on feedback from Honza I'm currently testing the variant below
which keeps the --param and uses it to cap the estimated number of
iterations.  That makes the scaling more precise for inner loops that
don't iterate much but keeps the --param to avoid overflow and to
keep the present behavior when there's no reliable profile info
available.

Richard.

>From d140a56b6ef95555bbfe6d228a25b1d021c7d796 Mon Sep 17 00:00:00 2001
From: Richard Biener <rguent...@suse.de>
Date: Mon, 23 Aug 2021 14:15:14 +0200
Subject: [PATCH] Adjust inner loop cost scaling
To: gcc-patches@gcc.gnu.org

This makes use of the estimated number of iterations of the inner loop
to limit --param vect-inner-loop-cost-factor scaling.  It also reduces
the maximum value of vect-inner-loop-cost-factor to 10000 making it
less likely to cause overflow of costs.

2021-08-23  Richard Biener  <rguent...@suse.de>

        * doc/invoke.texi (vect-inner-loop-cost-factor): Adjust.
        * params.opt (--param vect-inner-loop-cost-factor): Adjust
        maximum value.
        * tree-vect-loop.c (vect_analyze_loop_form): Initialize
        inner_loop_cost_factor to the minimum of the estimated number
        of iterations of the inner loop and vect-inner-loop-cost-factor.
---
 gcc/doc/invoke.texi  | 7 ++++---
 gcc/params.opt       | 4 ++--
 gcc/tree-vect-loop.c | 7 +++++++
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c057cc1e4ae..a9d56fecf4e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14386,9 +14386,10 @@ The parameter only has an effect on targets that 
support partial
 vector loads and stores.
 
 @item vect-inner-loop-cost-factor
-The factor which the loop vectorizer applies to the cost of statements
-in an inner loop relative to the loop being vectorized.  The default
-value is 50.
+The maximum factor which the loop vectorizer applies to the cost of statements
+in an inner loop relative to the loop being vectorized.  The factor applied
+is the maximum of the estimated number of iterations of the inner loop and
+this parameter.  The default value of this parameter is 50.
 
 @item avoid-fma-max-bits
 Maximum number of bits for which we avoid creating FMAs.
diff --git a/gcc/params.opt b/gcc/params.opt
index f9264887b40..f414dc1a61c 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1114,7 +1114,7 @@ Common Joined UInteger 
Var(param_vect_partial_vector_usage) Init(2) IntegerRange
 Controls how loop vectorizer uses partial vectors.  0 means never, 1 means 
only for loops whose need to iterate can be removed, 2 means for all loops.  
The default value is 2.
 
 -param=vect-inner-loop-cost-factor=
-Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) 
IntegerRange(1, 999999) Param Optimization
-The factor which the loop vectorizer applies to the cost of statements in an 
inner loop relative to the loop being vectorized.
+Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) 
IntegerRange(1, 10000) Param Optimization
+The maximum factor which the loop vectorizer applies to the cost of statements 
in an inner loop relative to the loop being vectorized.
 
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index c521b43a47c..cbdd5b407da 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1519,6 +1519,13 @@ vect_analyze_loop_form (class loop *loop, 
vec_info_shared *shared)
       stmt_vec_info inner_loop_cond_info
        = loop_vinfo->lookup_stmt (inner_loop_cond);
       STMT_VINFO_TYPE (inner_loop_cond_info) = loop_exit_ctrl_vec_info_type;
+      /* If we have an estimate on the number of iterations of the inner
+        loop use that to limit the scale for costing, otherwise use
+        --param vect-inner-loop-cost-factor literally.  */
+      widest_int nit;
+      if (get_estimated_loop_iterations (loop->inner, &nit))
+       LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo)
+         = wi::smin (nit, param_vect_inner_loop_cost_factor).to_uhwi ();
     }
 
   gcc_assert (!loop->aux);
-- 
2.31.1

Reply via email to