Hi Richi,

Thanks for the comments!

on 2021/5/7 下午5:43, Richard Biener wrote:
> On Fri, May 7, 2021 at 5:30 AM Kewen.Lin via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> Hi,
>>
>> When I was investigating density_test heuristics, I noticed that
>> the current rs6000_density_test could be used for single scalar
>> iteration cost calculation, through the call trace:
>>   vect_compute_single_scalar_iteration_cost
>>     -> rs6000_finish_cost
>>          -> rs6000_density_test
>>
>> It looks unexpected as its desriptive function comments and Bill
>> helped to confirm this needs to be fixed (thanks!).
>>
>> So this patch is to check the passed data, if it's the same as
>> the one in loop_vinfo, it indicates it's working on vector version
>> cost calculation, otherwise just early return.
>>
>> Bootstrapped/regtested on powerpc64le-linux-gnu P9.
>>
>> Nothing remarkable was observed with SPEC2017 Power9 full run.
>>
>> Is it ok for trunk?
> 
> +  /* Only care about cost of vector version, so exclude scalar
> version here.  */
> +  if (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo) != (void *) data)
> +    return;
> 
> Hmm, looks like a quite "random" test to me.  What about adding a
> parameter to finish_cost () (or init_cost?) indicating the cost kind?
> 

I originally wanted to change the hook interface, but noticed that
the finish_cost in function vect_estimate_min_profitable_iters is
the only invocation with LOOP_VINFO_TARGET_COST_DATA (loop_vinfo),
it looks enough to differentiate the scalar version costing or
vector version costing for loop.  Do you mean this observation/
assumption easy to be broken sometime later?

The attached patch to add one new parameter to indicate the
costing kind explicitly as you suggested.

Does it look better?

gcc/ChangeLog:

        * doc/tm.texi: Regenerated.
        * target.def (init_cost): Add new parameter costing_for_scalar.
        * targhooks.c (default_init_cost): Adjust for new parameter.
        * targhooks.h (default_init_cost): Likewise.
        * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Likewise.
        (vect_compute_single_scalar_iteration_cost): Likewise.
        (vect_analyze_loop_2): Likewise.
        * tree-vect-slp.c (_bb_vec_info::_bb_vec_info): Likewise.
        (vect_bb_vectorization_profitable_p): Likewise.
        * tree-vectorizer.h (init_cost): Likewise.
        * config/aarch64/aarch64.c (aarch64_init_cost): Likewise.
        * config/i386/i386.c (ix86_init_cost): Likewise.
        * config/rs6000/rs6000.c (rs6000_init_cost): Likewise.  

> OTOH we already pass scalar_stmt to individual add_stmt_cost,
> so not sure whether the context really matters.  That said,
> the density test looks "interesting" ... the intent was that finish_cost
> might look at gathered data from add_stmt, not that it looks at
> the GIMPLE IL ... so why are you not counting vector_stmt vs.
> scalar_stmt entries in vect_body and using that for this metric?
> 

Good to know the intention behind finish_cost, thanks!

I'm afraid that the check on vector_stmt and scalar_stmt entries
from add_stmt_cost doesn't work for the density test here.  The
density test focuses on the vector version itself, there are some
stmts whose relevants are marked as vect_unused_in_scope, IIUC
they won't be passed down when costing for both versions.  But the
existing density check would like to know the cost for the
non-vectorized part.  The current implementation does:
 
 vec_cost = data->cost[vect_body]

          if (!STMT_VINFO_RELEVANT_P (stmt_info)
              && !STMT_VINFO_IN_PATTERN_P (stmt_info))
            not_vec_cost++

 density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost);

it takes those unrelevant stmts into account, and then has
both costs from the non-vectorized part (not_vec_cost)
and vectorized part (cost[vect_body]), it can calculate the
vectorization code density ratio.

BR,
Kewen
 gcc/config/aarch64/aarch64.c | 2 +-
 gcc/config/i386/i386.c       | 2 +-
 gcc/config/rs6000/rs6000.c   | 2 +-
 gcc/doc/tm.texi              | 4 ++--
 gcc/target.def               | 6 ++++--
 gcc/targhooks.c              | 3 ++-
 gcc/targhooks.h              | 2 +-
 gcc/tree-vect-loop.c         | 6 +++---
 gcc/tree-vect-slp.c          | 8 +++++---
 gcc/tree-vectorizer.h        | 4 ++--
 10 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 12625a4bee3..de3c86d85fb 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14390,7 +14390,7 @@ struct aarch64_vector_costs
 
 /* Implement TARGET_VECTORIZE_INIT_COST.  */
 void *
-aarch64_init_cost (class loop *)
+aarch64_init_cost (class loop *, bool)
 {
   return new aarch64_vector_costs;
 }
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7c41302c75b..5078d94970a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -22223,7 +22223,7 @@ ix86_noce_conversion_profitable_p (rtx_insn *seq, 
struct noce_if_info *if_info)
 /* Implement targetm.vectorize.init_cost.  */
 
 static void *
-ix86_init_cost (class loop *)
+ix86_init_cost (class loop *, bool)
 {
   unsigned *cost = XNEWVEC (unsigned, 3);
   cost[vect_prologue] = cost[vect_body] = cost[vect_epilogue] = 0;
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index e2ff00145c5..88f16b909a3 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5292,7 +5292,7 @@ rs6000_density_test (rs6000_cost_data *data)
 /* Implement targetm.vectorize.init_cost.  */
 
 static void *
-rs6000_init_cost (struct loop *loop_info)
+rs6000_init_cost (struct loop *loop_info, bool)
 {
   rs6000_cost_data *data = XNEW (struct _rs6000_cost_data);
   data->loop_info = loop_info;
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 823f85ba9ab..8b3af67782e 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6111,8 +6111,8 @@ type @code{internal_fn}) should be considered expensive 
when the mask is
 all zeros.  GCC can then try to branch around the instruction instead.
 @end deftypefn
 
-@deftypefn {Target Hook} {void *} TARGET_VECTORIZE_INIT_COST (class loop 
*@var{loop_info})
-This hook should initialize target-specific data structures in preparation for 
modeling the costs of vectorizing a loop or basic block.  The default allocates 
three unsigned integers for accumulating costs for the prologue, body, and 
epilogue of the loop or basic block.  If @var{loop_info} is non-NULL, it 
identifies the loop being vectorized; otherwise a single block is being 
vectorized.
+@deftypefn {Target Hook} {void *} TARGET_VECTORIZE_INIT_COST (class loop 
*@var{loop_info}, bool @var{costing_for_scalar})
+This hook should initialize target-specific data structures in preparation for 
modeling the costs of vectorizing a loop or basic block.  The default allocates 
three unsigned integers for accumulating costs for the prologue, body, and 
epilogue of the loop or basic block.  If @var{loop_info} is non-NULL, it 
identifies the loop being vectorized; otherwise a single block is being 
vectorized.  If @var{costing_for_scalar} is true, it indicates the current cost 
model is for the scalar version of a loop or block; otherwise it is for the 
vector version.
 @end deftypefn
 
 @deftypefn {Target Hook} unsigned TARGET_VECTORIZE_ADD_STMT_COST (class 
vec_info *@var{}, void *@var{data}, int @var{count}, enum vect_cost_for_stmt 
@var{kind}, class _stmt_vec_info *@var{stmt_info}, tree @var{vectype}, int 
@var{misalign}, enum vect_cost_model_location @var{where})
diff --git a/gcc/target.def b/gcc/target.def
index d7b94bd8e5d..9407ed17f2d 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2004,9 +2004,11 @@ DEFHOOK
  "allocates three unsigned integers for accumulating costs for the prologue, "
  "body, and epilogue of the loop or basic block.  If @var{loop_info} is "
  "non-NULL, it identifies the loop being vectorized; otherwise a single block "
- "is being vectorized.",
+ "is being vectorized.  If @var{costing_for_scalar} is true, it indicates the "
+ "current cost model is for the scalar version of a loop or block; otherwise "
+ "it is for the vector version.",
  void *,
- (class loop *loop_info),
+ (class loop *loop_info, bool costing_for_scalar),
  default_init_cost)
 
 /* Target function to record N statements of the given kind using the
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 952fad422eb..2e0fdb797e0 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1373,7 +1373,8 @@ default_empty_mask_is_expensive (unsigned ifn)
    array of three unsigned ints, set it to zero, and return its address.  */
 
 void *
-default_init_cost (class loop *loop_info ATTRIBUTE_UNUSED)
+default_init_cost (class loop *loop_info ATTRIBUTE_UNUSED,
+                  bool costing_for_scalar ATTRIBUTE_UNUSED)
 {
   unsigned *cost = XNEWVEC (unsigned, 3);
   cost[vect_prologue] = cost[vect_body] = cost[vect_epilogue] = 0;
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 9928d064abd..b537038c0aa 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -117,7 +117,7 @@ extern opt_machine_mode default_vectorize_related_mode 
(machine_mode,
                                                        poly_uint64);
 extern opt_machine_mode default_get_mask_mode (machine_mode);
 extern bool default_empty_mask_is_expensive (unsigned);
-extern void *default_init_cost (class loop *);
+extern void *default_init_cost (class loop *, bool);
 extern unsigned default_add_stmt_cost (class vec_info *, void *, int,
                                       enum vect_cost_for_stmt,
                                       class _stmt_vec_info *, tree, int,
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 2aba503fef7..f10e66a2465 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -813,7 +813,7 @@ bb_in_loop_p (const_basic_block bb, const void *data)
    stmt_vec_info structs for all the stmts in LOOP_IN.  */
 
 _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
-  : vec_info (vec_info::loop, init_cost (loop_in), shared),
+  : vec_info (vec_info::loop, init_cost (loop_in, false), shared),
     loop (loop_in),
     bbs (XCNEWVEC (basic_block, loop->num_nodes)),
     num_itersm1 (NULL_TREE),
@@ -1284,7 +1284,7 @@ vect_compute_single_scalar_iteration_cost (loop_vec_info 
loop_vinfo)
     }
 
   /* Now accumulate cost.  */
-  void *target_cost_data = init_cost (loop);
+  void *target_cost_data = init_cost (loop, true);
   stmt_info_for_cost *si;
   int j;
   FOR_EACH_VEC_ELT (LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
@@ -2723,7 +2723,7 @@ again:
   /* Reset target cost data.  */
   destroy_cost_data (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo));
   LOOP_VINFO_TARGET_COST_DATA (loop_vinfo)
-    = init_cost (LOOP_VINFO_LOOP (loop_vinfo));
+    = init_cost (LOOP_VINFO_LOOP (loop_vinfo), false);
   /* Reset accumulated rgroup information.  */
   release_vec_loop_controls (&LOOP_VINFO_MASKS (loop_vinfo));
   release_vec_loop_controls (&LOOP_VINFO_LENS (loop_vinfo));
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 1c5b7ae84e2..0ec92b0f0ca 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3690,7 +3690,9 @@ vect_detect_hybrid_slp (loop_vec_info loop_vinfo)
 /* Initialize a bb_vec_info struct for the statements in BBS basic blocks.  */
 
 _bb_vec_info::_bb_vec_info (vec<basic_block> _bbs, vec_info_shared *shared)
-  : vec_info (vec_info::bb, init_cost (NULL), shared), bbs (_bbs), roots 
(vNULL)
+  : vec_info (vec_info::bb, init_cost (NULL, false), shared),
+    bbs (_bbs),
+    roots (vNULL)
 {
   for (unsigned i = 0; i < bbs.length (); ++i)
     {
@@ -4530,7 +4532,7 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo,
          continue;
        }
 
-      void *scalar_target_cost_data = init_cost (NULL);
+      void *scalar_target_cost_data = init_cost (NULL, true);
       do
        {
          add_stmt_cost (bb_vinfo, scalar_target_cost_data,
@@ -4544,7 +4546,7 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo,
       destroy_cost_data (scalar_target_cost_data);
 
       /* Complete the target-specific vector cost calculation.  */
-      void *vect_target_cost_data = init_cost (NULL);
+      void *vect_target_cost_data = init_cost (NULL, false);
       do
        {
          add_stmt_cost (bb_vinfo, vect_target_cost_data,
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 9861d9e8810..8d1ffafdbf0 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1455,9 +1455,9 @@ int vect_get_stmt_cost (enum vect_cost_for_stmt 
type_of_cost)
 /* Alias targetm.vectorize.init_cost.  */
 
 static inline void *
-init_cost (class loop *loop_info)
+init_cost (class loop *loop_info, bool costing_for_scalar)
 {
-  return targetm.vectorize.init_cost (loop_info);
+  return targetm.vectorize.init_cost (loop_info, costing_for_scalar);
 }
 
 extern void dump_stmt_cost (FILE *, void *, int, enum vect_cost_for_stmt,
-- 
2.17.1

Reply via email to