Hi Kewen,

On 9/15/21 3:52 AM, Kewen.Lin wrote:
Hi,

This patch follows the discussion here[1], where Segher suggested
parameterizing those exact magic constants for density heuristics,
to make it easier to tweak if need.

Since these heuristics are quite internal, I make these parameters
as undocumented and be mainly used by developers.

The change here should be "No Functional Change".  But I verified
it with SPEC2017 at option sets O2-vect and Ofast-unroll on Power8,
the result is neutral as expected.

Bootstrapped and regress-tested on powerpc64le-linux-gnu Power9.

Is it ok for trunk?

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579121.html

BR,
Kewen
-----
gcc/ChangeLog:

        * config/rs6000/rs6000.opt (rs6000-density-pct-threshold,
        rs6000-density-size-threshold, rs6000-density-penalty,
        rs6000-density-load-pct-threshold,
        rs6000-density-load-num-threshold): New parameter.
        * config/rs6000/rs6000.c (rs6000_density_test): Adjust with
        corresponding parameters.

---
  gcc/config/rs6000/rs6000.c   | 22 +++++++---------------
  gcc/config/rs6000/rs6000.opt | 21 +++++++++++++++++++++
  2 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 9bc826e3a50..4ab23b0ab33 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5284,9 +5284,6 @@ struct rs6000_cost_data
  static void
  rs6000_density_test (rs6000_cost_data *data)
  {
-  const int DENSITY_PCT_THRESHOLD = 85;
-  const int DENSITY_SIZE_THRESHOLD = 70;
-  const int DENSITY_PENALTY = 10;
    struct loop *loop = data->loop_info;
    basic_block *bbs = get_loop_body (loop);
    int nbbs = loop->num_nodes;
@@ -5322,26 +5319,21 @@ rs6000_density_test (rs6000_cost_data *data)
    free (bbs);
    density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost);

-  if (density_pct > DENSITY_PCT_THRESHOLD
-      && vec_cost + not_vec_cost > DENSITY_SIZE_THRESHOLD)
+  if (density_pct > rs6000_density_pct_threshold
+      && vec_cost + not_vec_cost > rs6000_density_size_threshold)
      {
-      data->cost[vect_body] = vec_cost * (100 + DENSITY_PENALTY) / 100;
+      data->cost[vect_body] = vec_cost * (100 + rs6000_density_penalty) / 100;
        if (dump_enabled_p ())
        dump_printf_loc (MSG_NOTE, vect_location,
                         "density %d%%, cost %d exceeds threshold, penalizing "
-                        "loop body cost by %d%%\n", density_pct,
-                        vec_cost + not_vec_cost, DENSITY_PENALTY);
+                        "loop body cost by %u%%\n", density_pct,
+                        vec_cost + not_vec_cost, rs6000_density_penalty);
      }

    /* Check whether we need to penalize the body cost to account
       for excess strided or elementwise loads.  */
    if (data->extra_ctor_cost > 0)
      {
-      /* Threshold for load stmts percentage in all vectorized stmts.  */
-      const int DENSITY_LOAD_PCT_THRESHOLD = 45;
-      /* Threshold for total number of load stmts.  */
-      const int DENSITY_LOAD_NUM_THRESHOLD = 20;
-
        gcc_assert (data->nloads <= data->nstmts);
        unsigned int load_pct = (data->nloads * 100) / data->nstmts;

@@ -5355,8 +5347,8 @@ rs6000_density_test (rs6000_cost_data *data)
              the loads.
         One typical case is the innermost loop of the hotspot of SPEC2017
         503.bwaves_r without loop interchange.  */
-      if (data->nloads > DENSITY_LOAD_NUM_THRESHOLD
-         && load_pct > DENSITY_LOAD_PCT_THRESHOLD)
+      if (data->nloads > (unsigned int) rs6000_density_load_num_threshold
+         && load_pct > (unsigned int) rs6000_density_load_pct_threshold)
        {
          data->cost[vect_body] += data->extra_ctor_cost;
          if (dump_enabled_p ())
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 0538db387dc..563983f3269 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -639,3 +639,24 @@ Enable instructions that guard against return-oriented 
programming attacks.
  mprivileged
  Target Var(rs6000_privileged) Init(0)
  Generate code that will run in privileged state.
+
+-param=rs6000-density-pct-threshold=
+Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) 
IntegerRange(0, 99) Param
+When costing for loop vectorization, we probably need to penalize the loop 
body cost if the existing cost model may not adequately reflect delays from 
unavailable vector resources.  We collect the cost for vectorized statements 
and non-vectorized statements separately, check the proportion of vec_cost to 
total cost of vec_cost and non vec_cost, and penalize only if the proportion 
exceeds the threshold specified by this parameter.  The default value is 85.
+
+-param=rs6000-density-size-threshold=
+Target Undocumented Joined UInteger Var(rs6000_density_size_threshold) 
Init(70) IntegerRange(0, 99) Param

I think 99 is not a sufficient upper bound.  This is a counting value that could in theory get much higher.  Can you set it to something ridiculous like IntegerRange(0, 1000)?

+Like parameter rs6000-density-pct-threshold, we also check the total sum of 
vec_cost and non vec_cost, and penalize only if the sum exceeds the threshold 
specified by this parameter.  The default value is 70.
+
+-param=rs6000-density-penalty=
+Target Undocumented Joined UInteger Var(rs6000_density_penalty) Init(10) 
IntegerRange(0, 1000) Param
+When both heuristics with rs6000-density-pct-threshold and 
rs6000-density-size-threshold are satisfied, we decide to penalize the loop 
body cost by the value which is specified by this parameter.  The default value 
is 10.
+
+-param=rs6000-density-load-pct-threshold=
+Target Undocumented Joined UInteger Var(rs6000_density_load_pct_threshold) 
Init(45) IntegerRange(0, 99) Param
+When costing for loop vectorization, we probably need to penalize the loop 
body cost by accounting for excess strided or elementwise loads.  We collect 
the numbers for general statements and load statements according to the 
information for statement to be vectorized, check the proportion of load 
statements, and penalize only if the proportion exceeds the threshold specified 
by this parameter.  The default value is 45.

"according to the information for statement*s* to be vectorized"

Otherwise, these descriptions are quite excellent, by the way.

This looks good to me with those minor things addressed.  Recommend that maintainers approve.

Thanks for the patch!
Bill
+
+-param=rs6000-density-load-num-threshold=
+Target Undocumented Joined UInteger Var(rs6000_density_load_num_threshold) 
Init(20) IntegerRange(0, 1000) Param
+Like parameter rs6000-density-load-pct-threshold, we also check if the total 
number of load statements exceeds the threshold specified by this parameter, 
and penalize only if it's satisfied.  The default value is 20.
+
--
2.25.1


Reply via email to