On 2021/8/10 12:25, Ulrich Drepper wrote:
> On Tue, Aug 10, 2021 at 4:03 AM Xionghu Luo via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>> For this case, theorotically I think the master GCC will optimize it to:
>>
>>    invariant;
>>    for (;;)
>>      if (unlikely_cond)
>>        for (;;)
>>           ;
>>
>> 'invariant' is moved out of outer loop, but with the patch, it will get:
>>
>>    for (;;)
>>      if (unlikely_cond)
>>        {
>>          invariant;
>>          for (;;)
>>             ;
>>        }
>>
>> 'invariant' is *cold* for outer loop, but it is still *hot* for inner loop,
>> so hoist it out of inner loop, this is exactly what we want, right?
> 
> Is relying on absolute numbers really what you want?  If the
> 'unlikely_cond' condition depends on the iteration count of the outer
> loop the probability of it being true in each individual iteration can
> be low (at least that's how I use unlikely) but the overall
> probability of needing the code is higher 1 - (1 - p)^n  if 'p' is the
> probability of 'unlikely_cond' and 'n' is the number of iterations.
> Assuming complete independence of the loop iterations, otherwise it's
> rather an upper limit.
> 
> At the very least I'd generate code like this:
> 
>    first = true;
>    for (;;)
>      if (unlikely_cond)
>        {
>          if (first)
>            {
>              invariant;
>              first = false;
>            }
>          for (;;)
>             ;
>        }
> 
> If it's worth hoisting the code the the extra test and flag should be
> small in cost in comparison.
> 
> If 'unlikely_cond' does not in any way depend on the loop iteration
> then I think your code generation is fine.


Thanks for your good suggestion, I am also not sure whether it is
necessary to do it this way:)  But I found that even the first step
of 


  for (;;)
    if (unlikely_cond)
      {
        invariant;
        for (;;)
           ;
      }

is not supported yet.  So I added a new function *find_coldest_out_loop*
to search the coldest function between outermost invariant loop and own
loop in compute_invariantness to move invariant out to cold loop first:



[PATCH v2] Don't move cold code out of loop by checking bb count


There was a patch trying to avoid move cold block out of loop:

https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html

Richard suggested to "never hoist anything from a bb with lower execution
frequency to a bb with higher one in LIM invariantness_dom_walker
before_dom_children".

In gimple LIM analysis,  add find_coldest_out_loop to move invariants to
expected target loop, then in both gimple LIM move_computations_worker
and RTL loop-invariant.c find_invariants_bb, if profile count check find
 the loop bb is colder than target loop preheader, don't hoist it out of
loop.

SPEC2017 performance evaluation shows 1% performance improvement for
intrate GEOMEAN and no obvious regression for others.  Especially,
500.perlbench_r +7.52% (Perf shows function S_regtry of perlbench is
largely improved.), and 548.exchange2_r+1.98%, 526.blender_r +1.00%
on P8LE.

Regression and bootstrap tested pass on P8LE, any comments?  Thanks.

gcc/ChangeLog:

        * loop-invariant.c (find_invariants_bb): Check profile count
        before motion.
        (find_invariants_body): Add argument.
        * tree-ssa-loop-im.c (find_coldest_out_loop): New function.
        (outermost_invariant_loop): Use find_coldest_out_loop.
        (determine_max_movement): Likewise.
        (move_computations_worker): Check profile count before motion.
        (execute_sm): Likewise.
        (execute_sm_exit): Check pointer validness.

gcc/testsuite/ChangeLog:

        * gcc.dg/tree-ssa/recip-3.c: Adjust.
        * gcc.dg/tree-ssa/ssa-lim-16.c: New test.
        * gcc.dg/tree-ssa/ssa-lim-17.c: New test.
---
 gcc/loop-invariant.c                       |  10 +-
 gcc/tree-ssa-loop-im.c                     | 186 +++++++++++++++++++--
 gcc/testsuite/gcc.dg/tree-ssa/recip-3.c    |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c |  21 +++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c |  26 +++
 5 files changed, 231 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c

diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index fca0c2b24be..5c3be7bf0eb 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool 
always_reached, bool always_executed)
    call.  */
 
 static void
-find_invariants_bb (basic_block bb, bool always_reached, bool always_executed)
+find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
+                   bool always_executed)
 {
   rtx_insn *insn;
+  basic_block preheader = loop_preheader_edge (loop)->src;
+
+  if (preheader->count > bb->count)
+    return;
 
   FOR_BB_INSNS (bb, insn)
     {
@@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block *body,
   unsigned i;
 
   for (i = 0; i < loop->num_nodes; i++)
-    find_invariants_bb (body[i],
-                       bitmap_bit_p (always_reached, i),
+    find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i),
                        bitmap_bit_p (always_executed, i));
 }
 
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index d9f75d5025e..2b90caa90fc 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -417,6 +417,23 @@ movement_possibility (gimple *stmt)
   return ret;
 }
 
+/* Find coldest loop between min_loop and loop by comapring profile count.  */
+
+static class loop *
+find_coldest_out_loop (class loop *min_loop, class loop *loop)
+{
+  class loop *cold_loop = min_loop;
+  profile_count min_count = loop_preheader_edge (min_loop)->src->count;
+
+  while (min_loop != loop)
+    {
+      min_loop = superloop_at_depth (loop, loop_depth (min_loop) + 1);
+      if (loop_preheader_edge (min_loop)->src->count < min_count)
+       cold_loop = min_loop;
+    }
+  return cold_loop;
+}
+
 /* Suppose that operand DEF is used inside the LOOP.  Returns the outermost
    loop to that we could move the expression using DEF if it did not have
    other operands, i.e. the outermost loop enclosing LOOP in that the value
@@ -431,18 +448,18 @@ outermost_invariant_loop (tree def, class loop *loop)
   struct lim_aux_data *lim_data;
 
   if (!def)
-    return superloop_at_depth (loop, 1);
+    return find_coldest_out_loop (superloop_at_depth (loop, 1), loop);
 
   if (TREE_CODE (def) != SSA_NAME)
     {
       gcc_assert (is_gimple_min_invariant (def));
-      return superloop_at_depth (loop, 1);
+      return find_coldest_out_loop (superloop_at_depth (loop, 1), loop);
     }
 
   def_stmt = SSA_NAME_DEF_STMT (def);
   def_bb = gimple_bb (def_stmt);
   if (!def_bb)
-    return superloop_at_depth (loop, 1);
+    return find_coldest_out_loop (superloop_at_depth (loop, 1), loop);
 
   max_loop = find_common_loop (loop, def_bb->loop_father);
 
@@ -452,7 +469,11 @@ outermost_invariant_loop (tree def, class loop *loop)
                                 loop_outer (lim_data->max_loop));
   if (max_loop == loop)
     return NULL;
-  max_loop = superloop_at_depth (loop, loop_depth (max_loop) + 1);
+  max_loop = find_coldest_out_loop (max_loop, loop);
+  if (max_loop == loop)
+    return max_loop;
+  else
+    max_loop = superloop_at_depth (loop, loop_depth (max_loop) + 1);
 
   return max_loop;
 }
@@ -684,7 +705,7 @@ determine_max_movement (gimple *stmt, bool 
must_preserve_exec)
   if (must_preserve_exec)
     level = ALWAYS_EXECUTED_IN (bb);
   else
-    level = superloop_at_depth (loop, 1);
+    level = find_coldest_out_loop (superloop_at_depth (loop, 1), loop);
   lim_data->max_loop = level;
 
   if (gphi *phi = dyn_cast <gphi *> (stmt))
@@ -787,6 +808,7 @@ determine_max_movement (gimple *stmt, bool 
must_preserve_exec)
                                                     loop, ref);
          if (!lim_data->max_loop)
            return false;
+         lim_data->max_loop = find_coldest_out_loop (lim_data->max_loop, loop);
        }
       else if (! add_dependency (gimple_vuse (stmt), lim_data, loop, false))
        return false;
@@ -1154,6 +1176,58 @@ move_computations_worker (basic_block bb)
          continue;
        }
 
+      edge e = loop_preheader_edge (level);
+      if (e->src->count > bb->count)
+       {
+         if (dump_file && (dump_flags & TDF_DETAILS))
+           {
+             fprintf (dump_file, "PHI node NOT moved to %d from %d:\n",
+                      e->src->index, bb->index);
+             print_gimple_stmt (dump_file, stmt, 0);
+             fprintf (dump_file, "(cost %u) out of loop %d.\n\n", cost,
+                      level->num);
+           }
+         gsi_next (&bsi);
+         continue;
+       }
+      else
+       {
+         unsigned i;
+         bool skip_phi_move = false;
+         for (i = 0; i < gimple_phi_num_args (stmt); i++)
+           {
+             tree def = PHI_ARG_DEF (stmt, i);
+
+             if (TREE_CODE (def) != SSA_NAME)
+               continue;
+
+             gimple *def_stmt = SSA_NAME_DEF_STMT (def);
+
+             if (!gimple_bb (def_stmt))
+               continue;
+
+             if (!dominated_by_p (CDI_DOMINATORS, e->src,
+                                  gimple_bb (def_stmt)))
+               {
+                 if (dump_file && (dump_flags & TDF_DETAILS))
+                   {
+                     fprintf (dump_file, "PHI node NOT moved to %d from %d\n",
+                              e->src->index, bb->index);
+                     print_gimple_stmt (dump_file, stmt, 0);
+                     fprintf (dump_file, "(cost %u) out of loop %d.\n\n", cost,
+                              level->num);
+                   }
+                 skip_phi_move = true;
+                 break;
+               }
+           }
+         if (skip_phi_move)
+           {
+             gsi_next (&bsi);
+             continue;
+           }
+       }
+
       if (dump_file && (dump_flags & TDF_DETAILS))
        {
          fprintf (dump_file, "Moving PHI node\n");
@@ -1191,14 +1265,13 @@ move_computations_worker (basic_block bb)
          tree lhs = gimple_assign_lhs (new_stmt);
          SSA_NAME_RANGE_INFO (lhs) = NULL;
        }
-      gsi_insert_on_edge (loop_preheader_edge (level), new_stmt);
+      gsi_insert_on_edge (e, new_stmt);
       remove_phi_node (&bsi, false);
     }
 
   for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi); )
     {
       edge e;
-
       gimple *stmt = gsi_stmt (bsi);
 
       lim_data = get_lim_data (stmt);
@@ -1221,7 +1294,83 @@ move_computations_worker (basic_block bb)
       /* We do not really want to move conditionals out of the loop; we just
         placed it here to force its operands to be moved if necessary.  */
       if (gimple_code (stmt) == GIMPLE_COND)
-       continue;
+       {
+         gsi_next (&bsi);
+         continue;
+       }
+
+      e = loop_preheader_edge (level);
+      if (e->src->count > bb->count)
+       {
+         if (dump_file && (dump_flags & TDF_DETAILS))
+           {
+             fprintf (dump_file, "stmt: Statement NOT moved to %d from %d \n",
+                      e->src->index, bb->index);
+             print_gimple_stmt (dump_file, stmt, 0);
+             fprintf (dump_file, "(cost %u) out of loop %d.\n\n", cost,
+                      level->num);
+           }
+         gsi_next (&bsi);
+         continue;
+       }
+      else
+       {
+         if (is_gimple_assign (stmt))
+           {
+             tree rhs1 = gimple_assign_rhs1 (stmt);
+             tree rhs2 = gimple_assign_rhs2 (stmt);
+             if (TREE_CODE (rhs1) == MEM_REF)
+               {
+                 rhs2 = TREE_OPERAND (rhs1, 1);
+                 rhs1 = TREE_OPERAND (rhs1, 0);
+               }
+             gimple *stmt1 = NULL, *stmt2 = NULL;
+             basic_block def_bb;
+             if (rhs1 && TREE_CODE (rhs1) == SSA_NAME)
+               {
+                 stmt1 = SSA_NAME_DEF_STMT (rhs1);
+                 def_bb = gimple_bb (stmt1);
+                 if (stmt1
+                     && def_bb
+                     && (def_bb == bb
+                         || !dominated_by_p (CDI_DOMINATORS, e->src, def_bb)))
+                   {
+                     if (dump_file && (dump_flags & TDF_DETAILS))
+                       {
+                         fprintf (dump_file,
+                                  "stmt1: Statement NOT moved to %d from %d\n",
+                                  e->src->index, bb->index);
+                         print_gimple_stmt (dump_file, stmt, 0);
+                         fprintf (dump_file, "(cost %u) out of loop %d.\n\n",
+                                  cost, level->num);
+                       }
+                     gsi_next (&bsi);
+                     continue;
+                   }
+               }
+             if (rhs2 && TREE_CODE (rhs2) == SSA_NAME)
+               {
+                 stmt2 = SSA_NAME_DEF_STMT (rhs2);
+                 def_bb = gimple_bb (stmt2);
+                 if (stmt2 && def_bb
+                     && (def_bb == bb
+                         || !dominated_by_p (CDI_DOMINATORS, e->src, def_bb)))
+                   {
+                     if (dump_file && (dump_flags & TDF_DETAILS))
+                       {
+                         fprintf (dump_file,
+                                  "stmt2: Statement NOT moved to %d from %d\n",
+                                  e->src->index, bb->index);
+                         print_gimple_stmt (dump_file, stmt, 0);
+                         fprintf (dump_file, "(cost %u) out of loop %d.\n\n",
+                                  cost, level->num);
+                       }
+                     gsi_next (&bsi);
+                     continue;
+                   }
+               }
+           }
+       }
 
       if (dump_file && (dump_flags & TDF_DETAILS))
        {
@@ -1231,7 +1380,6 @@ move_computations_worker (basic_block bb)
                   cost, level->num);
        }
 
-      e = loop_preheader_edge (level);
       gcc_assert (!gimple_vdef (stmt));
       if (gimple_vuse (stmt))
        {
@@ -2133,6 +2281,19 @@ execute_sm (class loop *loop, im_mem_ref *ref,
   bool multi_threaded_model_p = false;
   gimple_stmt_iterator gsi;
   sm_aux *aux = new sm_aux;
+  basic_block bb = gimple_bb (first_mem_ref_loc (loop, ref)->stmt);
+
+  edge e = loop_preheader_edge (loop);
+  if (e->src->count > bb->count)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+       {
+         fprintf (dump_file, "Don't execute store motion of ");
+         print_generic_expr (dump_file, ref->mem.ref);
+         fprintf (dump_file, " from loop %d\n", loop->num);
+       }
+      return;
+    }
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -2241,7 +2402,12 @@ execute_sm_exit (class loop *loop, edge ex, 
vec<seq_entry> &seq,
        }
       else
        {
-         sm_aux *aux = *aux_map.get (ref);
+         sm_aux **paux = aux_map.get (ref);
+         sm_aux *aux;
+         if (paux)
+           aux = *paux;
+         else
+           continue;
          if (!aux->store_flag || kind == sm_ord)
            {
              gassign *store;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c 
b/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c
index 638bf38db8c..641c91e719e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c
@@ -23,4 +23,4 @@ float h ()
        F[0] += E / d;
 }
 
-/* { dg-final { scan-tree-dump-times " / " 1 "recip" } } */
+/* { dg-final { scan-tree-dump-times " / " 5 "recip" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c
new file mode 100644
index 00000000000..49b5979dc7d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
+
+volatile int x;
+void
+assert_fail (int, char *, char *);
+void
+foo (int *a, int n, int k)
+{
+  int i;
+
+  for (i = 0; i < n; i++)
+    {
+      if (__builtin_expect (x, 0))
+       assert_fail (k / 5, "one", "two");
+      a[i] = k;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "stmt: Statement NOT moved to" 1 "lim2" } 
} */
+/* { dg-final { scan-tree-dump-times "stmt: Statement NOT moved to" 1 "lim4" } 
} */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c
new file mode 100644
index 00000000000..3b1c7c0cb3e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
+
+volatile int x;
+void
+assert_fail (int, char *, char *);
+void
+foo (int *a, int n, int m, int k, int s)
+{
+  int i;
+  int j;
+
+  for (i = 0; i < m; i++)
+    {
+      if (__builtin_expect (x, 0))
+       for (j = 0; j < n; j++)
+         {
+           assert_fail (k / 5, "one", "two");
+         a[s] = k;
+       }
+      a[s] = s;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "out of loop 2" 4 "lim2" } } */
+/* { dg-final { scan-tree-dump-times "out of loop 1" 3 "lim2" } } */
-- 
2.27.0.90.geebb51ba8c


Reply via email to