Hi,

on 2023/11/22 17:30, Kewen.Lin wrote:
> on 2023/11/17 20:55, Alexander Monakov wrote:
>>
>> On Fri, 17 Nov 2023, Kewen.Lin wrote:
>>>> I don't think you can run cleanup_cfg after sched_init. I would suggest
>>>> to put it early in schedule_insns.
>>>
>>> Thanks for the suggestion, I placed it at the beginning of haifa_sched_init
>>> instead, since schedule_insns invokes haifa_sched_init.  The calls to
>>> rgn_setup_common_sched_info and rgn_setup_sched_infos are executed ahead
>>> of it, but they are all "setup" functions and shouldn't affect or be
>>> affected by this placement.
>>
>> I was worried because sched_init invokes df_analyze, and I'm not sure if
>> cfg_cleanup can invalidate it.
> 
> Thanks for further explaining!  From scanning cleanup_cfg, it seems to take
> df into account: compact_blocks checks df, try_optimize_cfg invokes
> df_analyze, etc.  But I agree that moving cleanup_cfg before sched_init
> makes more sense.
> 
>>
>>>> I suspect this may be caused by invoking cleanup_cfg too late.
>>>
>>> By looking into some failures, I found that even though cleanup_cfg is
>>> executed, there would still be some empty blocks left.  From analyzing a
>>> few failures, there are at least these cases:
>>>   1. empty function body
>>>   2. block holding a label for return.
>>>   3. block without any successor.
>>>   4. block which becomes empty after scheduling some other block.
>>>   5. block which looks mergeable with its always-taken successor but is
>>>      left unmerged.
>>>   ...
>>>
>>> For 1 and 2, there is a single successor EXIT block, so I think they don't
>>> affect state transition; for 3, it's the same.  For 4, it depends on
>>> whether we can assume this kind of empty block has no chance of holding a
>>> debug insn (that is, any associated debug insns should have been moved
>>> along with it); I'm not sure.  For 5, a reduced test case is:
>>
>> Oh, I should have thought of cases like these, really sorry about the slip
>> of attention, and thanks for showing a testcase for item 5. As Richard was
>> saying in his response, cfg_cleanup cannot be a fix here. The thing to check
>> would be changing no_real_insns_p to always return false, and see if the
>> situation looks recoverable (if it breaks bootstrap, regtest statistics of
>> a non-bootstrapped compiler are still informative).
> 
> As you suggested, I forced no_real_insns_p to return false all the time, and
> some issues got exposed.  Almost all of them are assertions that a NOTE_P
> insn shouldn't be encountered in those places, so the adjustments for most
> of them are just to handle NOTE_P or this kind of special block and so on.
> A draft patch is attached; it can be bootstrapped and regress-tested on
> ppc64{,le} and x86.  Btw, it's without the previous cleanup_cfg adjustment
> (hoping that keeps more empty blocks around and exposes more issues).  The
> draft isn't qualified for code review, but I hope it can provide some
> information on what kinds of changes are needed for the proposal.  If this
> is the direction we all agree on, I'll further refine it and post a formal
> patch.  One thing I want to note is that this patch disables one assertion
> below:
> 
> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
> index e5964f54ead..abd334864fb 100644
> --- a/gcc/sched-rgn.cc
> +++ b/gcc/sched-rgn.cc
> @@ -3219,7 +3219,7 @@ schedule_region (int rgn)
>      }
> 
>    /* Sanity check: verify that all region insns were scheduled.  */
> -  gcc_assert (sched_rgn_n_insns == rgn_n_insns);
> +  // gcc_assert (sched_rgn_n_insns == rgn_n_insns);
> 
>    sched_finish_ready_list ();
> 
> Some cases can cause this assertion to fail due to a mismatch between the
> to-be-scheduled and scheduled insn counts.  It happens when a block
> previously had only one INSN_P, but while scheduling some other blocks that
> insn got moved as well; we then end up with an empty block, so its only
> NOTE_P insn gets counted as scheduled, while the to-be-scheduled count
> doesn't include it, since the block wasn't empty initially and NOTE_P is
> skipped in a normal block.  It can be fixed by special-casing this kind of
> block in the counting: initially record which blocks are empty, and if a
> block wasn't recorded as empty before, fix up the count for it accordingly.
> Someone may argue that all this complication makes the proposal lose out to
> the previous approach of special-casing debug insns; looking forward to
> more comments.
> 

Attached is the improved draft patch v2 for not skipping empty BBs.  Against
the previous draft, it does:
  1) use NONDEBUG_INSN_P instead of !DEBUG_INSN_P && !NOTE_P where appropriate;
  2) merge the NOTE_P special handling into the DEBUG_INSN_P one;
  3) fix an issue exposed by broad testing on EBB;
  4) introduce rgn_init_empty_bb for the insn count mismatch issue (see the
     sketch just below);
  5) add/refine some comments.
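
To make 4) concrete, here is a condensed sketch of the idea, with the two
relevant hunks of the draft pulled together (the names match the draft, but
the surrounding code is elided, so this is only illustrative, not compilable
on its own):

  /* At region init: record which blocks start out empty, i.e. hold
     nothing but a single NOTE.  */
  rgn_init_empty_bb = BITMAP_ALLOC (NULL);
  for (bb = 0; bb < current_nr_blocks; bb++)
    {
      rtx_insn *head, *tail;
      get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
      if (head == tail && NOTE_P (head))
        bitmap_set_bit (rgn_init_empty_bb, bb);
    }

  /* After scheduling a block: if it is empty now but was not empty
     initially, its remaining NOTE was counted as scheduled while
     rgn_n_insns never included it, so fix up the expected count.  */
  if (head == tail && NOTE_P (head)
      && !bitmap_bit_p (rgn_init_empty_bb, bb))
    rgn_n_insns++;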

It's bootstrapped and regress-tested on x86_64-redhat-linux and
powerpc64{,le}-linux-gnu.  I also tested with EBB scheduling turned on by
default: one issue in schedule_ebb got exposed and fixed, and it bootstrapped
on x86_64-redhat-linux and powerpc64{,le}-linux-gnu and regress-tested on
powerpc64{,le}-linux-gnu, with one guality failure on x86_64-redhat-linux.
With selective scheduling turned on by default, it bootstrapped on
x86_64-redhat-linux with a few guality failures and some failures from extra
warnings (like with -fvar-tracking-assignments); those failures are trivial.
Note that I'm unable to test forced selective scheduling on Power since it's
broken even without this patch; there seem to be a few issues, and I'll file
some PRs separately.

Any comments are highly appreciated.  I'll go ahead and make a formal patch if
there is no objection this week.  I think we can still keep no_real_insns_p,
since in some places like free_block_dependencies it's still good for an early
return; we can just remove its uses in some other places.
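
For reference, the kind of early return I mean is what free_block_dependencies
in sched-rgn.cc does today (quoted from memory, so treat the details as
approximate):

  static void
  free_block_dependencies (int bb)
  {
    rtx_insn *head;
    rtx_insn *tail;

    get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);

    /* No dependencies to free for a block without real insns.  */
    if (no_real_insns_p (head, tail))
      return;

    sched_free_deps (head, tail, true);
  }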

BR,
Kewen

----
Subject: [PATCH draft v2] sched: Don't skip empty block in scheduling

---
 gcc/haifa-sched.cc | 172 ++++++++++++++++++++++++++-------------------
 gcc/rtl.h          |   4 +-
 gcc/sched-ebb.cc   |   7 +-
 gcc/sched-rgn.cc   |  23 +++++-
 4 files changed, 126 insertions(+), 80 deletions(-)

diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
index 8e8add709b3..7b4c4a92bb0 100644
--- a/gcc/haifa-sched.cc
+++ b/gcc/haifa-sched.cc
@@ -1207,6 +1207,11 @@ recompute_todo_spec (rtx_insn *next, bool for_backtrack)
   int n_replace = 0;
   bool first_p = true;
 
+  /* Since we don't skip no_real_insns_p blocks any more, it's
+     possible to meet a NOTE insn now; return early if so.  */
+  if (NOTE_P (next))
+    return 0;
+
   if (sd_lists_empty_p (next, SD_LIST_BACK))
     /* NEXT has all its dependencies resolved.  */
     return 0;
@@ -1726,6 +1731,11 @@ setup_insn_reg_pressure_info (rtx_insn *insn)
   int *max_reg_pressure;
   static int death[N_REG_CLASSES];
 
+  /* Since we don't skip no_real_insns_p blocks any more, it's possible to
+     schedule a NOTE insn now; we should check for it first.  */
+  if (NOTE_P (insn))
+    return;
+
   gcc_checking_assert (!DEBUG_INSN_P (insn));
 
   excess_cost_change = 0;
@@ -4017,10 +4027,10 @@ schedule_insn (rtx_insn *insn)
 
   /* Scheduling instruction should have all its dependencies resolved and
      should have been removed from the ready list.  */
-  gcc_assert (sd_lists_empty_p (insn, SD_LIST_HARD_BACK));
+  gcc_assert (NOTE_P (insn) || sd_lists_empty_p (insn, SD_LIST_HARD_BACK));
 
   /* Reset debug insns invalidated by moving this insn.  */
-  if (MAY_HAVE_DEBUG_BIND_INSNS && !DEBUG_INSN_P (insn))
+  if (MAY_HAVE_DEBUG_BIND_INSNS && NONDEBUG_INSN_P (insn))
     for (sd_it = sd_iterator_start (insn, SD_LIST_BACK);
         sd_iterator_cond (&sd_it, &dep);)
       {
@@ -4106,61 +4116,66 @@ schedule_insn (rtx_insn *insn)
 
   check_clobbered_conditions (insn);
 
-  /* Update dependent instructions.  First, see if by scheduling this insn
-     now we broke a dependence in a way that requires us to change another
-     insn.  */
-  for (sd_it = sd_iterator_start (insn, SD_LIST_SPEC_BACK);
-       sd_iterator_cond (&sd_it, &dep); sd_iterator_next (&sd_it))
+  /* Since we don't skip no_real_insns_p blocks any more, it's possible to
+     schedule a NOTE insn now; we should check for it first.  */
+  if (!NOTE_P (insn))
     {
-      struct dep_replacement *desc = DEP_REPLACE (dep);
-      rtx_insn *pro = DEP_PRO (dep);
-      if (QUEUE_INDEX (pro) != QUEUE_SCHEDULED
-         && desc != NULL && desc->insn == pro)
-       apply_replacement (dep, false);
-    }
+      /* Update dependent instructions.  First, see if by scheduling this insn
+        now we broke a dependence in a way that requires us to change another
+        insn.  */
+      for (sd_it = sd_iterator_start (insn, SD_LIST_SPEC_BACK);
+          sd_iterator_cond (&sd_it, &dep); sd_iterator_next (&sd_it))
+       {
+         struct dep_replacement *desc = DEP_REPLACE (dep);
+         rtx_insn *pro = DEP_PRO (dep);
+         if (QUEUE_INDEX (pro) != QUEUE_SCHEDULED && desc != NULL
+             && desc->insn == pro)
+           apply_replacement (dep, false);
+       }
 
-  /* Go through and resolve forward dependencies.  */
-  for (sd_it = sd_iterator_start (insn, SD_LIST_FORW);
-       sd_iterator_cond (&sd_it, &dep);)
-    {
-      rtx_insn *next = DEP_CON (dep);
-      bool cancelled = (DEP_STATUS (dep) & DEP_CANCELLED) != 0;
+      /* Go through and resolve forward dependencies.  */
+      for (sd_it = sd_iterator_start (insn, SD_LIST_FORW);
+          sd_iterator_cond (&sd_it, &dep);)
+       {
+         rtx_insn *next = DEP_CON (dep);
+         bool cancelled = (DEP_STATUS (dep) & DEP_CANCELLED) != 0;
 
-      /* Resolve the dependence between INSN and NEXT.
-        sd_resolve_dep () moves current dep to another list thus
-        advancing the iterator.  */
-      sd_resolve_dep (sd_it);
+         /* Resolve the dependence between INSN and NEXT.
+            sd_resolve_dep () moves current dep to another list thus
+            advancing the iterator.  */
+         sd_resolve_dep (sd_it);
 
-      if (cancelled)
-       {
-         if (must_restore_pattern_p (next, dep))
-           restore_pattern (dep, false);
-         continue;
-       }
+         if (cancelled)
+           {
+             if (must_restore_pattern_p (next, dep))
+               restore_pattern (dep, false);
+             continue;
+           }
 
-      /* Don't bother trying to mark next as ready if insn is a debug
-        insn.  If insn is the last hard dependency, it will have
-        already been discounted.  */
-      if (DEBUG_INSN_P (insn) && !DEBUG_INSN_P (next))
-       continue;
+         /* Don't bother trying to mark next as ready if insn is a debug
+            insn.  If insn is the last hard dependency, it will have
+            already been discounted.  */
+         if (DEBUG_INSN_P (insn) && !DEBUG_INSN_P (next))
+           continue;
 
-      if (!IS_SPECULATION_BRANCHY_CHECK_P (insn))
-       {
-         int effective_cost;
+         if (!IS_SPECULATION_BRANCHY_CHECK_P (insn))
+           {
+             int effective_cost;
 
-         effective_cost = try_ready (next);
+             effective_cost = try_ready (next);
 
-         if (effective_cost >= 0
-             && SCHED_GROUP_P (next)
-             && advance < effective_cost)
-           advance = effective_cost;
-       }
-      else
-       /* Check always has only one forward dependence (to the first insn in
-          the recovery block), therefore, this will be executed only once.  */
-       {
-         gcc_assert (sd_lists_empty_p (insn, SD_LIST_FORW));
-         fix_recovery_deps (RECOVERY_BLOCK (insn));
+             if (effective_cost >= 0 && SCHED_GROUP_P (next)
+                 && advance < effective_cost)
+               advance = effective_cost;
+           }
+         else
+           /* Check always has only one forward dependence (to the first insn
+              in the recovery block), therefore, this will be executed only
+              once.  */
+           {
+             gcc_assert (sd_lists_empty_p (insn, SD_LIST_FORW));
+             fix_recovery_deps (RECOVERY_BLOCK (insn));
+           }
        }
     }
 
@@ -4170,9 +4185,9 @@ schedule_insn (rtx_insn *insn)
      may use this information to decide how the instruction should
      be aligned.  */
   if (issue_rate > 1
+      && NONDEBUG_INSN_P (insn)
       && GET_CODE (PATTERN (insn)) != USE
-      && GET_CODE (PATTERN (insn)) != CLOBBER
-      && !DEBUG_INSN_P (insn))
+      && GET_CODE (PATTERN (insn)) != CLOBBER)
     {
       if (reload_completed)
        PUT_MODE (insn, clock_var > last_clock_var ? TImode : VOIDmode);
@@ -5036,8 +5051,11 @@ get_ebb_head_tail (basic_block beg, basic_block end,
 /* Return true if there are no real insns in the range [ HEAD, TAIL ].  */
 
 bool
-no_real_insns_p (const rtx_insn *head, const rtx_insn *tail)
+no_real_insns_p (const rtx_insn *head ATTRIBUTE_UNUSED,
+                const rtx_insn *tail ATTRIBUTE_UNUSED)
 {
+  return false;
+#if 0
   while (head != NEXT_INSN (tail))
     {
       if (!NOTE_P (head) && !LABEL_P (head))
@@ -5045,6 +5063,7 @@ no_real_insns_p (const rtx_insn *head, const rtx_insn *tail)
       head = NEXT_INSN (head);
     }
   return true;
+#endif
 }
 
 /* Restore-other-notes: NOTE_LIST is the end of a chain of notes
@@ -6224,8 +6243,12 @@ commit_schedule (rtx_insn *prev_head, rtx_insn *tail, basic_block *target_bb)
        scheduled_insns.iterate (i, &insn);
        i++)
     {
-      if (control_flow_insn_p (last_scheduled_insn)
-         || current_sched_info->advance_target_bb (*target_bb, insn))
+      /* Since we don't skip no_real_insns_p blocks any more, it's possible
+        to schedule a NOTE insn now; we should check for it here to avoid
+        an unexpected target bb advance.  */
+      if ((control_flow_insn_p (last_scheduled_insn)
+          || current_sched_info->advance_target_bb (*target_bb, insn))
+         && !NOTE_P (insn))
        {
          *target_bb = current_sched_info->advance_target_bb (*target_bb, 0);
 
@@ -6245,7 +6268,7 @@ commit_schedule (rtx_insn *prev_head, rtx_insn *tail, basic_block *target_bb)
        (*current_sched_info->begin_move_insn) (insn, last_scheduled_insn);
       move_insn (insn, last_scheduled_insn,
                 current_sched_info->next_tail);
-      if (!DEBUG_INSN_P (insn))
+      if (NONDEBUG_INSN_P (insn))
        reemit_notes (insn);
       last_scheduled_insn = insn;
     }
@@ -6296,7 +6319,7 @@ prune_ready_list (state_t temp_state, bool first_cycle_insn_p,
          int cost = 0;
          const char *reason = "resource conflict";
 
-         if (DEBUG_INSN_P (insn))
+         if (DEBUG_INSN_P (insn) || NOTE_P (insn))
            continue;
 
          if (sched_group_found && !SCHED_GROUP_P (insn)
@@ -6504,7 +6527,7 @@ schedule_block (basic_block *target_bb, state_t init_state)
      and caused problems because schedule_block and compute_forward_dependences
      had different notions of what the "head" insn was.  */
 
-  gcc_assert (head != tail || INSN_P (head));
+  gcc_assert (head != tail || INSN_P (head) || NOTE_P (head));
 
   haifa_recovery_bb_recently_added_p = false;
 
@@ -6539,15 +6562,15 @@ schedule_block (basic_block *target_bb, state_t init_state)
   if (targetm.sched.init)
     targetm.sched.init (sched_dump, sched_verbose, ready.veclen);
 
+  gcc_assert (((NOTE_P (prev_head) || DEBUG_INSN_P (prev_head))
+              && BLOCK_FOR_INSN (prev_head) == *target_bb)
+             || (head == tail && NOTE_P (head)));
+
   /* We start inserting insns after PREV_HEAD.  */
   last_scheduled_insn = prev_head;
   last_nondebug_scheduled_insn = NULL;
   nonscheduled_insns_begin = NULL;
 
-  gcc_assert ((NOTE_P (last_scheduled_insn)
-              || DEBUG_INSN_P (last_scheduled_insn))
-             && BLOCK_FOR_INSN (last_scheduled_insn) == *target_bb);
-
   /* Initialize INSN_QUEUE.  Q_SIZE is the total number of insns in the
      queue.  */
   q_ptr = 0;
@@ -6725,15 +6748,16 @@ schedule_block (basic_block *target_bb, state_t init_state)
                }
            }
 
-         /* We don't want md sched reorder to even see debug isns, so put
-            them out right away.  */
-         if (ready.n_ready && DEBUG_INSN_P (ready_element (&ready, 0))
+         /* We don't want md sched reorder to even see debug and note insns,
+            so put them out right away.  */
+         if (ready.n_ready
+             && !NONDEBUG_INSN_P (ready_element (&ready, 0))
              && (*current_sched_info->schedule_more_p) ())
            {
-             while (ready.n_ready && DEBUG_INSN_P (ready_element (&ready, 0)))
+             while (ready.n_ready && !NONDEBUG_INSN_P (ready_element (&ready, 0)))
                {
                  rtx_insn *insn = ready_remove_first (&ready);
-                 gcc_assert (DEBUG_INSN_P (insn));
+                 gcc_assert (DEBUG_INSN_P (insn) || NOTE_P (insn));
                  (*current_sched_info->begin_schedule_ready) (insn);
                  scheduled_insns.safe_push (insn);
                  last_scheduled_insn = insn;
@@ -7145,17 +7169,18 @@ schedule_block (basic_block *target_bb, state_t init_state)
 int
 set_priorities (rtx_insn *head, rtx_insn *tail)
 {
+  /* Since we don't skip no_real_insns_p blocks any more, it's
+     possible to meet a NOTE insn now; we don't need to compute
+     priority for such a block, so return early.  */
+  if (head == tail && !INSN_P (head))
+    return 1;
+
   rtx_insn *insn;
-  int n_insn;
+  int n_insn = 0;
   int sched_max_insns_priority =
        current_sched_info->sched_max_insns_priority;
   rtx_insn *prev_head;
 
-  if (head == tail && ! INSN_P (head))
-    gcc_unreachable ();
-
-  n_insn = 0;
-
   prev_head = PREV_INSN (head);
   for (insn = tail; insn != prev_head; insn = PREV_INSN (insn))
     {
@@ -7688,7 +7713,8 @@ fix_tick_ready (rtx_insn *next)
 {
   int tick, delay;
 
-  if (!DEBUG_INSN_P (next) && !sd_lists_empty_p (next, SD_LIST_RES_BACK))
+  if (NONDEBUG_INSN_P (next)
+      && !sd_lists_empty_p (next, SD_LIST_RES_BACK))
     {
       int full_p;
       sd_iterator_def sd_it;
diff --git a/gcc/rtl.h b/gcc/rtl.h
index e4b6cc0dbb5..34b3f31d1ee 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2695,8 +2695,8 @@ do {                                                            \
 /* During sched, 1 if RTX is an insn that must be scheduled together
    with the preceding insn.  */
 #define SCHED_GROUP_P(RTX)                                             \
-  (RTL_FLAG_CHECK4 ("SCHED_GROUP_P", (RTX), DEBUG_INSN, INSN,          \
-                   JUMP_INSN, CALL_INSN)->in_struct)
+  (RTL_FLAG_CHECK5 ("SCHED_GROUP_P", (RTX), DEBUG_INSN, INSN,          \
+                   JUMP_INSN, CALL_INSN, NOTE)->in_struct)
 
 /* For a SET rtx, SET_DEST is the place that is set
    and SET_SRC is the value it is set to.  */
diff --git a/gcc/sched-ebb.cc b/gcc/sched-ebb.cc
index 110fcdbca4d..c07e65696b9 100644
--- a/gcc/sched-ebb.cc
+++ b/gcc/sched-ebb.cc
@@ -478,12 +478,10 @@ schedule_ebb (rtx_insn *head, rtx_insn *tail, bool modulo_scheduling)
      a note or two.  */
   while (head != tail)
     {
-      if (NOTE_P (head) || DEBUG_INSN_P (head))
+      if (LABEL_P (head) || NOTE_P (head) || DEBUG_INSN_P (head))
        head = NEXT_INSN (head);
       else if (NOTE_P (tail) || DEBUG_INSN_P (tail))
        tail = PREV_INSN (tail);
-      else if (LABEL_P (head))
-       head = NEXT_INSN (head);
       else
        break;
     }
@@ -494,7 +492,8 @@ schedule_ebb (rtx_insn *head, rtx_insn *tail, bool modulo_scheduling)
   if (no_real_insns_p (head, tail))
     return BLOCK_FOR_INSN (tail);
 
-  gcc_assert (INSN_P (head) && INSN_P (tail));
+  gcc_assert ((NOTE_P (head) && head == tail)
+             || (INSN_P (head) && INSN_P (tail)));
 
   if (!bitmap_bit_p (&dont_calc_deps, first_bb->index))
     {
diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
index e5964f54ead..658349ba2b6 100644
--- a/gcc/sched-rgn.cc
+++ b/gcc/sched-rgn.cc
@@ -228,6 +228,9 @@ static edgeset *pot_split;
 /* For every bb, a set of its ancestor edges.  */
 static edgeset *ancestor_edges;
 
+/* A bb's bit is set if the bb is empty initially.  */
+static bitmap rgn_init_empty_bb;
+
 #define INSN_PROBABILITY(INSN) (SRC_PROB (BLOCK_TO_BB (BLOCK_NUM (INSN))))
 
 /* Speculative scheduling functions.  */
@@ -3216,6 +3219,14 @@ schedule_region (int rgn)
       /* Clean up.  */
       if (current_nr_blocks > 1)
        free_trg_info ();
+
+      /* This now-empty block wasn't empty initially, which means the only
+        NOTE inside was not counted when computing rgn_n_insns, so fix it
+        up now.  */
+      if (head == tail
+         && NOTE_P (head)
+         && !bitmap_bit_p (rgn_init_empty_bb, bb))
+       rgn_n_insns++;
     }
 
   /* Sanity check: verify that all region insns were scheduled.  */
@@ -3448,7 +3459,16 @@ sched_rgn_local_init (int rgn)
            continue;
          FOR_EACH_EDGE (e, ei, block->succs)
            e->aux = NULL;
-        }
+       }
+    }
+
+  rgn_init_empty_bb = BITMAP_ALLOC (NULL);
+  for (bb = 0; bb < current_nr_blocks; bb++)
+    {
+      rtx_insn *head, *tail;
+      get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
+      if (head == tail && NOTE_P (head))
+       bitmap_set_bit (rgn_init_empty_bb, bb);
     }
 }
 
@@ -3461,6 +3481,7 @@ sched_rgn_local_free (void)
   sbitmap_vector_free (pot_split);
   sbitmap_vector_free (ancestor_edges);
   free (rgn_edges);
+  BITMAP_FREE (rgn_init_empty_bb);
 }
 
 /* Free data computed for the finished region.  */
-- 
2.39.3
