This issue unfortunately was not solved correctly. In that example we don’t have -fmodulo-sched-allow-regmoves enabled and we should not create any register moves at all.

I’ll describe here everything I found while looking on the issue. At the first step I have patched trunk compiler like this:

diff --git a/gcc/modulo-sched.c b/gcc/modulo-sched.c
--- a/gcc/modulo-sched.c
+++ b/gcc/modulo-sched.c
@@ -735,6 +735,7 @@ schedule_reg_moves (partial_schedule_ptr ps)
        continue;

       /* Create NREG_MOVES register moves.  */
+      gcc_assert (flag_modulo_sched_allow_regmoves);
       first_move = ps->reg_moves.length ();
       ps->reg_moves.safe_grow_cleared (first_move + nreg_moves);
       extend_node_sched_params (ps);
@@ -744,8 +745,7 @@ schedule_reg_moves (partial_schedule_ptr ps)

       /* Generate each move.  */
       old_reg = prev_reg = SET_DEST (set);
-      if (HARD_REGISTER_P (old_reg))
-       return false;
+      gcc_assert (!HARD_REGISTER_P (old_reg));

       for (i_reg_move = 0; i_reg_move < nreg_moves; i_reg_move++)
        {

On current trunk this patch doesn’t give any failures on powerpc, but I’ve checked other platforms and found gcc.c-torture/execute/pr84524.c ICEing with -O1 -fmodulo-sched on aarch64. Speaking about sms-13.c, it happens not to be a proper loop for modulo-scheduling at current trunk. Some changes in other passes caused this, and that file actually tests nothing right now.

Than, I added 8 branch into my testing, and decide first to find a proper way to fix sms-13.c on 8 branch.

Without -fmodulo-sched-allow-regmoves each TRUE_DEP edge should be paired with another (sometimes ANTI_DEP or OUTPUT_DEP) edge in the opposite direction. My assertion failure means that DDG was odd.

As Alexander mentioned here
https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00028.html
“SMS uses sched-deps for intra-loop deps, and then separately uses DF for cross-iteration deps, which means that it should be ready for surprises when the two scanners are not 100% in sync.”

Situation here is one of such surprises, and debugging shows that we loose one of dependencies here.

Considering this two instructions:
(insn 30 29 31 8 (parallel [
            (set (reg:SI 142)
                (minus:SI (reg/v:SI 133 [ x+-2 ])
                    (reg/v:SI 130 [ b+-2 ])))
            (set (reg:SI 76 ca)
                (leu:SI (reg/v:SI 130 [ b+-2 ])
                    (reg/v:SI 133 [ x+-2 ])))
        ]) "test.c":22 107 {subfsi3_carry}
     (expr_list:REG_UNUSED (reg:SI 142)
        (nil)))

(insn 31 30 32 8 (parallel [
            (set (reg:SI 143)
                (plus:SI (reg:SI 76 ca)
                    (const_int -1 [0xffffffffffffffff])))
            (clobber (reg:SI 76 ca))
        ]) "test.c":22 116 {subfsi3_carry_in_xx}
     (expr_list:REG_DEAD (reg:SI 76 ca)
        (expr_list:REG_UNUSED (reg:SI 76 ca)
            (nil))))

Sched-deps organize obvious 30->31 true dependency, and we want DF to help us with 31->30 output dependency here. CA is clobbered in 31, and if we move insn 31 forward we have to stop at insn 30 on the next iteration.

Build_inter_loop_deps doesn’t create such output dependency because reaching definitions “rd_bb_info->gen” dependency set doesn’t contain CA register. Maybe, because it is a clobber, not a set. So we need somehow organize that dependency, and I’ve found that df_live set can be used here instead.

I proceed with the following change:

diff --git a/gcc/ddg.c b/gcc/ddg.c
--- a/gcc/ddg.c
+++ b/gcc/ddg.c
@@ -365,16 +365,14 @@ add_cross_iteration_register_deps (ddg_ptr g, df_ref last_def)
 static void
 build_inter_loop_deps (ddg_ptr g)
 {
-  unsigned rd_num;
-  struct df_rd_bb_info *rd_bb_info;
+  unsigned regno;
   bitmap_iterator bi;
-
-  rd_bb_info = DF_RD_BB_INFO (g->bb);
+  struct df_live_bb_info *live_bb_info = DF_LIVE_BB_INFO (g->bb);

   /* Find inter-loop register output, true and anti deps.  */
-  EXECUTE_IF_SET_IN_BITMAP (&rd_bb_info->gen, 0, rd_num, bi)
+  EXECUTE_IF_SET_IN_BITMAP (&live_bb_info->gen, 0, regno, bi)
   {
-    df_ref rd = DF_DEFS_GET (rd_num);
+    df_ref rd = df_bb_regno_last_def_find (g->bb, regno);

     add_cross_iteration_register_deps (g, rd);
   }

(Here I skipped minor hunks which properly add/remove df_live problem in modulo-sched.c when optimize == 1).

This fix corrected sms-13.c on 8 branch, but pr84524.c mentioned above was still ICEing on both trunk and 8-branch. No other failures were introduces while testing this intermediate patch on different branches and platforms. Tested patch also contained temporarily additional checks that the only difference between “rd_bb_info->gen” and “live_bb_info->gen” sets may be additional hard registers in live set.

In pr84524.c we got a loop with an extended inline asm:
asm volatile ("" : "+r" (v))
which also gives us a “surprising” situation Alexander predicts.

For sched-deps scanner such volatile asm is a “true barrier” and we create dependencies to almost all other instructions, while DF scanners don’t give us such information.

Maybe it is a good idea somehow re-implement modulo scheduler using only one scanner instead of two, but at the moment the best thing to do is to detect the situation earlier and skip such loops.

So, the last two hunks are:

@@ -1473,6 +1478,8 @@ sms_schedule (void)

         if (CALL_P (insn)
             || BARRIER_P (insn)
+ || (NONDEBUG_INSN_P (insn) && extract_asm_operands (PATTERN (insn)) + && MEM_VOLATILE_P (extract_asm_operands (PATTERN (insn))))
             || (NONDEBUG_INSN_P (insn) && !JUMP_P (insn)
&& !single_set (insn) && GET_CODE (PATTERN (insn)) != USE
                 && !reg_mentioned_p (count_reg, insn))
@@ -1489,6 +1496,11 @@ sms_schedule (void)
                fprintf (dump_file, "SMS loop-with-call\n");
              else if (BARRIER_P (insn))
                fprintf (dump_file, "SMS loop-with-barrier\n");
+             else if (NONDEBUG_INSN_P (insn)
+                      && extract_asm_operands (PATTERN (insn))
+                      && MEM_VOLATILE_P (extract_asm_operands
+                                               (PATTERN (insn))))
+               fprintf (dump_file, "SMS loop-with-volatile-asm\n");
               else if ((NONDEBUG_INSN_P (insn) && !JUMP_P (insn)
&& !single_set (insn) && GET_CODE (PATTERN (insn)) != USE))

Patch testing addressed in my cover letter. Ok for trunk?

gcc/ChangeLog:

2019-04-10  Roman Zhuykov  <zhr...@ispras.ru>

        PR target/83507
        * ddg.c (build_inter_loop_deps): Use live_bb_info instead of
        rd_bb_info to search last register definitions.
        * modulo-sched.c (schedule_reg_moves): Add an assertion which
        prevents creating register moves when it is not allowed.
        (sms_schedule): Add/remove df_live problem when optimize == 1.
        Bail out when we got volatile asm inside the loop.


diff --git a/gcc/ddg.c b/gcc/ddg.c
--- a/gcc/ddg.c
+++ b/gcc/ddg.c
@@ -365,16 +365,14 @@ add_cross_iteration_register_deps (ddg_ptr g, df_ref last_def)
 static void
 build_inter_loop_deps (ddg_ptr g)
 {
-  unsigned rd_num;
-  struct df_rd_bb_info *rd_bb_info;
+  unsigned regno;
   bitmap_iterator bi;
-
-  rd_bb_info = DF_RD_BB_INFO (g->bb);
+  struct df_live_bb_info *live_bb_info = DF_LIVE_BB_INFO (g->bb);

   /* Find inter-loop register output, true and anti deps.  */
-  EXECUTE_IF_SET_IN_BITMAP (&rd_bb_info->gen, 0, rd_num, bi)
+  EXECUTE_IF_SET_IN_BITMAP (&live_bb_info->gen, 0, regno, bi)
   {
-    df_ref rd = DF_DEFS_GET (rd_num);
+    df_ref rd = df_bb_regno_last_def_find (g->bb, regno);

     add_cross_iteration_register_deps (g, rd);
   }
diff --git a/gcc/modulo-sched.c b/gcc/modulo-sched.c

       /* Create NREG_MOVES register moves.  */
+      gcc_assert (flag_modulo_sched_allow_regmoves);
       first_move = ps->reg_moves.length ();
       ps->reg_moves.safe_grow_cleared (first_move + nreg_moves);
       extend_node_sched_params (ps);
@@ -744,8 +745,7 @@ schedule_reg_moves (partial_schedule_ptr ps)

       /* Generate each move.  */
       old_reg = prev_reg = SET_DEST (set);
-      if (HARD_REGISTER_P (old_reg))
-       return false;
+      gcc_assert (!HARD_REGISTER_P (old_reg));

       for (i_reg_move = 0; i_reg_move < nreg_moves; i_reg_move++)
        {
@@ -1371,7 +1371,12 @@ sms_schedule (void)
   else
     issue_rate = 1;

-  /* Initialize the scheduler.  */
+  /* Initialize the scheduler, we need live info even at O1.  */
+  if (optimize == 1)
+    {
+      df_live_add_problem ();
+      df_live_set_all_dirty ();
+    }
   setup_sched_infos ();
   haifa_sched_init ();

@@ -1473,6 +1478,8 @@ sms_schedule (void)

         if (CALL_P (insn)
             || BARRIER_P (insn)
+ || (NONDEBUG_INSN_P (insn) && extract_asm_operands (PATTERN (insn)) + && MEM_VOLATILE_P (extract_asm_operands (PATTERN (insn))))
             || (NONDEBUG_INSN_P (insn) && !JUMP_P (insn)
&& !single_set (insn) && GET_CODE (PATTERN (insn)) != USE
                 && !reg_mentioned_p (count_reg, insn))
@@ -1489,6 +1496,11 @@ sms_schedule (void)
                fprintf (dump_file, "SMS loop-with-call\n");
              else if (BARRIER_P (insn))
                fprintf (dump_file, "SMS loop-with-barrier\n");
+             else if (NONDEBUG_INSN_P (insn)
+                      && extract_asm_operands (PATTERN (insn))
+                      && MEM_VOLATILE_P (extract_asm_operands
+                                               (PATTERN (insn))))
+               fprintf (dump_file, "SMS loop-with-volatile-asm\n");
               else if ((NONDEBUG_INSN_P (insn) && !JUMP_P (insn)
&& !single_set (insn) && GET_CODE (PATTERN (insn)) != USE))
                 fprintf (dump_file, "SMS loop-with-not-single-set\n");
@@ -1752,6 +1764,9 @@ sms_schedule (void)
   /* Release scheduler data, needed until now because of DFA.  */
   haifa_sched_finish ();
   loop_optimizer_finalize ();
+
+  if (optimize == 1)
+    df_remove_problem (df_live);
 }

 /* The SMS scheduling algorithm itself

Reply via email to