Hi all,

This is one question origining from PR61837, which shows that doloop
pass could introduce some loop invariants against its outer loop when
doloop performs and prepares the iteration count for hardware
count register assignment. 

Even with Xionghu's simplification patch
https://gcc.gnu.org/pipermail/gcc-patches/2020-April/543887.html,
we still have one zero extension un-hoisted, and possibly much
more if we don't have any chances to simplify it.

The current related pass pipeline is:

    NEXT_PASS (pass_rtl_loop_init);
    NEXT_PASS (pass_rtl_move_loop_invariants);
    NEXT_PASS (pass_rtl_unroll_loops);
    NEXT_PASS (pass_rtl_doloop);
    NEXT_PASS (pass_rtl_loop_done);

* How about tweaking pass order?

  1. move pass_rtl_move_loop_invariants after pass_rtl_doloop?

    I don't know the historical reason on this order, but one no-go
    reason may be that it can impact following unroll, which is
    sensitive to #insn.  Hoisting can probably reduce loop body #insn,
    easy to break existing things without it there?

  2. move pass_rtl_doloop before pass_rtl_move_loop_invariants?

    It looks impractical, pass_rtl_unroll_loops can update the loop
    niter which is required by doloop preparation, predicting the value
    correctly sounds impossible.

* rerun hoisting pass?

  3. run pass_rtl_move_loop_invariants again after pass_rtl_doloop?
    
    It looks practical? But currently on Power it can't hoist all the
    invariants, it's caused that the pseudo used for count register which
    isn't taken as "invariant" by pass_rtl_move_loop_invariants, since
    we still leave the doloop end pattern inside the loop, for example:
      
      67: r143:SI=r119:DI#0-0x1           
      68: r142:DI=zero_extend(r143:SI)
      69: r141:DI=r142:DI+0x1            // count
      ...
      66: {pc={(r141:DI!=0x1)?L64:pc};r141:DI=r141:DI-0x1;
           clobber scratch;clobber scratch;}   // doloop_end

    though the doloop_end will be eliminated on Power eventually,
    
    I noticed that there is one hook doloop_begin, which looks to help
    ports to introduce loop count pseudo.  By supporting this for rs6000, 
    I can obtain the insns what I expected on Power.

        Original:
           12: NOTE_INSN_BASIC_BLOCK 4
           67: %9:SI=%5:SI-0x1           // invariant
           14: %8:DI=%4:DI-0x1
           68: %9:DI=zero_extend(%9:SI)  // invariant
           15: %10:DI=%3:DI
           69: %9:DI=%9:DI+0x1           // invariant
           82: ctr:DI=%9:DI
              REG_DEAD %9:DI

        With rerun hoisting:

           12: NOTE_INSN_BASIC_BLOCK 4
           69: %8:DI=%5:DI+0x1          // invariant
           14: %10:DI=%4:DI-0x1
           84: ctr:DI=%8:DI
              REG_DEAD %8:DI
           15: %9:DI=%3:DI

        With rerun hoisting + doloop_begin:

           12: NOTE_INSN_BASIC_BLOCK 4
           70: ctr:DI=%5:DI
           14: %10:DI=%4:DI-0x1
           15: %9:DI=%3:DI

    But I still had the concern that I'm not sure the current hoisting pass
    will consider register pressure, if the niter related invariants
    have some derived uses inside the loop, probably creating more 
    register pressure here.

Attached is the initial patch for this naive proposal, I'd like to post it
first to see whether it's reasonable to proceed.  If it's reasonable, one
thing to be improved can be to guard it for only the related outer loops
of which doloop perform succesfully on to save compilation time.

Welcome any comments!

BR,
Kewen
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 11ab74589b4..07340455348 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -12658,6 +12658,20 @@
   return "ori %0,%0,0";
 })
 
+
+;; Define doloop_begin to introduce one pseudo for count register.
+(define_expand "doloop_begin"
+  [(use (match_operand 0))     ; loop pseudo
+   (use (match_operand 1))]    ; doloop_end seq
+  ""
+{
+  rtx ctr_pseudo = gen_reg_rtx (Pmode);
+  rtx_insn* end_seq = (rtx_insn*) operands[1];
+  validate_replace_rtx (operands[0], ctr_pseudo, end_seq);
+  emit_move_insn (ctr_pseudo, operands[0]);
+  DONE;
+})
+
 ;; Define the subtract-one-and-jump insns, starting with the template
 ;; so loop.c knows what to generate.
 
diff --git a/gcc/loop-init.c b/gcc/loop-init.c
index 401e5282907..0cbb8e0bd32 100644
--- a/gcc/loop-init.c
+++ b/gcc/loop-init.c
@@ -516,6 +516,7 @@ public:
   {}
 
   /* opt_pass methods: */
+  opt_pass * clone () { return new pass_rtl_move_loop_invariants (m_ctxt); }
   virtual bool gate (function *) { return flag_move_loop_invariants; }
   virtual unsigned int execute (function *fun)
     {
diff --git a/gcc/passes.def b/gcc/passes.def
index 2bf2cb78fc5..4d8a63f2ae3 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -429,6 +429,7 @@ along with GCC; see the file COPYING3.  If not see
          NEXT_PASS (pass_rtl_move_loop_invariants);
          NEXT_PASS (pass_rtl_unroll_loops);
          NEXT_PASS (pass_rtl_doloop);
+         NEXT_PASS (pass_rtl_move_loop_invariants);
          NEXT_PASS (pass_rtl_loop_done);
       POP_INSERT_PASSES ()
       NEXT_PASS (pass_lower_subreg2);

Reply via email to