Hi all, This is one question origining from PR61837, which shows that doloop pass could introduce some loop invariants against its outer loop when doloop performs and prepares the iteration count for hardware count register assignment.
Even with Xionghu's simplification patch https://gcc.gnu.org/pipermail/gcc-patches/2020-April/543887.html, we still have one zero extension un-hoisted, and possibly much more if we don't have any chances to simplify it. The current related pass pipeline is: NEXT_PASS (pass_rtl_loop_init); NEXT_PASS (pass_rtl_move_loop_invariants); NEXT_PASS (pass_rtl_unroll_loops); NEXT_PASS (pass_rtl_doloop); NEXT_PASS (pass_rtl_loop_done); * How about tweaking pass order? 1. move pass_rtl_move_loop_invariants after pass_rtl_doloop? I don't know the historical reason on this order, but one no-go reason may be that it can impact following unroll, which is sensitive to #insn. Hoisting can probably reduce loop body #insn, easy to break existing things without it there? 2. move pass_rtl_doloop before pass_rtl_move_loop_invariants? It looks impractical, pass_rtl_unroll_loops can update the loop niter which is required by doloop preparation, predicting the value correctly sounds impossible. * rerun hoisting pass? 3. run pass_rtl_move_loop_invariants again after pass_rtl_doloop? It looks practical? But currently on Power it can't hoist all the invariants, it's caused that the pseudo used for count register which isn't taken as "invariant" by pass_rtl_move_loop_invariants, since we still leave the doloop end pattern inside the loop, for example: 67: r143:SI=r119:DI#0-0x1 68: r142:DI=zero_extend(r143:SI) 69: r141:DI=r142:DI+0x1 // count ... 66: {pc={(r141:DI!=0x1)?L64:pc};r141:DI=r141:DI-0x1; clobber scratch;clobber scratch;} // doloop_end though the doloop_end will be eliminated on Power eventually, I noticed that there is one hook doloop_begin, which looks to help ports to introduce loop count pseudo. By supporting this for rs6000, I can obtain the insns what I expected on Power. Original: 12: NOTE_INSN_BASIC_BLOCK 4 67: %9:SI=%5:SI-0x1 // invariant 14: %8:DI=%4:DI-0x1 68: %9:DI=zero_extend(%9:SI) // invariant 15: %10:DI=%3:DI 69: %9:DI=%9:DI+0x1 // invariant 82: ctr:DI=%9:DI REG_DEAD %9:DI With rerun hoisting: 12: NOTE_INSN_BASIC_BLOCK 4 69: %8:DI=%5:DI+0x1 // invariant 14: %10:DI=%4:DI-0x1 84: ctr:DI=%8:DI REG_DEAD %8:DI 15: %9:DI=%3:DI With rerun hoisting + doloop_begin: 12: NOTE_INSN_BASIC_BLOCK 4 70: ctr:DI=%5:DI 14: %10:DI=%4:DI-0x1 15: %9:DI=%3:DI But I still had the concern that I'm not sure the current hoisting pass will consider register pressure, if the niter related invariants have some derived uses inside the loop, probably creating more register pressure here. Attached is the initial patch for this naive proposal, I'd like to post it first to see whether it's reasonable to proceed. If it's reasonable, one thing to be improved can be to guard it for only the related outer loops of which doloop perform succesfully on to save compilation time. Welcome any comments! BR, Kewen
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 11ab74589b4..07340455348 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -12658,6 +12658,20 @@ return "ori %0,%0,0"; }) + +;; Define doloop_begin to introduce one pseudo for count register. +(define_expand "doloop_begin" + [(use (match_operand 0)) ; loop pseudo + (use (match_operand 1))] ; doloop_end seq + "" +{ + rtx ctr_pseudo = gen_reg_rtx (Pmode); + rtx_insn* end_seq = (rtx_insn*) operands[1]; + validate_replace_rtx (operands[0], ctr_pseudo, end_seq); + emit_move_insn (ctr_pseudo, operands[0]); + DONE; +}) + ;; Define the subtract-one-and-jump insns, starting with the template ;; so loop.c knows what to generate. diff --git a/gcc/loop-init.c b/gcc/loop-init.c index 401e5282907..0cbb8e0bd32 100644 --- a/gcc/loop-init.c +++ b/gcc/loop-init.c @@ -516,6 +516,7 @@ public: {} /* opt_pass methods: */ + opt_pass * clone () { return new pass_rtl_move_loop_invariants (m_ctxt); } virtual bool gate (function *) { return flag_move_loop_invariants; } virtual unsigned int execute (function *fun) { diff --git a/gcc/passes.def b/gcc/passes.def index 2bf2cb78fc5..4d8a63f2ae3 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -429,6 +429,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_rtl_move_loop_invariants); NEXT_PASS (pass_rtl_unroll_loops); NEXT_PASS (pass_rtl_doloop); + NEXT_PASS (pass_rtl_move_loop_invariants); NEXT_PASS (pass_rtl_loop_done); POP_INSERT_PASSES () NEXT_PASS (pass_lower_subreg2);