Kewen asked me to have a look at this since Richard was reluctant
to approve it (given that it was his idea).

TBH I don't know much about the pass machinery, so I'm not sure I'm
really best placed.  There again, perhaps I know just enough to
realise that this is indeed the hack that it's billed as being.

"Kewen.Lin via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> diff --git a/gcc/passes.def b/gcc/passes.def
> index c0098d755bf..c74add75068 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -288,11 +288,16 @@ along with GCC; see the file COPYING3.  If not see
> 	  /* pass_vectorize must immediately follow pass_if_conversion.
> 	     Please do not add any other passes in between.  */
> 	  NEXT_PASS (pass_vectorize);
> -	  PUSH_INSERT_PASSES_WITHIN (pass_vectorize)
> +	  PUSH_INSERT_PASSES_WITHIN (pass_vectorize)
> 	      NEXT_PASS (pass_dce);
> -	  POP_INSERT_PASSES ()
> -	  NEXT_PASS (pass_predcom);
> +	  POP_INSERT_PASSES ()
> +	  NEXT_PASS (pass_predcom);
> 	  NEXT_PASS (pass_complete_unroll);
> +	  NEXT_PASS (pass_pre_slp_scalar_cleanup);
> +	  PUSH_INSERT_PASSES_WITHIN (pass_pre_slp_scalar_cleanup)
> +	    NEXT_PASS (pass_fre, false /* may_iterate */);
> +	    NEXT_PASS (pass_dse);
> +	  POP_INSERT_PASSES ()
> 	  NEXT_PASS (pass_slp_vectorize);
> 	  NEXT_PASS (pass_loop_prefetch);
> 	  /* Run IVOPTs after the last pass that uses data-reference analysis

Very minor, but after fixing the tabification of the existing code,
it looks like the new code introduces tabification issues of the
same kind.

> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index f01e811917d..a6b202d98f5 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -301,6 +301,11 @@ protected:
>  /* Release function body and stop pass manager.  */
>  #define TODO_discard_function		(1 << 23)
>
> +/* Used as one pending action, it expects the following scalar
> +   cleanup pass will clear it and do the cleanup work when it's
> +   met.  */
> +#define TODO_force_next_scalar_cleanup	(1 << 24)

Since the other todo flags are grouped, maybe it would be worth
adding a new group:

  /* To-do flags for pending_TODOs.  */

Although given that we're not far from running out of todo flags,
I wonder whether we should use a different bitmask numbering scheme
altogether (PENDING_TODO_*?) and make passes assign directly to
pending_TODOs.  As noted below, I think that would also make the
code simpler.  I think in practice the producers of the flags are
just as aware that the actions are pending as the consumers are,
which is why cunroll isn't doing the VN itself.

For the comment, maybe:

  /* Tell the next scalar cleanup pass that there is work for it
     to do.  */

I think the comment about clearing bits belongs on pending_TODOs
itself, since in principle it would apply to all pending TODO flags.

> @@ -627,6 +633,12 @@ extern gimple_opt_pass *make_pass_convert_switch (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_lower_vaarg (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_gimple_isel (gcc::context *ctxt);
>
> +/* Different from normal TODO_flags which are handled right at the begin
> +   or the end of one pass execution, the pending TODO_flags are allowed
> +   to be passed down in the pipeline until one of its consumers can take

s/are allowed to be passed down in/are passed down/.

> +   over it.  */

Maybe: s/take over it/perform the requested action/?  And maybe add
something like: "Consumers should then clear the flags for the actions
that they've taken."

> +extern unsigned int pending_TODOs;

Would it be better to put this in struct function?  At least that
would prevent any possibility of the flags being accidentally carried
over between functions.  (Obviously that would be a bug, but there's
a danger that it might be a mostly silent bug.)
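
To be more concrete, the kind of thing I had in mind was the below
(completely untested, and the names are of course just placeholders):

  /* In function.h, as a new field of struct function, so that the
     flags are implicitly per-function:  */

  /* Actions that subsequent passes have been asked to perform.
     This is a mask of PENDING_TODO_* flags, which use a separate
     numbering scheme from TODO_*.  Consumers should clear the flags
     for the actions that they have performed.  */
  unsigned int pending_TODOs;

  /* Tell the next scalar cleanup pass that there is work for it
     to do.  */
  #define PENDING_TODO_force_next_scalar_cleanup  (1 << 0)

  /* Producer side, e.g. in cunroll, instead of returning a todo
     flag from the pass:  */
  cfun->pending_TODOs |= PENDING_TODO_force_next_scalar_cleanup;

  /* Consumer side, in the new scalar cleanup pass, which clears
     the flag once it has acted on it:  */
  if (cfun->pending_TODOs & PENDING_TODO_force_next_scalar_cleanup)
    {
      cfun->pending_TODOs &= ~PENDING_TODO_force_next_scalar_cleanup;
      /* ...run the forced FRE/DSE cleanup...  */
    }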
> +
>  /* Current optimization pass.  */
>  extern opt_pass *current_pass;
>
> diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
> index 298ab215530..905bd3add59 100644
> --- a/gcc/tree-ssa-loop-ivcanon.c
> +++ b/gcc/tree-ssa-loop-ivcanon.c
> @@ -1404,13 +1404,14 @@ tree_unroll_loops_completely_1 (bool may_increase_size, bool unroll_outer,
> 	 computations; otherwise, the size might blow up before the
> 	 iteration is complete and the IR eventually cleaned up.  */
>        if (loop_outer (loop_father))
> -	{
> -	  /* Once we process our father we will have processed
> -	     the fathers of our children as well, so avoid doing
> -	     redundant work and clear fathers we've gathered sofar.  */
> -	  bitmap_clear (father_bbs);
> -	  bitmap_set_bit (father_bbs, loop_father->header->index);
> -	}
> +	/* Once we process our father we will have processed
> +	   the fathers of our children as well, so avoid doing
> +	   redundant work and clear fathers we've gathered sofar.
> +	   But don't clear it for one outermost loop since its
> +	   propagation doesn't run necessarily all the time.  */
> +	bitmap_clear (father_bbs);
> +
> +      bitmap_set_bit (father_bbs, loop_father->header->index);

Is this a way of telling the caller to set "outermost_unrolled"?
If so, with the suggestion above, we could just set pending_TODOs
directly.

I had to read the above several times before I (hopefully) understood
it.  It seems that it's really overloading the bitmap for two different
things (hence the need to explain why the clearing doesn't happen for
the outer loop).
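
I.e. (again untested, and using the placeholder names from above) the
original structure could stay as-is, with an "else" arm for the
outermost-loop case:

      if (loop_outer (loop_father))
	{
	  /* Once we process our father we will have processed
	     the fathers of our children as well, so avoid doing
	     redundant work and clear fathers we've gathered sofar.  */
	  bitmap_clear (father_bbs);
	  bitmap_set_bit (father_bbs, loop_father->header->index);
	}
      else
	/* We unrolled an outermost loop, so ask the next scalar
	   cleanup pass to run, rather than routing that information
	   through father_bbs and the pass's return value.  */
	cfun->pending_TODOs |= PENDING_TODO_force_next_scalar_cleanup;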
Thanks,
Richard

>
>        return true;
>      }
> @@ -1429,6 +1430,8 @@ tree_unroll_loops_completely (bool may_increase_size, bool unroll_outer)
>    bool changed;
>    int iteration = 0;
>    bool irred_invalidated = false;
> +  int retval = 0;
> +  bool outermost_unrolled = false;
>
>    estimate_numbers_of_iterations (cfun);
>
> @@ -1472,6 +1475,8 @@ tree_unroll_loops_completely (bool may_increase_size, bool unroll_outer)
> 	      if (loop_outer (unrolled_loop_bb->loop_father))
> 		bitmap_set_bit (fathers,
> 				unrolled_loop_bb->loop_father->num);
> +	      else if (!outermost_unrolled)
> +		outermost_unrolled = true;
> 	    }
> 	  bitmap_clear (father_bbs);
> 	  /* Propagate the constants within the new basic blocks.  */
> @@ -1509,11 +1514,15 @@ tree_unroll_loops_completely (bool may_increase_size, bool unroll_outer)
>
>    BITMAP_FREE (father_bbs);
>
> +  /* Trigger scalar cleanup once any outermost loop gets unrolled.  */
> +  if (outermost_unrolled && unroll_outer)
> +    retval |= TODO_force_next_scalar_cleanup;
> +
>    if (irred_invalidated
>        && loops_state_satisfies_p (LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS))
>      mark_irreducible_loops ();
>
> -  return 0;
> +  return retval;
>  }
>
>  /* Canonical induction variable creation pass.  */