Hi, On Wed, Nov 11 2020, Richard Biener wrote: > On Mon, 9 Nov 2020, Martin Jambor wrote: > >> this patch modifies the loop invariant pass so that is can operate >> only on a single requested loop and its sub-loops and ignore the rest >> of the function, much like it currently ignores basic blocks that are >> not in any real loop. It then invokes it from within the loop >> interchange pass when it successfully swaps two loops. This avoids >> the non-LTO -Ofast run-time regressions of 410.bwaves and 503.bwaves_r >> (which are 19% and 15% faster than current master on an AMD zen2 >> machine) while not introducing a full LIM pass into the pass pipeline. >> >> I have not modified the LIM data structures, this means that it still >> contains vectors indexed by loop->num even though only a single loop >> nest is actually processed. I also did not replace the uses of >> pre_and_rev_post_order_compute_fn with a function that would count a >> postorder only for a given loop. I can of course do so if the >> approach is otherwise deemed viable. >> >> The patch adds one additional global variable requested_loop to the >> pass and then at various places behaves differently when it is set. I >> was considering storing the fake root loop into it for normal >> operation, but since this loop often requires special handling anyway, >> I came to the conclusion that the code would actually end up less >> straightforward. >> >> I have bootstrapped and tested the patch on x86_64-linux and a very >> similar one on aarch64-linux. I have also tested it by modifying the >> tree_ssa_lim function to run loop_invariant_motion_from_loop on each >> real outermost loop in a function and this variant also passed >> bootstrap and all tests, including dump scans, of all languages. >> >> I have built the entire SPEC 2006 FPrate monitoring the activity of >> the LIM pass without and with the patch (on top of commit b642fca1c31 >> with which 526.blender_r and 538.imagick_r seemed to be failing) and >> it only examined 0.2% more loops, 0.02% more BBs and even fewer >> percent of statements because it is invoked only in a rather special >> circumstance. But the patch allows for more such need-based uses at >> hopefully reasonable cost. >> >> Since I do not have much experience with loop optimizers, I expect >> that there will be requests to adjust the patch during the review. >> Still, it fixes a performance regression against GCC 9 and so I hope >> to address the concerns in time to get it into GCC 11. >>
[...] > > That said, in the way it's currently structured I think it's > "better" to export tree_ssa_lim () and call it from interchange > if any loop was interchanged (thus run a full pass but conditional > on interchange done). You can make it cheaper by adding a flag > to tree_ssa_lim whether to do store-motion (I guess this might > be an interesting user-visible flag as well and a possibility > to make select lim passes cheaper via a pass flag) and not do > store-motion from the interchange call. I think that's how we should > fix the regression, refactoring LIM properly requires more work > that doesn't seem to fit the stage1 deadline. > So just like this? Bootstrapped and tested on x86_64-linux and I have verified it fixes the bwaves reduction. Thanks, Martin gcc/ChangeLog: 2020-11-12 Martin Jambor <mjam...@suse.cz> PR tree-optimization/94406 * tree-ssa-loop-im.c (tree_ssa_lim): Renamed to loop_invariant_motion_in_fun, added a parameter to control store motion. (pass_lim::execute): Adjust call to tree_ssa_lim, now loop_invariant_motion_in_fun. * tree-ssa-loop-manip.h (loop_invariant_motion_in_fun): Declare. * gimple-loop-interchange.cc (pass_linterchange::execute): Call loop_invariant_motion_in_fun if any interchange has been done. --- gcc/gimple-loop-interchange.cc | 9 +++++++-- gcc/tree-ssa-loop-im.c | 12 +++++++----- gcc/tree-ssa-loop-manip.h | 2 +- 3 files changed, 15 insertions(+), 8 deletions(-) diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc index 1656004ecf0..a36dbb49b1f 100644 --- a/gcc/gimple-loop-interchange.cc +++ b/gcc/gimple-loop-interchange.cc @@ -2085,8 +2085,13 @@ pass_linterchange::execute (function *fun) } if (changed_p) - scev_reset (); - return changed_p ? (TODO_update_ssa_only_virtuals) : 0; + { + unsigned todo = TODO_update_ssa_only_virtuals; + todo |= loop_invariant_motion_in_fun (cfun, false); + scev_reset (); + return todo; + } + return 0; } } // anon namespace diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c index 6bb07e133cd..3c7412737f0 100644 --- a/gcc/tree-ssa-loop-im.c +++ b/gcc/tree-ssa-loop-im.c @@ -3089,10 +3089,11 @@ tree_ssa_lim_finalize (void) } /* Moves invariants from loops. Only "expensive" invariants are moved out -- - i.e. those that are likely to be win regardless of the register pressure. */ + i.e. those that are likely to be win regardless of the register pressure. + Only perform store motion if STORE_MOTION is true. */ -static unsigned int -tree_ssa_lim (function *fun) +unsigned int +loop_invariant_motion_in_fun (function *fun, bool store_motion) { unsigned int todo = 0; @@ -3114,7 +3115,8 @@ tree_ssa_lim (function *fun) /* Execute store motion. Force the necessary invariants to be moved out of the loops as well. */ - do_store_motion (); + if (store_motion) + do_store_motion (); free (rpo); rpo = XNEWVEC (int, last_basic_block_for_fn (fun)); @@ -3175,7 +3177,7 @@ pass_lim::execute (function *fun) if (number_of_loops (fun) <= 1) return 0; - unsigned int todo = tree_ssa_lim (fun); + unsigned int todo = loop_invariant_motion_in_fun (fun, true); if (!in_loop_pipeline) loop_optimizer_finalize (); diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h index e789e4fcb0b..d8e918ef7c9 100644 --- a/gcc/tree-ssa-loop-manip.h +++ b/gcc/tree-ssa-loop-manip.h @@ -55,7 +55,7 @@ extern void tree_transform_and_unroll_loop (class loop *, unsigned, extern void tree_unroll_loop (class loop *, unsigned, edge, class tree_niter_desc *); extern tree canonicalize_loop_ivs (class loop *, tree *, bool); - +extern unsigned int loop_invariant_motion_in_fun (function *, bool); #endif /* GCC_TREE_SSA_LOOP_MANIP_H */ -- 2.29.2