On Mon, Nov 22, 2021 at 4:07 PM Martin Liška <mli...@suse.cz> wrote: > > On 11/19/21 11:00, Richard Biener wrote: > > On Tue, Nov 16, 2021 at 3:40 PM Martin Liška <mli...@suse.cz> wrote: > >> > >> On 11/11/21 08:15, Richard Biener wrote: > >>> So I'd try to do no functional change first, improving the costing and > >>> setting up the transform to simply pick up the stmts to "fold" as > >>> discovered > >>> during analysis (as I hinted you possibly can use gimple_uid to mark > >>> the stmts that simplify, IIRC gimple_uid is preserved during copying. > >>> gimple_uid would also scale better than gimple_plf in case we do > >>> the analysis for all candidates at once). > >> > >> Thinking about the analysis. Am I correct that we want to properly > >> calculate > >> loop size for true and false edge of a potential gcond before the actually > >> unswitching? > > > > Yes. > > > >> We can do that by finding a first gcond candidate, evaluate (symbolic + > >> irange approache) > >> all other gcond in the loop body and use BB_REACHABLE discovery. Similarly > >> to what we do now > >> at lines 378-446. Then tree_num_loop_insns can be adjusted for only these > >> reachable blocks. > >> Having that, we can calculate # of insns that will live in true/false > >> loops. > > > > So whatever we do here we should record as "this control stmt folds to > > {true,false}" (or {true,unknown}, > > or in future, "this control stmt will lead to edge {e,unknown}"), > > recording the simplification > > on the true/false loop version in a way we can apply it after the transform. > > > >> Then we can call tree_unswitch_loop and make the gcond folding as we do in > >> the versioned loops. > >> > >> Is it a step in good direction? Having that we can then extend it to > >> gswitch statements. > > > > One issue I see is that BB_REACHABLE is there only once but you could use > > auto_bb_flag reachable_true, reachable_false to distinguish the > > true/false loop version > > copies. > > > > So yes, I think that sounds reasonable. At the point we want to > > evaluate different > > (first) unswitching opportunities against each other storing this only > > as BB flag is > > likely to hit limits. When we want to evaluate multiple levels of > > unswitching before > > doing any transforms even more so (if there are 3 opportunities there'd be > > many cases to be considered when going to level 3 ;)). I _think_ that a > > sparse > > lattice of stmt UID -> edge might do the trick if we change > > tree_num_loop_insns > > do to a DFS walk from the loop header, ignoring not taken edges by > > consulting the > > lattice. Or, for speed reason, pre-compute tree_num_loop_insns for each BB > > so we just have to sum a different set of BBs rather than walking all > > stmts again. > > > > That said, the second step would definitely be to choose the "best" > > opportunity > > on the current level. > > > > Richard. > > > >> Cheers, > >> Martin > > Hello. > > I'm sending a new version where I changed: > 1) all unswitch_predicates are find for a loop > 2) context sensitive costing happens based on an unswitch_predicate and BB > reachability > is implemented > 3) folding happens in recursive invocation once we decide to unswitch > 4) the patch folds both symbolic gcond predicates and irange provided by > ranger > 5) debug counter was added > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Plus, > I tested it > on SPEC2006 and SPEC2017 with -size=ref.
Meh, diff made a mess out if this ;) Random comments, I'm walking myself the optimizations flow. tree_unswitch_single_loop: + unswitch_predicate *predicate = NULL; + if (num > param_max_unswitch_level) + { + if (dump_file + && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, ";; Not unswitching anymore, hit max level\n"); + goto exit; + } this looks like we can do this check before find_all_unswitching_predicates? + for (auto pred: candidates) + { + unsigned cost + = evaluate_loop_insns_for_predicate (loop, bbs, ranger, pred); ... so this searches for the first candidate that fits in param_max_unswitch_insns, it doesn't yet try to find the cheapest / best one. Please add a comment to say that. After we found one candidate we apply unswitching to such one candidate (and throw the others away). I guess that's OK - it's what the old code did - what you did for this intermediate step is actually gather all unswitching predicates upfront. Hopefully we'll be able to share some of the work done for the recursive invocations. + fprintf (dump_file, ";; Unswitching loop with condition: "); "on condition" + fprintf (dump_file, ";; Not unswitching condition, loop too big " + "(%d insns): ", cost); "cost too big"? I assume 'cost' is the number of stmts we'll add, loop-size - true-eliminated - false-eliminated? +exit: + for (auto predicate: candidates) + delete predicate; Some refactoring should get rid of the goto ... +static unsigned +evaluate_insns (class loop *loop, basic_block *bbs, + unswitch_predicate *candidate, bool true_edge, + gimple_ranger *ranger, auto_bb_flag &reachable_flag) +{ ... + unswitch_predicate *predicate + = tree_may_unswitch_on (bb, loop, ranger); + if (predicate != NULL) + { + tree folded + = simplify_using_entry_checks (predicate, candidate, + true_edge); so just looking at the implementation it seems that if you assign UIDs to all last_stmt() in a loop you can keep tree_may_unswitch_on () in a dense array and you do not have to re-compute it? UIDs should survive copying so even for recursive unswitchings the predicates should be shareable (pointing to the wrong stmt then, but then just do not record the stmt in there - it should be available from the caller somehow). But then I wonder how this translates to gswitch handling where the result is a taken edge / a set of known not taken edges. I suggest to try wiring in gcond support in a separate commit on top of this to see whether some refactoring is needed. Looking at evaluate_insns it also seems that 'reachable_flag' is unused - computing the sizes could be done once you process a block, no? Previously I suggested to compute estimate_num_insns for each BB of the original loop once so you can re-use that and just need to sum up BB costs. +static bool +find_all_unswitching_predicates (class loop *loop, basic_block *bbs, + bool true_edge, + unswitch_predicate *parent_predicate, + gimple_ranger *ranger, + vec<unswitch_predicate *> &candidates) +{ ... so you re-do the simplification step here, in fact this function seems to not only find unswitching predicates but apply simplifications from the previous unswitching step. (yes, that's what the old code did ...) The costing step already did all the work though, it should be possible to keep a vector of stmts to simplify and the simplification from that analysis time. You can get at the "other loop copy" stmt via last_stmt (get_bb_copy (gimple_bb (cond))) if you do that before free_original_copy_tables. I think it would be nice to have the simplification step explicit before recursing even if that means some duplicate walking for now (I do hope we can share the find_all_unswitching_predicates analysis step) You are applying param_max_unswitch_insns with + unsigned cost + = evaluate_loop_insns_for_predicate (loop, bbs, ranger, pred); - cond = simplify_using_entry_checks (loop, cond); - stmt = last_stmt (bbs[i]); - if (integer_nonzerop (cond)) + if (cost <= (unsigned)param_max_unswitch_insns) where cost is the sum of the simplified sizes of both loop copies after unswitching. By its very nature this will match for all recursive invocations (loops only will get smaller) which is why we need param_max_unswitch_level. The idea was to have one param_max_unswitch_insns budget for each _original_ loop the recursive invocations will eat from, removing the need for param_max_unswitch_level. I'd re-interpret param_max_unswitch_insns as the number of insns we may add, so with cost calculated as above we'd have if (cost <= budget) ... and budget -= cost; before the recursion (and pass by reference so that the two recursions share the budget). Initialize by budget = original_loop_size + param_max_unswitch_insns; Hope this wasn't too many comments ;) Thanks, Richard. > Thoughts? > Thanks, > Martin >