On Tue, Nov 16, 2021 at 3:40 PM Martin Liška <mli...@suse.cz> wrote: > > On 11/11/21 08:15, Richard Biener wrote: > > So I'd try to do no functional change first, improving the costing and > > setting up the transform to simply pick up the stmts to "fold" as discovered > > during analysis (as I hinted you possibly can use gimple_uid to mark > > the stmts that simplify, IIRC gimple_uid is preserved during copying. > > gimple_uid would also scale better than gimple_plf in case we do > > the analysis for all candidates at once). > > Thinking about the analysis. Am I correct that we want to properly calculate > loop size for true and false edge of a potential gcond before the actually > unswitching?
Yes. > We can do that by finding a first gcond candidate, evaluate (symbolic + > irange approache) > all other gcond in the loop body and use BB_REACHABLE discovery. Similarly to > what we do now > at lines 378-446. Then tree_num_loop_insns can be adjusted for only these > reachable blocks. > Having that, we can calculate # of insns that will live in true/false loops. So whatever we do here we should record as "this control stmt folds to {true,false}" (or {true,unknown}, or in future, "this control stmt will lead to edge {e,unknown}"), recording the simplification on the true/false loop version in a way we can apply it after the transform. > Then we can call tree_unswitch_loop and make the gcond folding as we do in > the versioned loops. > > Is it a step in good direction? Having that we can then extend it to gswitch > statements. One issue I see is that BB_REACHABLE is there only once but you could use auto_bb_flag reachable_true, reachable_false to distinguish the true/false loop version copies. So yes, I think that sounds reasonable. At the point we want to evaluate different (first) unswitching opportunities against each other storing this only as BB flag is likely to hit limits. When we want to evaluate multiple levels of unswitching before doing any transforms even more so (if there are 3 opportunities there'd be many cases to be considered when going to level 3 ;)). I _think_ that a sparse lattice of stmt UID -> edge might do the trick if we change tree_num_loop_insns do to a DFS walk from the loop header, ignoring not taken edges by consulting the lattice. Or, for speed reason, pre-compute tree_num_loop_insns for each BB so we just have to sum a different set of BBs rather than walking all stmts again. That said, the second step would definitely be to choose the "best" opportunity on the current level. Richard. > Cheers, > Martin