On Wed, 22 Sep 2021, Xionghu Luo wrote: > > > On 2021/8/11 17:16, Richard Biener wrote: > > On Wed, 11 Aug 2021, Xionghu Luo wrote: > > > >> > >> > >> On 2021/8/10 22:47, Richard Biener wrote: > >>> On Mon, 9 Aug 2021, Xionghu Luo wrote: > >>> > >>>> Thanks, > >>>> > >>>> On 2021/8/6 19:46, Richard Biener wrote: > >>>>> On Tue, 3 Aug 2021, Xionghu Luo wrote: > >>>>> > >>>>>> loop split condition is moved between loop1 and loop2, the split bb's > >>>>>> count and probability should also be duplicated instead of (100% vs > >>>>>> INV), > >>>>>> secondly, the original loop1 and loop2 count need be propotional from > >>>>>> the > >>>>>> original loop. > >>>>>> > >>>>>> > >>>>>> diff base/loop-cond-split-1.c.151t.lsplit > >>>>>> patched/loop-cond-split-1.c.151t.lsplit: > >>>>>> ... > >>>>>> int prephitmp_16; > >>>>>> int prephitmp_25; > >>>>>> > >>>>>> <bb 2> [local count: 118111600]: > >>>>>> if (n_7(D) > 0) > >>>>>> goto <bb 4>; [89.00%] > >>>>>> else > >>>>>> goto <bb 3>; [11.00%] > >>>>>> > >>>>>> <bb 3> [local count: 118111600]: > >>>>>> return; > >>>>>> > >>>>>> <bb 4> [local count: 105119324]: > >>>>>> pretmp_3 = ga; > >>>>>> > >>>>>> - <bb 5> [local count: 955630225]: > >>>>>> + <bb 5> [local count: 315357973]: > >>>>>> # i_13 = PHI <i_10(20), 0(4)> > >>>>>> # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)> > >>>>>> if (prephitmp_12 != 0) > >>>>>> goto <bb 6>; [33.00%] > >>>>>> else > >>>>>> goto <bb 7>; [67.00%] > >>>>>> > >>>>>> - <bb 6> [local count: 315357972]: > >>>>>> + <bb 6> [local count: 104068130]: > >>>>>> _2 = do_something (); > >>>>>> ga = _2; > >>>>>> > >>>>>> - <bb 7> [local count: 955630225]: > >>>>>> + <bb 7> [local count: 315357973]: > >>>>>> # prephitmp_5 = PHI <prephitmp_12(5), _2(6)> > >>>>>> i_10 = inc (i_13); > >>>>>> if (n_7(D) > i_10) > >>>>>> goto <bb 21>; [89.00%] > >>>>>> else > >>>>>> goto <bb 11>; [11.00%] > >>>>>> > >>>>>> <bb 11> [local count: 105119324]: > >>>>>> goto <bb 3>; [100.00%] > >>>>>> > >>>>>> - <bb 21> [local count: 850510901]: > >>>>>> + <bb 21> [local count: 280668596]: > >>>>>> if (prephitmp_12 != 0) > >>>>>> - goto <bb 20>; [100.00%] > >>>>>> + goto <bb 20>; [33.00%] > >>>>>> else > >>>>>> - goto <bb 19>; [INV] > >>>>>> + goto <bb 19>; [67.00%] > >>>>>> > >>>>>> - <bb 20> [local count: 850510901]: > >>>>>> + <bb 20> [local count: 280668596]: > >>>>>> goto <bb 5>; [100.00%] > >>>>>> > >>>>>> - <bb 19> [count: 0]: > >>>>>> + <bb 19> [local count: 70429947]: > >>>>>> # i_23 = PHI <i_10(21)> > >>>>>> # prephitmp_25 = PHI <prephitmp_5(21)> > >>>>>> > >>>>>> - <bb 12> [local count: 955630225]: > >>>>>> + <bb 12> [local count: 640272252]: > >>>>>> # i_15 = PHI <i_23(19), i_22(16)> > >>>>>> # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)> > >>>>>> i_22 = inc (i_15); > >>>>>> if (n_7(D) > i_22) > >>>>>> goto <bb 16>; [89.00%] > >>>>>> else > >>>>>> goto <bb 11>; [11.00%] > >>>>>> > >>>>>> - <bb 16> [local count: 850510901]: > >>>>>> + <bb 16> [local count: 569842305]: > >>>>>> goto <bb 12>; [100.00%] > >>>>>> > >>>>>> } > >>>>>> > >>>>>> gcc/ChangeLog: > >>>>>> > >>>>>> * tree-ssa-loop-split.c (split_loop): Fix incorrect probability. > >>>>>> (do_split_loop_on_cond): Likewise. > >>>>>> --- > >>>>>> gcc/tree-ssa-loop-split.c | 16 ++++++++-------- > >>>>>> 1 file changed, 8 insertions(+), 8 deletions(-) > >>>>>> > >>>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c > >>>>>> index 3a09bbc39e5..8e5a7ded0f7 100644 > >>>>>> --- a/gcc/tree-ssa-loop-split.c > >>>>>> +++ b/gcc/tree-ssa-loop-split.c > >>>>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1) > >>>>>> basic_block cond_bb; > >>>> > >>>> if (!initial_true) > >>>> - cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); > >>>> + cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); > >>>> + > >>>> + edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE > >>>> + ? EDGE_SUCC (bbs[i], 0) > >>>> + : EDGE_SUCC (bbs[i], 1); > >>>> > >>>>>> > >>>>>> class loop *loop2 = loop_version (loop1, cond, &cond_bb, > >>>>>> - profile_probability::always > >>>>>> (), > >>>>>> - profile_probability::always > >>>>>> (), > >>>>>> - profile_probability::always > >>>>>> (), > >>>>>> - profile_probability::always > >>>>>> (), > >>>>>> + true_edge->probability, > >>>>>> + > >>>>>> true_edge->probability.invert (), > >>>>>> + true_edge->probability, > >>>>>> + > >>>>>> true_edge->probability.invert (), > >>>>>> true); > >>>>> > >>>>> there is no 'true_edge' variable at this point. > >>>> > >>>> Sorry, missed the above hunk when split the patch. > >>>> > >>>>> > >>>>>> gcc_assert (loop2); > >>>>>> @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1, > >>>>>> edge invar_branch) > >>>>>> initialize_original_copy_tables (); > >>>>>> > >>>>>> struct loop *loop2 = loop_version (loop1, boolean_true_node, > >>>>>> NULL, > >>>>>> - profile_probability::always (), > >>>>>> - profile_probability::never (), > >>>>>> - profile_probability::always (), > >>>>>> - profile_probability::always (), > >>>>>> + invar_branch->probability.invert > >>>>>> (), > >>>>>> + invar_branch->probability, > >>>>>> + invar_branch->probability.invert > >>>>>> (), > >>>>>> + invar_branch->probability, > >>>>>> true); > >>>>>> if (!loop2) > >>>>>> { > >>>>> > >>>>> The patch introduction seems to talk about do_split_loop_on_cond only. > >>>> > >>>> split_loop faces similar issue though it sets the two branches to 100% vs > >>>> 100% > >>>> and no scaling which seems also incorrect. > >>>> > >>>>> Since loop versioning inserts a condition with the passed probabilities > >>>>> but in this case a 'boolean_true_node' condition the then and else > >>>>> probabilities passed look correct. It's just the scaling arguments > >>>>> that look wrong? This loop_version call should get a comment as to > >>>>> why we are passing probabilities the way we do. > >>>> > >>>> This optimization is transforming from: > >>>> > >>>> for (i = 0; i < n; i = inc (i)) > >>>> { > >>>> if (ga) > >>>> ga = do_something (); > >>>> } > >>>> > >>>> to: > >>>> > >>>> for (i = 0; i < x; i = inc (i)) > >>>> { > >>>> if (true) > >>>> ga = do_something (); > >>>> if (!ga) > >>>> break; > >>>> } > >>>> for (; i < n; i = inc (i)) > >>>> { > >>>> if (false) > >>>> ga = do_something (); > >>>> } > >>>> > >>>> > >>>> `boolean_true_node` is passed in to make the first loop's condition > >>>> statement to be 'true', after returning from loop_version, there is a > >>>> piece of code forcing the condition in second loop to be false, > >>>> and the original condition is moved from *in loop* to *exit edge* > >>>> between loop1 and loop2. > >>> > >>> Yes, one complication is that we use loop_version but do not retain > >>> the CFG it creates. Something like the vectorizers > >>> slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit" > >>> but then that code doesn't do any scaling yet. But then > >>> loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose > >>> we could write a variant that simply doesn't mangle the CFG > >>> with a new condition switching between both loops but keeps them > >>> executed after each other ... > >>> > >>> As said, this adds to the confusion and some awkwardness. > >> > >> Then loop_version in loop split requires two types of variant, one > >> is to insert condition to loop preheader for 'split_loops' usage, > >> another is to insert condition to loop exit for 'do_split_loop_on_cond' > >> usage, it needs one extra function to encapsulate these cfg alterations > >> from outside to inside. > >> > >> unswitching only execute one loop as it only moves the invariant condition > >> to first loop's pre-header. While 'split_loops' and > >> 'do_split_loop_on_cond' > >> may execute both loops no matter the condition is moved to the first loop's > >> preheader or exit. > >> > >> The condition stmt in loop unswitching is invariant, but it is variant > >> in loop splitting, that's why loop unswitching execute only one loop > >> but loop splitting executes both loops. > >> > >> Seems we need two condition arguments for loop_version, one for connecting > >> loop1 preheader to loop2 preheader, another one for connecting loop1's exit > >> to loop2's header? Then it will be more generic for both unswitching pass > >> and splitting pass. Looks a bit complicated and conflicted with > >> loop_version's > >> comments: > >> > >> /* Main entry point for Loop Versioning transformation. > >> > >> This transformation given a condition and a loop, creates > >> -if (condition) { loop_copy1 } else { loop_copy2 }, ... */ > >> > >> > >> And this only works for loop split usage, those many other places > >> doesn't use loop_version like this? > > > > Yes, as said if you don't want the above CFG then you probably > > shouldn't use loop_version but instead use its building blocks > > (and some refactoring of loop_version can make that easier). > > > > I think splitting wants > > > > loop_copy1 > > if (condition) > > loop_copy2 > > > > IMHO it would be good to split 'loopify' into the actual loopification > > and the scaling. Basically make the part of loop_version that > > copies the loop on the header edge and creates a loop structure for > > it separate. > > > > loop distribution uses slpeel_tree_duplicate_loop_to_edge_cfg > > (copy_loop_before). > > > > Unfortunately slpeel_tree_duplicate_loop_to_edge_cfg only supports > copying loops with single exit, it would cause many ICEs in it even > building GCC stage1 (line 1065, line 1184 due to exit or new_exit > is NULL returning from single_exit (loop).). Seems loop_version is > more flexible for loop split.
Hmm, sure - loop_version does not need to do anything special with exits since control flow either enters the original or the loop copy. But slpeel_tree_duplicate_loop_to_edge_cfg intends to create control flow that enters _both_ loops, so it needs to have an edge from the first loop exit to the second loop entry. One could extend this to a region copy, copying eventual CFG merges of exits and specifying which exit of a SEME region is supposed to be connected to the original region entry. After all that's what loop splitting needs in the end - though not sure what exactly it does with more than one exit. So there's another set of "loop" copying, gimple_duplicate_sese_region, which doesn't actually require a single exit but a single "important" exit. That might match how you treat multiple exits. Richard.