On Tue, 26 Oct 2021, Xionghu Luo wrote: > > > On 2021/10/21 18:55, Richard Biener wrote: > > On Thu, 21 Oct 2021, Xionghu Luo wrote: > > > >> > >> > >> On 2021/10/15 13:51, Xionghu Luo via Gcc-patches wrote: > >>> > >>> > >>> On 2021/9/23 20:17, Richard Biener wrote: > >>>> On Wed, 22 Sep 2021, Xionghu Luo wrote: > >>>> > >>>>> > >>>>> > >>>>> On 2021/8/11 17:16, Richard Biener wrote: > >>>>>> On Wed, 11 Aug 2021, Xionghu Luo wrote: > >>>>>> > >>>>>>> > >>>>>>> > >>>>>>> On 2021/8/10 22:47, Richard Biener wrote: > >>>>>>>> On Mon, 9 Aug 2021, Xionghu Luo wrote: > >>>>>>>> > >>>>>>>>> Thanks, > >>>>>>>>> > >>>>>>>>> On 2021/8/6 19:46, Richard Biener wrote: > >>>>>>>>>> On Tue, 3 Aug 2021, Xionghu Luo wrote: > >>>>>>>>>> > >>>>>>>>>>> loop split condition is moved between loop1 and loop2, the split > >>>>>>>>>>> bb's > >>>>>>>>>>> count and probability should also be duplicated instead of (100% > >>>>>>>>>>> vs > >>>>>>>>>>> INV), > >>>>>>>>>>> secondly, the original loop1 and loop2 count need be propotional > >>>>>>>>>>> from > >>>>>>>>>>> the > >>>>>>>>>>> original loop. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> diff base/loop-cond-split-1.c.151t.lsplit > >>>>>>>>>>> patched/loop-cond-split-1.c.151t.lsplit: > >>>>>>>>>>> ... > >>>>>>>>>>> int prephitmp_16; > >>>>>>>>>>> int prephitmp_25; > >>>>>>>>>>> > >>>>>>>>>>> <bb 2> [local count: 118111600]: > >>>>>>>>>>> if (n_7(D) > 0) > >>>>>>>>>>> goto <bb 4>; [89.00%] > >>>>>>>>>>> else > >>>>>>>>>>> goto <bb 3>; [11.00%] > >>>>>>>>>>> > >>>>>>>>>>> <bb 3> [local count: 118111600]: > >>>>>>>>>>> return; > >>>>>>>>>>> > >>>>>>>>>>> <bb 4> [local count: 105119324]: > >>>>>>>>>>> pretmp_3 = ga; > >>>>>>>>>>> > >>>>>>>>>>> - <bb 5> [local count: 955630225]: > >>>>>>>>>>> + <bb 5> [local count: 315357973]: > >>>>>>>>>>> # i_13 = PHI <i_10(20), 0(4)> > >>>>>>>>>>> # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)> > >>>>>>>>>>> if (prephitmp_12 != 0) > >>>>>>>>>>> goto <bb 6>; [33.00%] > >>>>>>>>>>> else > >>>>>>>>>>> goto <bb 7>; [67.00%] > >>>>>>>>>>> > >>>>>>>>>>> - <bb 6> [local count: 315357972]: > >>>>>>>>>>> + <bb 6> [local count: 104068130]: > >>>>>>>>>>> _2 = do_something (); > >>>>>>>>>>> ga = _2; > >>>>>>>>>>> > >>>>>>>>>>> - <bb 7> [local count: 955630225]: > >>>>>>>>>>> + <bb 7> [local count: 315357973]: > >>>>>>>>>>> # prephitmp_5 = PHI <prephitmp_12(5), _2(6)> > >>>>>>>>>>> i_10 = inc (i_13); > >>>>>>>>>>> if (n_7(D) > i_10) > >>>>>>>>>>> goto <bb 21>; [89.00%] > >>>>>>>>>>> else > >>>>>>>>>>> goto <bb 11>; [11.00%] > >>>>>>>>>>> > >>>>>>>>>>> <bb 11> [local count: 105119324]: > >>>>>>>>>>> goto <bb 3>; [100.00%] > >>>>>>>>>>> > >>>>>>>>>>> - <bb 21> [local count: 850510901]: > >>>>>>>>>>> + <bb 21> [local count: 280668596]: > >>>>>>>>>>> if (prephitmp_12 != 0) > >>>>>>>>>>> - goto <bb 20>; [100.00%] > >>>>>>>>>>> + goto <bb 20>; [33.00%] > >>>>>>>>>>> else > >>>>>>>>>>> - goto <bb 19>; [INV] > >>>>>>>>>>> + goto <bb 19>; [67.00%] > >>>>>>>>>>> > >>>>>>>>>>> - <bb 20> [local count: 850510901]: > >>>>>>>>>>> + <bb 20> [local count: 280668596]: > >>>>>>>>>>> goto <bb 5>; [100.00%] > >>>>>>>>>>> > >>>>>>>>>>> - <bb 19> [count: 0]: > >>>>>>>>>>> + <bb 19> [local count: 70429947]: > >>>>>>>>>>> # i_23 = PHI <i_10(21)> > >>>>>>>>>>> # prephitmp_25 = PHI <prephitmp_5(21)> > >>>>>>>>>>> > >>>>>>>>>>> - <bb 12> [local count: 955630225]: > >>>>>>>>>>> + <bb 12> [local count: 640272252]: > >>>>>>>>>>> # i_15 = PHI <i_23(19), i_22(16)> > >>>>>>>>>>> # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)> > >>>>>>>>>>> i_22 = inc (i_15); > >>>>>>>>>>> if (n_7(D) > i_22) > >>>>>>>>>>> goto <bb 16>; [89.00%] > >>>>>>>>>>> else > >>>>>>>>>>> goto <bb 11>; [11.00%] > >>>>>>>>>>> > >>>>>>>>>>> - <bb 16> [local count: 850510901]: > >>>>>>>>>>> + <bb 16> [local count: 569842305]: > >>>>>>>>>>> goto <bb 12>; [100.00%] > >>>>>>>>>>> > >>>>>>>>>>> } > >>>>>>>>>>> > >>>>>>>>>>> gcc/ChangeLog: > >>>>>>>>>>> > >>>>>>>>>>> * tree-ssa-loop-split.c (split_loop): Fix incorrect probability. > >>>>>>>>>>> (do_split_loop_on_cond): Likewise. > >>>>>>>>>>> --- > >>>>>>>>>>> gcc/tree-ssa-loop-split.c | 16 ++++++++-------- > >>>>>>>>>>> 1 file changed, 8 insertions(+), 8 deletions(-) > >>>>>>>>>>> > >>>>>>>>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c > >>>>>>>>>>> index 3a09bbc39e5..8e5a7ded0f7 100644 > >>>>>>>>>>> --- a/gcc/tree-ssa-loop-split.c > >>>>>>>>>>> +++ b/gcc/tree-ssa-loop-split.c > >>>>>>>>>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1) > >>>>>>>>>>> basic_block cond_bb; > >>>>>>>>> > >>>>>>>>> if (!initial_true) > >>>>>>>>> - cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); > >>>>>>>>> + cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); > >>>>>>>>> + > >>>>>>>>> + edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE > >>>>>>>>> + ? EDGE_SUCC (bbs[i], 0) > >>>>>>>>> + : EDGE_SUCC (bbs[i], 1); > >>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> class loop *loop2 = loop_version (loop1, cond, &cond_bb, > >>>>>>>>>>> - profile_probability::always > >>>>>>>>>>> (), > >>>>>>>>>>> - profile_probability::always > >>>>>>>>>>> (), > >>>>>>>>>>> - profile_probability::always > >>>>>>>>>>> (), > >>>>>>>>>>> - profile_probability::always > >>>>>>>>>>> (), > >>>>>>>>>>> + true_edge->probability, > >>>>>>>>>>> + > >>>>>>>>>>> true_edge->probability.invert (), > >>>>>>>>>>> + true_edge->probability, > >>>>>>>>>>> + > >>>>>>>>>>> true_edge->probability.invert (), > >>>>>>>>>>> true); > >>>>>>>>>> > >>>>>>>>>> there is no 'true_edge' variable at this point. > >>>>>>>>> > >>>>>>>>> Sorry, missed the above hunk when split the patch. > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> gcc_assert (loop2); > >>>>>>>>>>> @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop > >>>>>>>>>>> *loop1, > >>>>>>>>>>> edge invar_branch) > >>>>>>>>>>> initialize_original_copy_tables (); > >>>>>>>>>>> > >>>>>>>>>>> struct loop *loop2 = loop_version (loop1, boolean_true_node, > >>>>>>>>>>> NULL, > >>>>>>>>>>> - profile_probability::always (), > >>>>>>>>>>> - profile_probability::never (), > >>>>>>>>>>> - profile_probability::always (), > >>>>>>>>>>> - profile_probability::always (), > >>>>>>>>>>> + invar_branch->probability.invert > >>>>>>>>>>> (), > >>>>>>>>>>> + invar_branch->probability, > >>>>>>>>>>> + invar_branch->probability.invert > >>>>>>>>>>> (), > >>>>>>>>>>> + invar_branch->probability, > >>>>>>>>>>> true); > >>>>>>>>>>> if (!loop2) > >>>>>>>>>>> { > >>>>>>>>>> > >>>>>>>>>> The patch introduction seems to talk about do_split_loop_on_cond > >>>>>>>>>> only. > >>>>>>>>> > >>>>>>>>> split_loop faces similar issue though it sets the two branches to > >>>>>>>>> 100% vs > >>>>>>>>> 100% > >>>>>>>>> and no scaling which seems also incorrect. > >>>>>>>>> > >>>>>>>>>> Since loop versioning inserts a condition with the passed > >>>>>>>>>> probabilities > >>>>>>>>>> but in this case a 'boolean_true_node' condition the then and else > >>>>>>>>>> probabilities passed look correct. It's just the scaling arguments > >>>>>>>>>> that look wrong? This loop_version call should get a comment as to > >>>>>>>>>> why we are passing probabilities the way we do. > >>>>>>>>> > >>>>>>>>> This optimization is transforming from: > >>>>>>>>> > >>>>>>>>> for (i = 0; i < n; i = inc (i)) > >>>>>>>>> { > >>>>>>>>> if (ga) > >>>>>>>>> ga = do_something (); > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> to: > >>>>>>>>> > >>>>>>>>> for (i = 0; i < x; i = inc (i)) > >>>>>>>>> { > >>>>>>>>> if (true) > >>>>>>>>> ga = do_something (); > >>>>>>>>> if (!ga) > >>>>>>>>> break; > >>>>>>>>> } > >>>>>>>>> for (; i < n; i = inc (i)) > >>>>>>>>> { > >>>>>>>>> if (false) > >>>>>>>>> ga = do_something (); > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> `boolean_true_node` is passed in to make the first loop's condition > >>>>>>>>> statement to be 'true', after returning from loop_version, there is > >>>>>>>>> a > >>>>>>>>> piece of code forcing the condition in second loop to be false, > >>>>>>>>> and the original condition is moved from *in loop* to *exit edge* > >>>>>>>>> between loop1 and loop2. > >>>>>>>> > >>>>>>>> Yes, one complication is that we use loop_version but do not retain > >>>>>>>> the CFG it creates. Something like the vectorizers > >>>>>>>> slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit" > >>>>>>>> but then that code doesn't do any scaling yet. But then > >>>>>>>> loop_version uses cfg_hook_duplicate_loop_to_header_edge and I > >>>>>>>> suppose > >>>>>>>> we could write a variant that simply doesn't mangle the CFG > >>>>>>>> with a new condition switching between both loops but keeps them > >>>>>>>> executed after each other ... > >>>>>>>> > >>>>>>>> As said, this adds to the confusion and some awkwardness. > >>>>>>> > >>>>>>> Then loop_version in loop split requires two types of variant, one > >>>>>>> is to insert condition to loop preheader for 'split_loops' usage, > >>>>>>> another is to insert condition to loop exit for > >>>>>>> 'do_split_loop_on_cond' > >>>>>>> usage, it needs one extra function to encapsulate these cfg > >>>>>>> alterations > >>>>>>> from outside to inside. > >>>>>>> > >>>>>>> unswitching only execute one loop as it only moves the invariant > >>>>>>> condition > >>>>>>> to first loop's pre-header. While 'split_loops' and > >>>>>>> 'do_split_loop_on_cond' > >>>>>>> may execute both loops no matter the condition is moved to the first > >>>>>>> loop's > >>>>>>> preheader or exit. > >>>>>>> > >>>>>>> The condition stmt in loop unswitching is invariant, but it is variant > >>>>>>> in loop splitting, that's why loop unswitching execute only one loop > >>>>>>> but loop splitting executes both loops. > >>>>>>> > >>>>>>> Seems we need two condition arguments for loop_version, one for > >>>>>>> connecting > >>>>>>> loop1 preheader to loop2 preheader, another one for connecting > >>>>>>> loop1's exit > >>>>>>> to loop2's header? Then it will be more generic for both unswitching > >>>>>>> pass > >>>>>>> and splitting pass. Looks a bit complicated and conflicted with > >>>>>>> loop_version's > >>>>>>> comments: > >>>>>>> > >>>>>>> /* Main entry point for Loop Versioning transformation. > >>>>>>> > >>>>>>> This transformation given a condition and a loop, creates > >>>>>>> -if (condition) { loop_copy1 } else { loop_copy2 }, ... */ > >>>>>>> > >>>>>>> > >>>>>>> And this only works for loop split usage, those many other places > >>>>>>> doesn't use loop_version like this? > >>>>>> > >>>>>> Yes, as said if you don't want the above CFG then you probably > >>>>>> shouldn't use loop_version but instead use its building blocks > >>>>>> (and some refactoring of loop_version can make that easier). > >>>>>> > >>>>>> I think splitting wants > >>>>>> > >>>>>> loop_copy1 > >>>>>> if (condition) > >>>>>> loop_copy2 > >>>>>> > >>>>>> IMHO it would be good to split 'loopify' into the actual loopification > >>>>>> and the scaling. Basically make the part of loop_version that > >>>>>> copies the loop on the header edge and creates a loop structure for > >>>>>> it separate. > >>>>>> > >>>>>> loop distribution uses slpeel_tree_duplicate_loop_to_edge_cfg > >>>>>> (copy_loop_before). > >>>>>> > >>>>> > >>>>> Unfortunately slpeel_tree_duplicate_loop_to_edge_cfg only supports > >>>>> copying loops with single exit, it would cause many ICEs in it even > >>>>> building GCC stage1 (line 1065, line 1184 due to exit or new_exit > >>>>> is NULL returning from single_exit (loop).). Seems loop_version is > >>>>> more flexible for loop split. > >>>> > >>>> Hmm, sure - loop_version does not need to do anything special with > >>>> exits since control flow either enters the original or the loop copy. > >>>> > >>>> But slpeel_tree_duplicate_loop_to_edge_cfg intends to create > >>>> control flow that enters _both_ loops, so it needs to have > >>>> an edge from the first loop exit to the second loop entry. > >>>> > >>>> One could extend this to a region copy, copying eventual CFG merges > >>>> of exits and specifying which exit of a SEME region is supposed > >>>> to be connected to the original region entry. > >>>> > >>>> After all that's what loop splitting needs in the end - though not > >>>> sure what exactly it does with more than one exit. > >>> > >>> In tree-ssa-loop-split.c, split_loop only accepts single exit loop, > >>> the recently added split_loop_on_cond could process multiple exits loop. > >>> > >>> For example, do some changes to the loop-cond-split-1.c, > >>> > >>> int ga; > >>> extern int a; > >>> extern int b; > >>> extern int c; > >>> > >>> void test1 (int n) { > >>> int i; > >>> for (i = 0; i < n; i = inc (i)). { > >>> if (a+3 > 0) > >>> break; > >>> > >>> if (ga) > >>> ga = do_something (); > >>> > >>> for (int j = 0; j < n; j++) > >>> { > >>> b+=5; > >>> if (b > c) break; > >>> } > >>> } > >>> } > >>> > >>> the "if (ga)" will be a new exit edge from loop_copy1 to loop_copy2. > >>> I am not sure whether it is valuable to do semi-invariant loop split to > >>> such > >>> cases with multiple exits, but obviously the split_loop_on_cond is a > >>> special > >>> case from split_loop both duplicate loop to > >>> if (condition1) {loop_copy1} if (condition2) {loop_copy2} > >>> The only difference is condition1 is true for split_loop_on_cond. > >>> > >>>> > >>>> So there's another set of "loop" copying, gimple_duplicate_sese_region, > >>>> which doesn't actually require a single exit but a single "important" > >>>> exit. That might match how you treat multiple exits. > >>> > >>> gimple_duplicate_sese_region doesn't handle subloops and latches. > >>> Finally, > >>> I found that loop_version is still much better > >>> than slpeel_tree_duplicate_loop_to_edge_cfg and > >>> gimple_duplicate_sese_region > >>> since it could handle all cases like multiple exits/subloops, etc. I did > >>> some > >>> refactor to the code to introduce loop_version2 to create duplicated loops > >>> with two input conditions as attached patch, is this reasonable enough? > >>> > >>> I also tried to copy the code in loop_version out of it to don't call > >>> loop_version > >>> in loop_version2, but it seems useless with many duplicate code and NOT > >>> get rid > >>> of creating "if (condition1) {loop_copy1}" at first? > >>> > >>> > >> > >> The previous attached patch > >> 0001-Add-loop_version2-to-support-loop-transformation-wit.patch > >> had issues when connecting two loops, reason is split_loop connect_loops > >> at *exit* edge, > >> do_split_loop_on_cond connect_loops at latch_edge. So they have different > >> CFGs even > >> both with two conditions :(. It seems not so that useful to also define > >> another new function > >> 'connect_loops_at_latch' given the limited usage in loop split only? And > >> the loop_version2 > >> also looks not so general again ... > > > > All I wanted to say is that none of the current high-level APIs we have > > are a 1:1 match for loop splitting and that they in fact might do > > stuff that's unnecessary or counter-productive. If inventing a new API > > doesn't sound appealing then maybe refactoring an existing > > (maybe loop_version) to expose re-usable pieces would make more sense. > > > > For example loop_version currently does lv_adjust_loop_entry_edge > > before it loopifys the copy inserted on the header. I wonder if > > the condition generation can be done later and thus we can > > have three pieces - 1) duplicating the loop on the entry edge, > > 2) adjusting the CFG to insert a condition branching to either loop, > > 3) from loopify extract the scale_loop_frequencies bits (in fact > > loopify is only used from within cfgloopmanip.c) > > > > That said, it shouldn't be a requirement to do all this to fix the > > profile for loop splitting it's just that the current situation > > does not help understanding of how the adjustment works. > > The attached patch 0002-Refactor-loop_version.patch is to refactor > loop_version > according to your above comments. > > loopify is moved before condition generation, scale_loop_frequencies is > extracted out of loopify.
I think that's a good improvement (and if loopify is unused after the patch it should be removed together with it). The loop copying part could then be a new clone_loop_to_header_edge function, or loop_copy (rather than loop_version). The existing duplicate_loop_to_header_edge function is named badly since it does duplicate_loop_body_to_header_edge ... > 0001-Fix-loop-split-incorrect-count-and-probability.patch is still the initial > version that fixes the probability and frequency issues in loop split, just a > repeat, both passed regression test on P8LE, OK for master? That's still the original patch, I don't see any of the comments addressed and I do not have enough knowledge to approve the patch. Sorry here, but I really hoped that Honza would eventually chime in. Richard.