Any transformation involving cfg alteration would face same problem, it is not that easy to update new cfg with reasonable and seemly-correct profile count. We can adjust probability for impacted condition bbs, but lack of a utility like what static profile estimating pass does, and only propagates count partially.
Thanks, Feng ________________________________________ From: Richard Biener <rguent...@suse.de> Sent: Tuesday, August 10, 2021 10:47 PM To: Xionghu Luo Cc: gcc-patches@gcc.gnu.org; seg...@kernel.crashing.org; Feng Xue OS; wschm...@linux.ibm.com; guoji...@linux.ibm.com; li...@gcc.gnu.org; hubi...@ucw.cz Subject: Re: [PATCH] Fix loop split incorrect count and probability On Mon, 9 Aug 2021, Xionghu Luo wrote: > Thanks, > > On 2021/8/6 19:46, Richard Biener wrote: > > On Tue, 3 Aug 2021, Xionghu Luo wrote: > > > >> loop split condition is moved between loop1 and loop2, the split bb's > >> count and probability should also be duplicated instead of (100% vs INV), > >> secondly, the original loop1 and loop2 count need be propotional from the > >> original loop. > >> > >> > >> diff base/loop-cond-split-1.c.151t.lsplit > >> patched/loop-cond-split-1.c.151t.lsplit: > >> ... > >> int prephitmp_16; > >> int prephitmp_25; > >> > >> <bb 2> [local count: 118111600]: > >> if (n_7(D) > 0) > >> goto <bb 4>; [89.00%] > >> else > >> goto <bb 3>; [11.00%] > >> > >> <bb 3> [local count: 118111600]: > >> return; > >> > >> <bb 4> [local count: 105119324]: > >> pretmp_3 = ga; > >> > >> - <bb 5> [local count: 955630225]: > >> + <bb 5> [local count: 315357973]: > >> # i_13 = PHI <i_10(20), 0(4)> > >> # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)> > >> if (prephitmp_12 != 0) > >> goto <bb 6>; [33.00%] > >> else > >> goto <bb 7>; [67.00%] > >> > >> - <bb 6> [local count: 315357972]: > >> + <bb 6> [local count: 104068130]: > >> _2 = do_something (); > >> ga = _2; > >> > >> - <bb 7> [local count: 955630225]: > >> + <bb 7> [local count: 315357973]: > >> # prephitmp_5 = PHI <prephitmp_12(5), _2(6)> > >> i_10 = inc (i_13); > >> if (n_7(D) > i_10) > >> goto <bb 21>; [89.00%] > >> else > >> goto <bb 11>; [11.00%] > >> > >> <bb 11> [local count: 105119324]: > >> goto <bb 3>; [100.00%] > >> > >> - <bb 21> [local count: 850510901]: > >> + <bb 21> [local count: 280668596]: > >> if (prephitmp_12 != 0) > >> - goto <bb 20>; [100.00%] > >> + goto <bb 20>; [33.00%] > >> else > >> - goto <bb 19>; [INV] > >> + goto <bb 19>; [67.00%] > >> > >> - <bb 20> [local count: 850510901]: > >> + <bb 20> [local count: 280668596]: > >> goto <bb 5>; [100.00%] > >> > >> - <bb 19> [count: 0]: > >> + <bb 19> [local count: 70429947]: > >> # i_23 = PHI <i_10(21)> > >> # prephitmp_25 = PHI <prephitmp_5(21)> > >> > >> - <bb 12> [local count: 955630225]: > >> + <bb 12> [local count: 640272252]: > >> # i_15 = PHI <i_23(19), i_22(16)> > >> # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)> > >> i_22 = inc (i_15); > >> if (n_7(D) > i_22) > >> goto <bb 16>; [89.00%] > >> else > >> goto <bb 11>; [11.00%] > >> > >> - <bb 16> [local count: 850510901]: > >> + <bb 16> [local count: 569842305]: > >> goto <bb 12>; [100.00%] > >> > >> } > >> > >> gcc/ChangeLog: > >> > >> * tree-ssa-loop-split.c (split_loop): Fix incorrect probability. > >> (do_split_loop_on_cond): Likewise. > >> --- > >> gcc/tree-ssa-loop-split.c | 16 ++++++++-------- > >> 1 file changed, 8 insertions(+), 8 deletions(-) > >> > >> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c > >> index 3a09bbc39e5..8e5a7ded0f7 100644 > >> --- a/gcc/tree-ssa-loop-split.c > >> +++ b/gcc/tree-ssa-loop-split.c > >> @@ -583,10 +583,10 @@ split_loop (class loop *loop1) > >> basic_block cond_bb; > > if (!initial_true) > - cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); > + cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); > + > + edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE > + ? EDGE_SUCC (bbs[i], 0) > + : EDGE_SUCC (bbs[i], 1); > > >> > >> class loop *loop2 = loop_version (loop1, cond, &cond_bb, > >> - profile_probability::always (), > >> - profile_probability::always (), > >> - profile_probability::always (), > >> - profile_probability::always (), > >> + true_edge->probability, > >> + true_edge->probability.invert (), > >> + true_edge->probability, > >> + true_edge->probability.invert (), > >> true); > > > > there is no 'true_edge' variable at this point. > > Sorry, missed the above hunk when split the patch. > > > > >> gcc_assert (loop2); > >> > >> @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1, edge > >> invar_branch) > >> initialize_original_copy_tables (); > >> > >> struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL, > >> - profile_probability::always (), > >> - profile_probability::never (), > >> - profile_probability::always (), > >> - profile_probability::always (), > >> + invar_branch->probability.invert (), > >> + invar_branch->probability, > >> + invar_branch->probability.invert (), > >> + invar_branch->probability, > >> true); > >> if (!loop2) > >> { > > > > The patch introduction seems to talk about do_split_loop_on_cond only. > > split_loop faces similar issue though it sets the two branches to 100% vs 100% > and no scaling which seems also incorrect. > > > Since loop versioning inserts a condition with the passed probabilities > > but in this case a 'boolean_true_node' condition the then and else > > probabilities passed look correct. It's just the scaling arguments > > that look wrong? This loop_version call should get a comment as to > > why we are passing probabilities the way we do. > > This optimization is transforming from: > > for (i = 0; i < n; i = inc (i)) > { > if (ga) > ga = do_something (); > } > > to: > > for (i = 0; i < x; i = inc (i)) > { > if (true) > ga = do_something (); > if (!ga) > break; > } > for (; i < n; i = inc (i)) > { > if (false) > ga = do_something (); > } > > > `boolean_true_node` is passed in to make the first loop's condition > statement to be 'true', after returning from loop_version, there is a > piece of code forcing the condition in second loop to be false, > and the original condition is moved from *in loop* to *exit edge* > between loop1 and loop2. Yes, one complication is that we use loop_version but do not retain the CFG it creates. Something like the vectorizers slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit" but then that code doesn't do any scaling yet. But then loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose we could write a variant that simply doesn't mangle the CFG with a new condition switching between both loops but keeps them executed after each other ... As said, this adds to the confusion and some awkwardness. > @fxue may have inputs about this since he contributed it years ago. > > > > > It does seem that scaling the loop by the invar_branch probability > > is correct. Since this does similar things to unswitching, I see > > that unswitching does > > > > prob_true = edge_true->probability; > > loop_version (loop, unshare_expr (cond), > > NULL, prob_true, > > prob_true.invert (), > > prob_true, prob_true.invert (), > > false); > > > > which maybe suggests that your invar_branch based passing should > > depend on 'true_invar'? > > > > Also compared to unswitching the first loop is always entered, > > so I wonder if the scaling is correct with respect to that > > given unswitching where only ever one loop is entered? > > > Scaling is based on the probability calculated on "if (ga)", if it is > 33% vs 67% before split, then it is reasonable to scale loop1 to 33% > and loop2 to 67% suppose the branch probability is correct enough? > > unswitch also scaled the two loops based on prob_true, if prob_true > is 50%, it decreases X's count to HALF unexpectedly since it should be > executed on both branches? while loop split still kept execute both first > loop and second loop, it seems even more accurate than loop unswitching? I just was saying that both doing exactly the same looks wrong (on either side). > tree-ssa-loop-unswitch.c: > > while (A) > { > if (inv) > B; > > X; > > if (!inv) > C; > } > > where inv is the loop invariant, into > > if (inv) > { > while (A) > { > B; > X; > } > } > else > { > while (A) > { > X; > C; > } > } Yes, here scaling based on the if (inv) probability looks obviously 100% correct to me. But I might be wrong. And thus the splitting case must be still off (so I conclude). Hmm, in fact I think for the loop unswitching case the scaling of the B and C blocks is off? Those should be considered always executed. Note the loop unswitching pass is altering the conditions condition to static true/false but I don't think it performs further adjustments. That said, likely the profile update cannot be done uniformly for all blocks of a loop? Richard.