Gentle ping, thanks. [PATCH v3] Fix loop split incorrect count and probability
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583626.html On 2021/11/8 14:09, Xionghu Luo via Gcc-patches wrote: > > > On 2021/10/27 15:44, Jan Hubicka wrote: >>> On Wed, 27 Oct 2021, Jan Hubicka wrote: >>> >>>>> >>>>> gcc/ChangeLog: >>>>> >>>>> * tree-ssa-loop-split.c (split_loop): Fix incorrect probability. >>>>> (do_split_loop_on_cond): Likewise. >>>>> --- >>>>> gcc/tree-ssa-loop-split.c | 25 ++++++++++++++++--------- >>>>> 1 file changed, 16 insertions(+), 9 deletions(-) >>>>> >>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c >>>>> index 3f6ad046623..d30782888f3 100644 >>>>> --- a/gcc/tree-ssa-loop-split.c >>>>> +++ b/gcc/tree-ssa-loop-split.c >>>>> @@ -575,7 +575,11 @@ split_loop (class loop *loop1) >>>>> stmts2); >>>>> tree cond = build2 (guard_code, boolean_type_node, guard_init, border); >>>>> if (!initial_true) >>>>> - cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); >>>>> + cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); >>>>> + >>>>> + edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE >>>>> + ? EDGE_SUCC (bbs[i], 0) >>>>> + : EDGE_SUCC (bbs[i], 1); >>>>> >>>>> /* Now version the loop, placing loop2 after loop1 connecting >>>>> them, and fix up SSA form for that. */ >>>>> @@ -583,10 +587,10 @@ split_loop (class loop *loop1) >>>>> basic_block cond_bb; >>>>> >>>>> class loop *loop2 = loop_version (loop1, cond, &cond_bb, >>>>> - profile_probability::always (), >>>>> - profile_probability::always (), >>>>> - profile_probability::always (), >>>>> - profile_probability::always (), >>>>> + true_edge->probability, >>>>> + true_edge->probability.invert (), >>>>> + true_edge->probability, >>>>> + true_edge->probability.invert (), >>>>> true); >>>> >>>> As discussed yesterday, for loop of form >>>> >>>> for (...) >>>> if (cond) >>>> cond = something(); >>>> else >>>> something2 >>>> >>>> Split as >>> >>> Note that you are missing to conditionalize loop1 execution >>> on 'cond' (not sure if that makes a difference). >> You are right - forgot to mention that. >> >> Entry conditional makes no difference on scaling stmts inside loop but >> affects its header and expected trip count. We however need to set up >> probability of this conditional (and preheader count if it exists) >> There is no general way to read the probability of this initial >> conditional from cfg profile. So I guess we are stuck with guessing >> some arbitrary value. I guess common case is that cond is true first >> iteration tough and often we can easily see that fromo PHI node >> initializing the test variable. >> >> Other thing that changes is expected number of iterations of the split >> loops, so we may want to update the exit conditinal probability >> accordingly... >> > Sorry for the late reply. The below updated patch mainly solves the issues > you pointed out: > - profile count proportion for both original loop and copied loop > without dropping down the true branch's count; > - probability update in the two loops and between the two loops; > - number of iterations update/check for split_loop. > > > [PATCH v3] Fix loop split incorrect count and probability > > In tree-ssa-loop-split.c, split_loop and split_loop_on_cond does two > kind of split. split_loop only works for single loop and insert edge at > exit when split, while split_loop_on_cond is not limited to single loop > and insert edge at latch when split. Both split behavior should consider > loop count and probability update. For split_loop, loop split condition > is moved in front of loop1 and loop2; But split_loop_on_cond moves the > condition between loop1 and loop2, this patch does: > 1) profile count proportion for both original loop and copied loop > without dropping down the true branch's count; > 2) probability update in and between the two loops; > 3) number of iterations update for split_loop. > > Regression tested pass, OK for master? > > Changes diff for split_loop and split_loop_on_cond cases: > > 1) diff base/loop-split.c.151t.lsplit patched/loop-split.c.152t.lsplit > ... > <bb 2> [local count: 118111600]: > if (beg_5(D) < end_8(D)) > goto <bb 14>; [89.00%] > else > goto <bb 6>; [11.00%] > > <bb 14> [local count: 105119324]: > if (beg2_6(D) < c_9(D)) > - goto <bb 15>; [100.00%] > + goto <bb 15>; [33.00%] > else > - goto <bb 16>; [100.00%] > + goto <bb 16>; [67.00%] > > - <bb 15> [local count: 105119324]: > + <bb 15> [local count: 34689377]: > _25 = beg_5(D) + 1; > _26 = end_8(D) - beg_5(D); > _27 = beg2_6(D) + _26; > _28 = MIN_EXPR <c_9(D), _27>; > > - <bb 3> [local count: 955630225]: > + <bb 3> [local count: 315357973]: > # i_16 = PHI <i_11(8), beg_5(D)(15)> > # j_17 = PHI <j_12(8), beg2_6(D)(15)> > printf ("a: %d %d\n", i_16, j_17); > i_11 = i_16 + 1; > j_12 = j_17 + 1; > if (j_12 < _28) > - goto <bb 8>; [89.00%] > + goto <bb 8>; [29.37%] > else > - goto <bb 17>; [11.00%] > + goto <bb 17>; [70.63%] > > - <bb 8> [local count: 850510901]: > + <bb 8> [local count: 280668596]: > goto <bb 3>; [100.00%] > > - <bb 16> [local count: 105119324]: > + <bb 16> [local count: 70429947]: > # i_22 = PHI <beg_5(D)(14), i_29(17)> > # j_23 = PHI <beg2_6(D)(14), j_30(17)> > > <bb 10> [local count: 955630225]: > # i_2 = PHI <i_22(16), i_20(13)> > # j_1 = PHI <j_23(16), j_21(13)> > i_20 = i_2 + 1; > j_21 = j_1 + 1; > if (end_8(D) > i_20) > - goto <bb 13>; [89.00%] > + goto <bb 13>; [59.63%] > else > - goto <bb 9>; [11.00%] > + goto <bb 9>; [40.37%] > > - <bb 13> [local count: 850510901]: > + <bb 13> [local count: 569842305]: > goto <bb 10>; [100.00%] > > <bb 17> [local count: 105119324]: > # i_29 = PHI <i_11(3)> > # j_30 = PHI <j_12(3)> > if (end_8(D) > i_29) > goto <bb 16>; [80.00%] > else > goto <bb 9>; [20.00%] > > <bb 9> [local count: 105119324]: > > <bb 6> [local count: 118111600]: > return 0; > > } > > 2) diff base/loop-cond-split-1.c.151t.lsplit > patched/loop-cond-split-1.c.151t.lsplit: > ... > <bb 2> [local count: 118111600]: > if (n_7(D) > 0) > goto <bb 4>; [89.00%] > else > goto <bb 3>; [11.00%] > > <bb 3> [local count: 118111600]: > return; > > <bb 4> [local count: 105119324]: > pretmp_3 = ga; > > - <bb 5> [local count: 955630225]: > + <bb 5> [local count: 315357973]: > # i_13 = PHI <i_10(20), 0(4)> > # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)> > if (prephitmp_12 != 0) > goto <bb 6>; [33.00%] > else > goto <bb 7>; [67.00%] > > <bb 6> [local count: 315357972]: > _2 = do_something (); > ga = _2; > > - <bb 7> [local count: 955630225]: > + <bb 7> [local count: 315357973]: > # prephitmp_5 = PHI <prephitmp_12(5), _2(6)> > i_10 = inc (i_13); > if (n_7(D) > i_10) > goto <bb 21>; [89.00%] > else > goto <bb 11>; [11.00%] > > <bb 11> [local count: 105119324]: > goto <bb 3>; [100.00%] > > - <bb 21> [local count: 850510901]: > + <bb 21> [local count: 280668596]: > if (prephitmp_12 != 0) > - goto <bb 20>; [100.00%] > + goto <bb 20>; [33.00%] > else > - goto <bb 19>; [INV] > + goto <bb 19>; [67.00%] > > - <bb 20> [local count: 850510901]: > + <bb 20> [local count: 280668596]: > goto <bb 5>; [100.00%] > > - <bb 19> [count: 0]: > + <bb 19> [local count: 70429947]: > # i_23 = PHI <i_10(21)> > # prephitmp_25 = PHI <prephitmp_5(21)> > > - <bb 12> [local count: 955630225]: > + <bb 12> [local count: 640272252]: > # i_15 = PHI <i_23(19), i_22(16)> > # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)> > i_22 = inc (i_15); > if (n_7(D) > i_22) > - goto <bb 16>; [89.00%] > + goto <bb 16>; [59.63%] > else > - goto <bb 11>; [11.00%] > + goto <bb 11>; [40.37%] > > - <bb 16> [local count: 850510901]: > + <bb 16> [local count: 569842305]: > goto <bb 12>; [100.00%] > > } > > gcc/ChangeLog: > > * tree-ssa-loop-split.c (split_loop): Fix incorrect > profile_count and probability. > (do_split_loop_on_cond): Likewise. > --- > gcc/tree-ssa-loop-split.c | 110 +++++++++++++++++++++++++++++++++++--- > 1 file changed, 102 insertions(+), 8 deletions(-) > > diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c > index 3f6ad046623..102766241fb 100644 > --- a/gcc/tree-ssa-loop-split.c > +++ b/gcc/tree-ssa-loop-split.c > @@ -575,7 +575,10 @@ split_loop (class loop *loop1) > stmts2); > tree cond = build2 (guard_code, boolean_type_node, guard_init, border); > if (!initial_true) > - cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); > + cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); > + > + edge true_edge, false_edge; > + extract_true_false_edges_from_block (bbs[i], &true_edge, &false_edge); > > /* Now version the loop, placing loop2 after loop1 connecting > them, and fix up SSA form for that. */ > @@ -583,11 +586,11 @@ split_loop (class loop *loop1) > basic_block cond_bb; > > class loop *loop2 = loop_version (loop1, cond, &cond_bb, > - profile_probability::always (), > - profile_probability::always (), > - profile_probability::always (), > - profile_probability::always (), > - true); > + true_edge->probability, > + true_edge->probability.invert (), > + profile_probability::always (), > + profile_probability::always (), > + true); > gcc_assert (loop2); > > edge new_e = connect_loops (loop1, loop2); > @@ -607,6 +610,53 @@ split_loop (class loop *loop1) > tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop1)); > patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true); > > + /* Check first loop's number of iterations. */ > + update_ssa (TODO_update_ssa); > + gcc_assert (number_of_iterations_exit (loop1, single_exit (loop1), > + &niter, false, true)); > + > + /* Proportion first loop's bb counts except those dominated by true > + branch to avoid drop 1s down. */ > + basic_block *bbs1, *bbs2; > + bbs1 = get_loop_body (loop1); > + unsigned j; > + for (j = 0; j < loop1->num_nodes; j++) > + if (bbs1[j] == loop1->latch > + || !dominated_by_p (CDI_DOMINATORS, bbs1[j], true_edge->dest)) > + bbs1[j]->count > + = bbs1[j]->count.apply_probability (true_edge->probability); > + free (bbs1); > + > + /* Fix first loop's exit probability after scaling. */ > + edge exit_to_latch1 = single_pred_edge (loop1->latch); > + exit_to_latch1->probability = exit_to_latch1->probability.apply_scale ( > + true_edge->probability.to_reg_br_prob_base (), REG_BR_PROB_BASE); > + single_exit (loop1)->probability > + = exit_to_latch1->probability.invert (); > + > + /* Check second loop's number of iterations. */ > + class tree_niter_desc niter2; > + gcc_assert (number_of_iterations_exit (loop2, single_exit (loop2), > + &niter2, false, true)); > + > + /* Proportion second loop's bb counts except those dominated by false > + branch to avoid drop 1s down. */ > + basic_block bbi_copy = get_bb_copy (false_edge->dest); > + bbs2 = get_loop_body (loop2); > + for (j = 0; j < loop2->num_nodes; j++) > + if (bbs2[j] == loop2->latch > + || !dominated_by_p (CDI_DOMINATORS, bbs2[j], bbi_copy)) > + bbs2[j]->count = bbs2[j]->count.apply_probability ( > + true_edge->probability.invert ()); > + free (bbs2); > + > + /* Fix second loop's exit probability after scaling. */ > + edge exit_to_latch2 = single_pred_edge (loop2->latch); > + exit_to_latch2->probability = exit_to_latch2->probability.apply_scale ( > + false_edge->probability.to_reg_br_prob_base (), REG_BR_PROB_BASE); > + single_exit (loop2)->probability > + = exit_to_latch2->probability.invert (); > + > /* Finally patch out the two copies of the condition to be always > true/false (or opposite). */ > gcond *force_true = as_a<gcond *> (last_stmt (bbs[i])); > @@ -1486,8 +1536,8 @@ do_split_loop_on_cond (struct loop *loop1, edge > invar_branch) > initialize_original_copy_tables (); > > struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL, > - profile_probability::always (), > - profile_probability::never (), > + invar_branch->probability.invert (), > + invar_branch->probability, > profile_probability::always (), > profile_probability::always (), > true); > @@ -1535,6 +1585,50 @@ do_split_loop_on_cond (struct loop *loop1, edge > invar_branch) > between loop1 and loop2. */ > connect_loop_phis (loop1, loop2, to_loop2); > > + update_ssa (TODO_update_ssa); > + > + edge true_edge, false_edge, skip_edge1, skip_edge2; > + extract_true_false_edges_from_block (cond_bb, &true_edge, &false_edge); > + > + /* Proportion first loop's bb counts except those dominated by true > + branch to avoid drop 1s down. */ > + skip_edge1 = true_invar ? false_edge : true_edge; > + skip_edge2 = true_invar ? true_edge : false_edge; > + basic_block *bbs1, *bbs2; > + bbs1 = get_loop_body (loop1); > + unsigned j; > + for (j = 0; j < loop1->num_nodes; j++) > + if (bbs1[j] == loop1->latch > + || !dominated_by_p (CDI_DOMINATORS, bbs1[j], skip_edge1->dest)) > + bbs1[j]->count > + = bbs1[j]->count.apply_probability (skip_edge1->probability); > + free (bbs1); > + > + /* Fix first loop's exit probability after scaling. */ > + to_loop1->probability = invar_branch->probability.invert (); > + to_loop2->probability = invar_branch->probability; > + > + /* Proportion second loop's bb counts except those dominated by false > + branch to avoid drop 1s down. */ > + basic_block bbi_copy = get_bb_copy (skip_edge2->dest); > + bbs2 = get_loop_body (loop2); > + for (j = 0; j < loop2->num_nodes; j++) > + if (bbs2[j] == loop2->latch > + || !dominated_by_p (CDI_DOMINATORS, bbs2[j], bbi_copy)) > + bbs2[j]->count > + = bbs2[j]->count.apply_probability (skip_edge2->probability); > + free (bbs2); > + > + /* Fix second loop's exit probability after scaling. */ > + edge loop2_latch_exit; > + edge exit_to_latch2 = single_pred_edge (loop2->latch); > + exit_to_latch2->probability = exit_to_latch2->probability.apply_scale ( > + skip_edge2->probability.to_reg_br_prob_base (), REG_BR_PROB_BASE); > + loop2_latch_exit = EDGE_SUCC (exit_to_latch2->src, 0) == exit_to_latch2 > + ? EDGE_SUCC (exit_to_latch2->src, 1) > + : EDGE_SUCC (exit_to_latch2->src, 0); > + loop2_latch_exit->probability = exit_to_latch2->probability.invert (); > + > free_original_copy_tables (); > > return true; > -- Thanks, Xionghu