On Fri, Jul 28, 2023 at 9:58 AM Jan Hubicka via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > Hi, > this patch fixes profile update in the first case of loop splitting. > The pass still gives up on very basic testcases: > > __attribute__ ((noinline,noipa)) > void test1 (int n) > { > if (n <= 0 || n > 100000) > return; > for (int i = 0; i <= n; i++) > { > if (i < n) > do_something (); > if (a[i]) > do_something2(); > } > } > Here I needed to do the conditoinal that enforces sane value range of n. > The reason is that it gives up on: > !number_of_iterations_exit (loop1, exit1, &niter, false, true) > and without the conditonal we get assumption that n>=0 and not INT_MAX. > I think from overflow we shold derive that INT_MAX test is not needed and > since > the loop does nothing for n<0 it is also just an paranoia.
I only get n != 2147483647 (loop header copying does the n >= 0). Indeed this test looks odd. It's because we turn i <= n into i < n + 1 and analyze that (our canonical test is LT_EXPR), for this to work n may not be INT_MAX. > I am not sure how to fix this though :(. In general the pass does not really > need to compute iteration count. It only needs to know what direction the IVs > go so it can detect tests that fires in first part of iteration space. > > Rich, any idea what the correct test should be? In principle it could just look at the scalar evolution for the IV in the exit test. Aka use simple_iv () and check ->no_overflow? > In testcase: > for (int i = 0; i < 200; i++) > if (i < 150) > do_something (); > else > do_something2 (); > the old code did wrong update of the exit condition probabilities. > We know that first loop iterates 150 times and the second loop 50 times > and we get it by simply scaling loop body by the probability of inner test. > > With the patch we now get: > > <bb 2> [count: 1000]: > > <bb 3> [count: 150000]: <- loop 1 correctly iterates 149 times > # i_10 = PHI <i_7(8), 0(2)> > do_something (); > i_7 = i_10 + 1; > if (i_7 <= 149) > goto <bb 8>; [99.33%] > else > goto <bb 17>; [0.67%] > > <bb 8> [count: 149000]: > goto <bb 3>; [100.00%] > > <bb 16> [count: 1000]: > # i_15 = PHI <i_18(17)> > > <bb 9> [count: 49975]: <- loop 2 should iterate 50 times but > we are slightly wrong > # i_3 = PHI <i_15(16), i_14(13)> > do_something2 (); > i_14 = i_3 + 1; > if (i_14 != 200) > goto <bb 13>; [98.00%] > else > goto <bb 7>; [2.00%] > > <bb 13> [count: 48975]: > goto <bb 9>; [100.00%] > > <bb 17> [count: 1000]: <- this test is always true becuase it is > reached form bb 3 > # i_18 = PHI <i_7(3)> > if (i_18 != 200) > goto <bb 16>; [99.95%] > else > goto <bb 7>; [0.05%] > > <bb 7> [count: 1000]: > return; > > The reason why we are slightly wrong is the condtion in bb17 that > is always true but the pass does not konw it. > > Rich any idea how to do that? I think connect_loops should work out > the cas where the loop exit conditon is never satisfied at the time > the splitted condition fails for first time. > > Also we do not update loop iteration expectancies. If we were able to > work out if one of the loop has constant iteration count, we could do it > perfectly. > > Before patch on hmmer we get a lot of mismatches: > Profile report here claims: > dump id |static mismat|dynamic mismatch | > |in count |in count |time | > lsplit | 5 +5| 8151850567 +8151850567| 531506481006 +57.9%| > ldist | 9 +4| 15345493501 +7193642934| 606848841056 +14.2%| > ifcvt | 10 +1| 15487514871 +142021370| 689469797790 +13.6%| > vect | 35 +25| 17558425961 +2070911090| 517375405715 -25.0%| > cunroll | 42 +7| 16898736178 -659689783| 452445796198 -4.9%| > loopdone| 33 -9| 2678017188 -14220718990| 330969127663 | > tracer | 34 +1| 2678018710 +1522| 330613415364 +0.0%| > fre | 33 -1| 2676980249 -1038461| 330465677073 -0.0%| > expand | 28 -5| 2497468467 -179511782|--------------------------| > > With patch > > lsplit | 0 | 0 | 328723360744 -2.3%| > ldist | 0 | 0 | 396193562452 +20.6%| > ifcvt | 1 +1| 71010686 +71010686| 478743508522 +20.8%| > vect | 14 +13| 697518955 +626508269| 299398068323 -37.5%| > cunroll | 13 -1| 489349408 -208169547| 257777839725 -10.5%| > loopdone| 11 -2| 402558559 -86790849| 201010712702 | > tracer | 13 +2| 402977200 +418641| 200651036623 +0.0%| > fre | 13 | 402622146 -355054| 200344398654 -0.2%| > expand | 11 -2| 333608636 -69013510|--------------------------| > > So no mismatches for lsplit and ldist and also lsplit thinks it improves > speed by 2.3% rather than regressig it by 57%. > > Update is still not perfect since we do not work out that the second loop > never iterates. Also ldist is still wrong siince time should not go up. > > Ifcft wrecks profile by desing since it insert conditonals with both arms 100% > that will be eliminated later after vect. It is not clear to me what happens > in vect though. > > Bootstrapped/regtested x86_64-linux, comitted. > > gcc/ChangeLog: > > PR middle-end/106293 > * tree-ssa-loop-split.cc (connect_loops): Change probability > of the test preconditioning second loop to very_likely. > (fix_loop_bb_probability): Handle correctly case where > on of the arms of the conditional is empty. > (split_loop): Fold the test guarding first condition to > see if it is constant true; Set correct entry block > probabilities of the split loops; determine correct loop > eixt probabilities. > > gcc/testsuite/ChangeLog: > > PR middle-end/106293 > * gcc.dg/tree-prof/loop-split-1.c: New test. > * gcc.dg/tree-prof/loop-split-2.c: New test. > * gcc.dg/tree-prof/loop-split-3.c: New test. > > diff --git a/gcc/testsuite/gcc.dg/tree-prof/loop-split-1.c > b/gcc/testsuite/gcc.dg/tree-prof/loop-split-1.c > new file mode 100644 > index 00000000000..46f7ae66fa3 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-prof/loop-split-1.c > @@ -0,0 +1,33 @@ > +/* { dg-options "-O3 -fdump-tree-lsplit-details-blocks > -fdump-tree-optimized-details-blocks" } */ > + > + > +void > +__attribute__ ((noinline,noipa)) > +do_something() > +{ > +} > +void > +__attribute__ ((noinline,noipa)) > +do_something2() > +{ > +} > + > +__attribute__ ((noinline,noipa)) > +void test1 (int n, int ga) > +{ > + for (int i = 0; i < 200; i++) > + if (i < 150) > + do_something (); > + else > + do_something2 (); > +} > +int > +main(int, char **) > +{ > + for (int i = 0 ; i < 1000; i++) > + test1(10, 10); > + return 0; > +} > +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Loop split" 1 "lsplit" > } } */ > +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 > "lsplit" } } */ > +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 > "optimized" } } */ > diff --git a/gcc/testsuite/gcc.dg/tree-prof/loop-split-2.c > b/gcc/testsuite/gcc.dg/tree-prof/loop-split-2.c > new file mode 100644 > index 00000000000..de2c9f58301 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-prof/loop-split-2.c > @@ -0,0 +1,36 @@ > +/* { dg-options "-O3 -fdump-tree-lsplit-details-blocks > -fdump-tree-optimized-details-blocks" } */ > + > +int M = 100; > + > +void > +__attribute__ ((noinline,noipa)) > +do_something() > +{ > +} > +void > +__attribute__ ((noinline,noipa)) > +do_something2() > +{ > +} > + > +__attribute__ ((noinline,noipa)) > +void test1 (int n) > +{ > + if (n <= 0 || n > 100000) > + return; > + for (int i = 0; i <= n; i++) > + if (i < n) > + do_something (); > + else > + do_something2 (); > +} > +int > +main(int, char **) > +{ > + for (int i = 0 ; i < 1000; i++) > + test1(M); > + return 0; > +} > +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Loop split" 1 "lsplit" > } } */ > +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 > "lsplit" } } */ > +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 > "optimized" } } */ > diff --git a/gcc/testsuite/gcc.dg/tree-prof/loop-split-3.c > b/gcc/testsuite/gcc.dg/tree-prof/loop-split-3.c > new file mode 100644 > index 00000000000..a88bc1f8663 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-prof/loop-split-3.c > @@ -0,0 +1,41 @@ > +/* { dg-options "-O3 -fdump-tree-lsplit-details-blocks > -fdump-tree-optimized-details-blocks" } */ > + > +int M = 100; > +int a[1000]; > + > +void > +__attribute__ ((noinline,noipa)) > +do_something() > +{ > +} > +void > +__attribute__ ((noinline,noipa)) > +do_something2() > +{ > +} > + > +__attribute__ ((noinline,noipa)) > +void test1 (int n) > +{ > + if (n <= 0 || n > 100000) > + return; > + for (int i = 0; i <= n; i++) > + { > + if (i < n) > + do_something (); > + if (a[i]) > + do_something2(); > + } > +} > +int > +main(int, char **) > +{ > + for (int i = 0 ; i < 1000; i+=3) > + a[i]=1; > + for (int i = 0 ; i < 1000; i++) > + test1(M); > + return 0; > +} > +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Loop split" 1 "lsplit" > } } */ > +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 > "lsplit" } } */ > +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 > "optimized" } } */ > diff --git a/gcc/tree-ssa-loop-split.cc b/gcc/tree-ssa-loop-split.cc > index f441f3fe1b5..70cd0aaefa7 100644 > --- a/gcc/tree-ssa-loop-split.cc > +++ b/gcc/tree-ssa-loop-split.cc > @@ -355,7 +356,7 @@ connect_loops (class loop *loop1, class loop *loop2) > new_e->flags |= EDGE_TRUE_VALUE; > } > > - new_e->probability = profile_probability::likely (); > + new_e->probability = profile_probability::very_likely (); > skip_e->probability = new_e->probability.invert (); > > return new_e; > @@ -496,6 +497,8 @@ fix_loop_bb_probability (class loop *loop1, class loop > *loop2, edge true_edge, > unsigned j; > for (j = 0; j < loop1->num_nodes; j++) > if (bbs1[j] == loop1->latch > + /* Watch for case where the true conditional is empty. */ > + || !single_pred_p (true_edge->dest) > || !dominated_by_p (CDI_DOMINATORS, bbs1[j], true_edge->dest)) > bbs1[j]->count > = bbs1[j]->count.apply_probability (true_edge->probability); > @@ -507,6 +510,8 @@ fix_loop_bb_probability (class loop *loop1, class loop > *loop2, edge true_edge, > bbs2 = get_loop_body (loop2); > for (j = 0; j < loop2->num_nodes; j++) > if (bbs2[j] == loop2->latch > + /* Watch for case where the flase conditional is empty. */ > + || !single_pred_p (bbi_copy) > || !dominated_by_p (CDI_DOMINATORS, bbs2[j], bbi_copy)) > bbs2[j]->count > = bbs2[j]->count.apply_probability (true_edge->probability.invert ()); > @@ -564,6 +569,7 @@ split_loop (class loop *loop1) > for (i = 0; i < loop1->num_nodes; i++) > if ((guard_iv = split_at_bb_p (loop1, bbs[i], &border, &iv))) > { > + profile_count entry_count = loop_preheader_edge (loop1)->count (); > /* Handling opposite steps is not implemented yet. Neither > is handling different step sizes. */ > if ((tree_int_cst_sign_bit (iv.step) > @@ -610,7 +616,8 @@ split_loop (class loop *loop1) > if (stmts2) > gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1), > stmts2); > - tree cond = build2 (guard_code, boolean_type_node, guard_init, > border); > + tree cond = fold_build2 (guard_code, boolean_type_node, > + guard_init, border); > if (!initial_true) > cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); > > @@ -622,13 +629,71 @@ split_loop (class loop *loop1) > initialize_original_copy_tables (); > basic_block cond_bb; > > + profile_probability loop1_prob > + = integer_onep (cond) ? profile_probability::always () > + : true_edge->probability; > + /* TODO: It is commonly a case that we know that both loops will be > + entered. very_likely below is the probability that second loop > will > + be entered given by connect_loops. We should work out the common > + case it is always true. */ > class loop *loop2 = loop_version (loop1, cond, &cond_bb, > - true_edge->probability, > - true_edge->probability.invert (), > + loop1_prob, > + /* Pass always as we will later > + redirect first loop to second > + loop. */ > profile_probability::always (), > profile_probability::always (), > + profile_probability::very_likely (), > true); > gcc_assert (loop2); > + /* Correct probability of edge cond_bb->preheader_of_loop2. */ > + single_pred_edge > + (loop_preheader_edge (loop2)->src)->probability > + = loop1_prob.invert (); > + > + fix_loop_bb_probability (loop1, loop2, true_edge, false_edge); > + > + /* Fix loop's exit probability after scaling. */ > + > + if (entry_count.initialized_p () && entry_count.nonzero_p ()) > + { > + edge exit_to_latch1; > + if (EDGE_SUCC (exit1->src, 0) == exit1) > + exit_to_latch1 = EDGE_SUCC (exit1->src, 1); > + else > + exit_to_latch1 = EDGE_SUCC (exit1->src, 0); > + if (exit1->src->count.nonzero_p ()) > + { > + /* First loop is entered loop1_prob * entry_count times > + and it needs to exit the same number of times. */ > + exit1->probability > + = entry_count.apply_probability > + (loop1_prob).probability_in > (exit1->src->count); > + exit_to_latch1->probability = exit1->probability.invert (); > + scale_dominated_blocks_in_loop (loop1, exit1->src, > + exit_to_latch1->count (), > + exit_to_latch1->dest->count); > + } > + > + edge exit_to_latch2, exit2 = single_exit (loop2); > + if (EDGE_SUCC (exit2->src, 0) == exit2) > + exit_to_latch2 = EDGE_SUCC (exit2->src, 1); > + else > + exit_to_latch2 = EDGE_SUCC (exit2->src, 0); > + if (exit2->src->count.nonzero_p ()) > + { > + /* Second loop is entered very_likely * entry_count times > + and it needs to exit the same number of times. */ > + exit2->probability > + = entry_count.apply_probability > + (profile_probability::very_likely ()) > + .probability_in (exit2->src->count); > + exit_to_latch2->probability = exit2->probability.invert (); > + scale_dominated_blocks_in_loop (loop2, exit2->src, > + exit_to_latch2->count (), > + exit_to_latch2->dest->count); > + } > + } > > edge new_e = connect_loops (loop1, loop2); > connect_loop_phis (loop1, loop2, new_e); > @@ -647,16 +712,7 @@ split_loop (class loop *loop1) > tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge > (loop1)); > patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true); > > - fix_loop_bb_probability (loop1, loop2, true_edge, false_edge); > - > - /* Fix first loop's exit probability after scaling. */ > - edge exit_to_latch1; > - if (EDGE_SUCC (exit1->src, 0) == exit1) > - exit_to_latch1 = EDGE_SUCC (exit1->src, 1); > - else > - exit_to_latch1 = EDGE_SUCC (exit1->src, 0); > - exit_to_latch1->probability *= true_edge->probability; > - exit1->probability = exit_to_latch1->probability.invert (); > + /* TODO: Update any_esitmate and upper bounds. */ > > /* Finally patch out the two copies of the condition to be always > true/false (or opposite). */