Re: Loop-split improvements, part 2

Richard Biener via Gcc-patches Fri, 28 Jul 2023 05:38:20 -0700

On Fri, Jul 28, 2023 at 9:58 AM Jan Hubicka via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
> this patch fixes profile update in the first case of loop splitting.
> The pass still gives up on very basic testcases:
>
> __attribute__ ((noinline,noipa))
> void test1 (int n)
> {
>   if (n <= 0 || n > 100000)
>     return;
>   for (int i = 0; i <= n; i++)
>     {
>       if (i < n)
>         do_something ();
>       if (a[i])
>         do_something2();
>     }
> }
> Here I needed to do the conditoinal that enforces sane value range of n.
> The reason is that it gives up on:
>       !number_of_iterations_exit (loop1, exit1, &niter, false, true)
> and without the conditonal we get assumption that n>=0 and not INT_MAX.
> I think from overflow we shold derive that INT_MAX test is not needed and 
> since
> the loop does nothing for n<0 it is also just an paranoia.


I only get n != 2147483647 (loop header copying does the n >= 0).  Indeed
this test looks odd.  It's because we turn i <= n into i < n + 1 and analyze
that (our canonical test is LT_EXPR), for this to work n may not be INT_MAX.

> I am not sure how to fix this though :(.  In general the pass does not really
> need to compute iteration count.  It only needs to know what direction the IVs
> go so it can detect tests that fires in first part of iteration space.
>
> Rich, any idea what the correct test should be?

In principle it could just look at the scalar evolution for the IV in
the exit test.
Aka use simple_iv () and check ->no_overflow?

> In testcase:
>   for (int i = 0; i < 200; i++)
>     if (i < 150)
>       do_something ();
>     else
>       do_something2 ();
> the old code did wrong update of the exit condition probabilities.
> We know that first loop iterates 150 times and the second loop 50 times
> and we get it by simply scaling loop body by the probability of inner test.
>
> With the patch we now get:
>
>   <bb 2> [count: 1000]:
>
>   <bb 3> [count: 150000]:    <- loop 1 correctly iterates 149 times
>   # i_10 = PHI <i_7(8), 0(2)>
>   do_something ();
>   i_7 = i_10 + 1;
>   if (i_7 <= 149)
>     goto <bb 8>; [99.33%]
>   else
>     goto <bb 17>; [0.67%]
>
>   <bb 8> [count: 149000]:
>   goto <bb 3>; [100.00%]
>
>   <bb 16> [count: 1000]:
>   # i_15 = PHI <i_18(17)>
>
>   <bb 9> [count: 49975]:    <- loop 2 should iterate 50 times but
>                                we are slightly wrong
>   # i_3 = PHI <i_15(16), i_14(13)>
>   do_something2 ();
>   i_14 = i_3 + 1;
>   if (i_14 != 200)
>     goto <bb 13>; [98.00%]
>   else
>     goto <bb 7>; [2.00%]
>
>   <bb 13> [count: 48975]:
>   goto <bb 9>; [100.00%]
>
>   <bb 17> [count: 1000]:   <- this test is always true becuase it is
>                               reached form bb 3
>   # i_18 = PHI <i_7(3)>
>   if (i_18 != 200)
>     goto <bb 16>; [99.95%]
>   else
>     goto <bb 7>; [0.05%]
>
>   <bb 7> [count: 1000]:
>   return;
>
> The reason why we are slightly wrong is the condtion in bb17 that
> is always true but the pass does not konw it.
>
> Rich any idea how to do that?  I think connect_loops should work out
> the cas where the loop exit conditon is never satisfied at the time
> the splitted condition fails for first time.
>
> Also we do not update loop iteration expectancies.  If we were able to
> work out if one of the loop has constant iteration count, we could do it
> perfectly.
>
> Before patch on hmmer we get a lot of mismatches:
> Profile report here claims:
> dump id |static mismat|dynamic mismatch                                     |
>         |in count     |in count                  |time                      |
> lsplit  |      5    +5|   8151850567  +8151850567| 531506481006       +57.9%|
> ldist   |      9    +4|  15345493501  +7193642934| 606848841056       +14.2%|
> ifcvt   |     10    +1|  15487514871   +142021370| 689469797790       +13.6%|
> vect    |     35   +25|  17558425961  +2070911090| 517375405715       -25.0%|
> cunroll |     42    +7|  16898736178   -659689783| 452445796198        -4.9%|
> loopdone|     33    -9|   2678017188 -14220718990| 330969127663             |
> tracer  |     34    +1|   2678018710        +1522| 330613415364        +0.0%|
> fre     |     33    -1|   2676980249     -1038461| 330465677073        -0.0%|
> expand  |     28    -5|   2497468467   -179511782|--------------------------|
>
> With patch
>
> lsplit  |      0      |            0             | 328723360744        -2.3%|
> ldist   |      0      |            0             | 396193562452       +20.6%|
> ifcvt   |      1    +1|     71010686    +71010686| 478743508522       +20.8%|
> vect    |     14   +13|    697518955   +626508269| 299398068323       -37.5%|
> cunroll |     13    -1|    489349408   -208169547| 257777839725       -10.5%|
> loopdone|     11    -2|    402558559    -86790849| 201010712702             |
> tracer  |     13    +2|    402977200      +418641| 200651036623        +0.0%|
> fre     |     13      |    402622146      -355054| 200344398654        -0.2%|
> expand  |     11    -2|    333608636    -69013510|--------------------------|
>
> So no mismatches for lsplit and ldist and also lsplit thinks it improves
> speed by 2.3% rather than regressig it by 57%.
>
> Update is still not perfect since we do not work out that the second loop
> never iterates.  Also ldist is still wrong siince time should not go up.
>
> Ifcft wrecks profile by desing since it insert conditonals with both arms 100%
> that will be eliminated later after vect.  It is not clear to me what happens
> in vect though.
>
> Bootstrapped/regtested x86_64-linux, comitted.
>
> gcc/ChangeLog:
>
>         PR middle-end/106293
>         * tree-ssa-loop-split.cc (connect_loops): Change probability
>         of the test preconditioning second loop to very_likely.
>         (fix_loop_bb_probability): Handle correctly case where
>         on of the arms of the conditional is empty.
>         (split_loop): Fold the test guarding first condition to
>         see if it is constant true; Set correct entry block
>         probabilities of the split loops; determine correct loop
>         eixt probabilities.
>
> gcc/testsuite/ChangeLog:
>
>         PR middle-end/106293
>         * gcc.dg/tree-prof/loop-split-1.c: New test.
>         * gcc.dg/tree-prof/loop-split-2.c: New test.
>         * gcc.dg/tree-prof/loop-split-3.c: New test.
>
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/loop-split-1.c 
> b/gcc/testsuite/gcc.dg/tree-prof/loop-split-1.c
> new file mode 100644
> index 00000000000..46f7ae66fa3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/loop-split-1.c
> @@ -0,0 +1,33 @@
> +/* { dg-options "-O3 -fdump-tree-lsplit-details-blocks 
> -fdump-tree-optimized-details-blocks" } */
> +
> +
> +void
> +__attribute__ ((noinline,noipa))
> +do_something()
> +{
> +}
> +void
> +__attribute__ ((noinline,noipa))
> +do_something2()
> +{
> +}
> +
> +__attribute__ ((noinline,noipa))
> +void test1 (int n, int ga)
> +{
> +  for (int i = 0; i < 200; i++)
> +         if (i < 150)
> +                 do_something ();
> +         else
> +                 do_something2 ();
> +}
> +int
> +main(int, char **)
> +{
> +       for (int i = 0 ; i < 1000; i++)
> +         test1(10, 10);
> +       return 0;
> +}
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Loop split" 1 "lsplit" 
> } } */
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 
> "lsplit" } } */
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 
> "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/loop-split-2.c 
> b/gcc/testsuite/gcc.dg/tree-prof/loop-split-2.c
> new file mode 100644
> index 00000000000..de2c9f58301
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/loop-split-2.c
> @@ -0,0 +1,36 @@
> +/* { dg-options "-O3 -fdump-tree-lsplit-details-blocks 
> -fdump-tree-optimized-details-blocks" } */
> +
> +int M = 100;
> +
> +void
> +__attribute__ ((noinline,noipa))
> +do_something()
> +{
> +}
> +void
> +__attribute__ ((noinline,noipa))
> +do_something2()
> +{
> +}
> +
> +__attribute__ ((noinline,noipa))
> +void test1 (int n)
> +{
> +  if (n <= 0 || n > 100000)
> +        return;
> +  for (int i = 0; i <= n; i++)
> +          if (i < n)
> +                  do_something ();
> +          else
> +                  do_something2 ();
> +}
> +int
> +main(int, char **)
> +{
> +        for (int i = 0 ; i < 1000; i++)
> +          test1(M);
> +        return 0;
> +}
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Loop split" 1 "lsplit" 
> } } */
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 
> "lsplit" } } */
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 
> "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/loop-split-3.c 
> b/gcc/testsuite/gcc.dg/tree-prof/loop-split-3.c
> new file mode 100644
> index 00000000000..a88bc1f8663
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/loop-split-3.c
> @@ -0,0 +1,41 @@
> +/* { dg-options "-O3 -fdump-tree-lsplit-details-blocks 
> -fdump-tree-optimized-details-blocks" } */
> +
> +int M = 100;
> +int a[1000];
> +
> +void
> +__attribute__ ((noinline,noipa))
> +do_something()
> +{
> +}
> +void
> +__attribute__ ((noinline,noipa))
> +do_something2()
> +{
> +}
> +
> +__attribute__ ((noinline,noipa))
> +void test1 (int n)
> +{
> +  if (n <= 0 || n > 100000)
> +        return;
> +  for (int i = 0; i <= n; i++)
> +    {
> +      if (i < n)
> +       do_something ();
> +      if (a[i])
> +       do_something2();
> +    }
> +}
> +int
> +main(int, char **)
> +{
> +  for (int i = 0 ; i < 1000; i+=3)
> +    a[i]=1;
> +  for (int i = 0 ; i < 1000; i++)
> +    test1(M);
> +  return 0;
> +}
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Loop split" 1 "lsplit" 
> } } */
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 
> "lsplit" } } */
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 
> "optimized" } } */
> diff --git a/gcc/tree-ssa-loop-split.cc b/gcc/tree-ssa-loop-split.cc
> index f441f3fe1b5..70cd0aaefa7 100644
> --- a/gcc/tree-ssa-loop-split.cc
> +++ b/gcc/tree-ssa-loop-split.cc
> @@ -355,7 +356,7 @@ connect_loops (class loop *loop1, class loop *loop2)
>        new_e->flags |= EDGE_TRUE_VALUE;
>      }
>
> -  new_e->probability = profile_probability::likely ();
> +  new_e->probability = profile_probability::very_likely ();
>    skip_e->probability = new_e->probability.invert ();
>
>    return new_e;
> @@ -496,6 +497,8 @@ fix_loop_bb_probability (class loop *loop1, class loop 
> *loop2, edge true_edge,
>    unsigned j;
>    for (j = 0; j < loop1->num_nodes; j++)
>      if (bbs1[j] == loop1->latch
> +       /* Watch for case where the true conditional is empty.  */
> +       || !single_pred_p (true_edge->dest)
>         || !dominated_by_p (CDI_DOMINATORS, bbs1[j], true_edge->dest))
>        bbs1[j]->count
>         = bbs1[j]->count.apply_probability (true_edge->probability);
> @@ -507,6 +510,8 @@ fix_loop_bb_probability (class loop *loop1, class loop 
> *loop2, edge true_edge,
>    bbs2 = get_loop_body (loop2);
>    for (j = 0; j < loop2->num_nodes; j++)
>      if (bbs2[j] == loop2->latch
> +       /* Watch for case where the flase conditional is empty.  */
> +       || !single_pred_p (bbi_copy)
>         || !dominated_by_p (CDI_DOMINATORS, bbs2[j], bbi_copy))
>        bbs2[j]->count
>         = bbs2[j]->count.apply_probability (true_edge->probability.invert ());
> @@ -564,6 +569,7 @@ split_loop (class loop *loop1)
>    for (i = 0; i < loop1->num_nodes; i++)
>      if ((guard_iv = split_at_bb_p (loop1, bbs[i], &border, &iv)))
>        {
> +       profile_count entry_count = loop_preheader_edge (loop1)->count ();
>         /* Handling opposite steps is not implemented yet.  Neither
>            is handling different step sizes.  */
>         if ((tree_int_cst_sign_bit (iv.step)
> @@ -610,7 +616,8 @@ split_loop (class loop *loop1)
>         if (stmts2)
>           gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
>                                             stmts2);
> -       tree cond = build2 (guard_code, boolean_type_node, guard_init, 
> border);
> +       tree cond = fold_build2 (guard_code, boolean_type_node,
> +                                guard_init, border);
>         if (!initial_true)
>           cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>
> @@ -622,13 +629,71 @@ split_loop (class loop *loop1)
>         initialize_original_copy_tables ();
>         basic_block cond_bb;
>
> +       profile_probability loop1_prob
> +         = integer_onep (cond) ? profile_probability::always ()
> +                               : true_edge->probability;
> +       /* TODO: It is commonly a case that we know that both loops will be
> +          entered.  very_likely below is the probability that second loop 
> will
> +          be entered given by connect_loops.  We should work out the common
> +          case it is always true.  */
>         class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> -                                         true_edge->probability,
> -                                         true_edge->probability.invert (),
> +                                         loop1_prob,
> +                                         /* Pass always as we will later
> +                                            redirect first loop to second
> +                                            loop.  */
>                                           profile_probability::always (),
>                                           profile_probability::always (),
> +                                         profile_probability::very_likely (),
>                                           true);
>         gcc_assert (loop2);
> +       /* Correct probability of edge  cond_bb->preheader_of_loop2.  */
> +       single_pred_edge
> +               (loop_preheader_edge (loop2)->src)->probability
> +                       = loop1_prob.invert ();
> +
> +       fix_loop_bb_probability (loop1, loop2, true_edge, false_edge);
> +
> +       /* Fix loop's exit probability after scaling.  */
> +
> +       if (entry_count.initialized_p () && entry_count.nonzero_p ())
> +         {
> +           edge exit_to_latch1;
> +           if (EDGE_SUCC (exit1->src, 0) == exit1)
> +             exit_to_latch1 = EDGE_SUCC (exit1->src, 1);
> +           else
> +             exit_to_latch1 = EDGE_SUCC (exit1->src, 0);
> +           if (exit1->src->count.nonzero_p ())
> +             {
> +               /* First loop is entered loop1_prob * entry_count times
> +                  and it needs to exit the same number of times.  */
> +               exit1->probability
> +                       = entry_count.apply_probability
> +                               (loop1_prob).probability_in 
> (exit1->src->count);
> +               exit_to_latch1->probability = exit1->probability.invert ();
> +               scale_dominated_blocks_in_loop (loop1, exit1->src,
> +                                               exit_to_latch1->count (),
> +                                               exit_to_latch1->dest->count);
> +             }
> +
> +           edge exit_to_latch2, exit2 = single_exit (loop2);
> +           if (EDGE_SUCC (exit2->src, 0) == exit2)
> +             exit_to_latch2 = EDGE_SUCC (exit2->src, 1);
> +           else
> +             exit_to_latch2 = EDGE_SUCC (exit2->src, 0);
> +           if (exit2->src->count.nonzero_p ())
> +             {
> +               /* Second loop is entered very_likely * entry_count times
> +                  and it needs to exit the same number of times.  */
> +               exit2->probability
> +                       = entry_count.apply_probability
> +                               (profile_probability::very_likely ())
> +                               .probability_in (exit2->src->count);
> +               exit_to_latch2->probability = exit2->probability.invert ();
> +               scale_dominated_blocks_in_loop (loop2, exit2->src,
> +                                               exit_to_latch2->count (),
> +                                               exit_to_latch2->dest->count);
> +             }
> +         }
>
>         edge new_e = connect_loops (loop1, loop2);
>         connect_loop_phis (loop1, loop2, new_e);
> @@ -647,16 +712,7 @@ split_loop (class loop *loop1)
>         tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge 
> (loop1));
>         patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true);
>
> -       fix_loop_bb_probability (loop1, loop2, true_edge, false_edge);
> -
> -       /* Fix first loop's exit probability after scaling.  */
> -       edge exit_to_latch1;
> -       if (EDGE_SUCC (exit1->src, 0) == exit1)
> -         exit_to_latch1 = EDGE_SUCC (exit1->src, 1);
> -       else
> -         exit_to_latch1 = EDGE_SUCC (exit1->src, 0);
> -       exit_to_latch1->probability *= true_edge->probability;
> -       exit1->probability = exit_to_latch1->probability.invert ();
> +       /* TODO: Update any_esitmate and upper bounds.  */
>
>         /* Finally patch out the two copies of the condition to be always
>            true/false (or opposite).  */

Re: Loop-split improvements, part 2

Reply via email to