On Tue, Oct 5, 2021 at 3:39 PM Richard Sandiford via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> In g:62acc72a957b5614 I'd stopped the unroller from using
> an epilogue loop in cases where the iteration count was
> known to be a multiple of the unroll factor.  The epilogue
> and non-epilogue cases still shared this (preexisting) code
> to update the edge frequencies:
>
>   basic_block exit_bb = single_pred (loop->latch);
>   new_exit = find_edge (exit_bb, rest);
>   new_exit->probability = profile_probability::always ()
>                                .apply_scale (1, new_est_niter + 1);
>   [etc]
>
> But of course (in hindsight) that only makes sense for the
> epilogue case, where we've already moved the main loop's exit edge
> to be a sibling of the latch edge.  For the non-epilogue case,
> the exit edge stays (and needs to stay) in its original position.
>
> I don't really understand what the code is trying to do for
> the epilogue case.  It has:
>
>       /* Ensure that the frequencies in the loop match the new estimated
>          number of iterations, and change the probability of the new
>          exit edge.  */
>
>       profile_count freq_h = loop->header->count;
>       profile_count freq_e = (loop_preheader_edge (loop))->count ();
>       if (freq_h.nonzero_p ())
>         {
>           ...
>           scale_loop_frequencies (loop, freq_e.probability_in (freq_h));
>         }
>
> Here, freq_e.probability_in (freq_h) is freq_e / freq_h, so for the
> header block, this has the effect of:
>
>   new header count = freq_h * (freq_e / freq_h)
>
> i.e. we say that the header executes exactly as often as the
> preheader edge, which would only make sense if the loop never
> iterates.  Also, after setting the probability of the nonexit edge
> (correctly) to new_est_niter / (new_est_niter + 1), the code does:
>
>     scale_bbs_frequencies (&loop->latch, 1, prob);
>
> for this new probability.  I think that only makes sense if the
> nonexit edge was previously unconditional (100%).  But the code
> carefully preserved the probability of the original exit edge
> when creating the new one.
>
> All I'm trying to do here though is fix the mess I created
> and get the probabilities right for the non-epilogue case.
> Things are simpler there since we don't have to worry about
> loop versioning.  Hopefully the comments explain the approach.
>
> The function's current interface implies that it can cope with
> multiple exit edges and that the function only needs the iteration
> count relative to one of those edges in order to work correctly.
> In practice that's not the case: it assumes there is exactly one
> exit edge and all current callers also ensure that the exit test
> dominates the latch.  I think the function is easier to follow
> if we remove the implied generality.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard
>
>
> gcc/
>         PR tree-optimization/102385
>         * predict.h (change_edge_frequency): Declare.
>         * predict.c (change_edge_frequency): New function.
>         * tree-ssa-loop-manip.h (tree_transform_and_unroll_loop): Remove
>         edge argument.
>         (tree_unroll_loop): Likewise.
>         * gimple-loop-jam.c (tree_loop_unroll_and_jam): Update accordingly.
>         * tree-predcom.c (pcom_worker::tree_predictive_commoning_loop):
>         Likewise.
>         * tree-ssa-loop-manip.c (tree_unroll_loop): Likewise.
>         (tree_transform_and_unroll_loop): Likewise.  Use single_dom_exit
>         to retrieve the exit edges.  Make all the old profile update code
>         conditional on !single_loop_p -- the case it was written for --
>         and use a different approach for the single-loop case.
>
> gcc/testsuite/
>         * testsuite/gcc.dg/pr102385.c: New test.
> ---
>  gcc/gimple-loop-jam.c           |   3 +-
>  gcc/predict.c                   |  37 +++++++++++
>  gcc/predict.h                   |   1 +
>  gcc/testsuite/gcc.dg/pr102385.c |  14 ++++
>  gcc/tree-predcom.c              |   3 +-
>  gcc/tree-ssa-loop-manip.c       | 111 ++++++++++++++++++++++++--------
>  gcc/tree-ssa-loop-manip.h       |   5 +-
>  gcc/tree-ssa-loop-prefetch.c    |   3 +-
>  8 files changed, 140 insertions(+), 37 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr102385.c
>
> diff --git a/gcc/predict.h b/gcc/predict.h
> index 8860cafa31c..4df51bd615c 100644
> --- a/gcc/predict.h
> +++ b/gcc/predict.h
> @@ -100,6 +100,7 @@ extern void rebuild_frequencies (void);
>  extern void report_predictor_hitrates (void);
>  extern void force_edge_cold (edge, bool);
>  extern void propagate_unlikely_bbs_forward (void);
> +extern void change_edge_frequency (edge, profile_probability);
>
>  extern void add_reg_br_prob_note (rtx_insn *, profile_probability);
>
> diff --git a/gcc/predict.c b/gcc/predict.c
> index d9c7249831e..68b11135680 100644
> --- a/gcc/predict.c
> +++ b/gcc/predict.c
> @@ -4481,6 +4481,43 @@ force_edge_cold (edge e, bool impossible)
>      }
>  }
>
> +/* Change E's probability to NEW_E_PROB, redistributing the probabilities
> +   of other outgoing edges proportionally.
> +
> +   Note that this function does not change the profile counts of any
> +   basic blocks.  The caller must do that instead, using whatever
> +   information it has about the region that needs updating.  */
> +
> +void
> +change_edge_frequency (edge e, profile_probability new_e_prob)
> +{
> +  profile_probability old_e_prob = e->probability;
> +  profile_probability old_other_prob = old_e_prob.invert ();
> +  profile_probability new_other_prob = new_e_prob.invert ();
> +
> +  e->probability = new_e_prob;
> +  profile_probability cumulative_prob = new_e_prob;
> +
> +  unsigned int num_other = EDGE_COUNT (e->src->succs) - 1;
> +  edge other_e;
> +  edge_iterator ei;
> +  FOR_EACH_EDGE (other_e, ei, e->src->succs)
> +    if (other_e != e)
> +      {
> +       num_other -= 1;
> +       if (num_other == 0)
> +         /* Ensure that the probabilities add up to 1 without
> +            rounding error.  */
> +         other_e->probability = cumulative_prob.invert ();
> +       else
> +         {
> +           other_e->probability /= old_other_prob;
> +           other_e->probability *= new_other_prob;
> +           cumulative_prob += other_e->probability;
> +         }
> +      }
> +}
> +
>  #if CHECKING_P
>
>  namespace selftest {
> diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
> index 86fc118b6be..5e2d7fa11a4 100644
> --- a/gcc/tree-ssa-loop-manip.h
> +++ b/gcc/tree-ssa-loop-manip.h
> @@ -50,10 +50,9 @@ extern bool can_unroll_loop_p (class loop *loop, unsigned 
> factor,
>                                class tree_niter_desc *niter);
>  extern gcov_type niter_for_unrolled_loop (class loop *, unsigned);
>  extern void tree_transform_and_unroll_loop (class loop *, unsigned,
> -                                           edge, class tree_niter_desc *,
> +                                           tree_niter_desc *,
>                                             transform_callback, void *);
> -extern void tree_unroll_loop (class loop *, unsigned,
> -                             edge, class tree_niter_desc *);
> +extern void tree_unroll_loop (class loop *, unsigned, tree_niter_desc *);
>  extern tree canonicalize_loop_ivs (class loop *, tree *, bool);
>  extern unsigned int loop_invariant_motion_in_fun (function *, bool);
>
> diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
> index 6b195d1914f..8f1cf4d44a2 100644
> --- a/gcc/tree-predcom.c
> +++ b/gcc/tree-predcom.c
> @@ -3397,8 +3397,7 @@ pcom_worker::tree_predictive_commoning_loop (bool 
> allow_unroll_p)
>          the phi nodes in execute_pred_commoning_cbck.  A bit hacky.  */
>        replace_phis_by_defined_names (m_chains);
>
> -      edge exit = single_dom_exit (m_loop);
> -      tree_transform_and_unroll_loop (m_loop, unroll_factor, exit, &desc,
> +      tree_transform_and_unroll_loop (m_loop, unroll_factor, &desc,
>                                       execute_pred_commoning_cbck, &dta);
>        eliminate_temp_copies (m_loop, tmp_vars);
>      }
> diff --git a/gcc/gimple-loop-jam.c b/gcc/gimple-loop-jam.c
> index d212e391489..611d3805304 100644
> --- a/gcc/gimple-loop-jam.c
> +++ b/gcc/gimple-loop-jam.c
> @@ -587,8 +587,7 @@ tree_loop_unroll_and_jam (void)
>                              "applying unroll and jam with factor %d\n",
>                              unroll_factor);
>           initialize_original_copy_tables ();
> -         tree_unroll_loop (outer, unroll_factor, single_dom_exit (outer),
> -                           &desc);
> +         tree_unroll_loop (outer, unroll_factor, &desc);
>           free_original_copy_tables ();
>           fuse_loops (outer->inner);
>           todo |= TODO_cleanup_cfg;
> diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
> index 85977e23245..04b2b33c1ba 100644
> --- a/gcc/tree-ssa-loop-prefetch.c
> +++ b/gcc/tree-ssa-loop-prefetch.c
> @@ -1962,8 +1962,7 @@ loop_prefetch_arrays (class loop *loop)
>       iterations so that we do not issue superfluous prefetches.  */
>    if (unroll_factor != 1)
>      {
> -      tree_unroll_loop (loop, unroll_factor,
> -                       single_dom_exit (loop), &desc);
> +      tree_unroll_loop (loop, unroll_factor, &desc);
>        unrolled = true;
>      }
>
> diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
> index c7a2f67b129..2d576bfa391 100644
> --- a/gcc/tree-ssa-loop-manip.c
> +++ b/gcc/tree-ssa-loop-manip.c
> @@ -1182,8 +1182,9 @@ niter_for_unrolled_loop (class loop *loop, unsigned 
> factor)
>    return new_est_niter;
>  }
>
> -/* Unroll LOOP FACTOR times.  DESC describes number of iterations of LOOP.
> -   EXIT is the exit of the loop to that DESC corresponds.
> +/* Unroll LOOP FACTOR times.  LOOP is known to have a single exit edge
> +   whose source block dominates the latch.  DESC describes the number of
> +   iterations of LOOP.
>
>     If N is number of iterations of the loop and MAY_BE_ZERO is the condition
>     under that loop exits in the first iteration even if N != 0,
> @@ -1241,7 +1242,7 @@ niter_for_unrolled_loop (class loop *loop, unsigned 
> factor)
>
>  void
>  tree_transform_and_unroll_loop (class loop *loop, unsigned factor,
> -                               edge exit, class tree_niter_desc *desc,
> +                               class tree_niter_desc *desc,
>                                 transform_callback transform,
>                                 void *data)
>  {
> @@ -1265,10 +1266,11 @@ tree_transform_and_unroll_loop (class loop *loop, 
> unsigned factor,
>
>    gcond *exit_if = nullptr;
>    class loop *new_loop = nullptr;
> -  basic_block rest;
>    edge new_exit;
>    if (!single_loop_p)
>      {
> +      edge exit = single_dom_exit (loop);
> +
>        /* The values for scales should keep profile consistent, and somewhat
>          close to correct.
>
> @@ -1291,7 +1293,7 @@ tree_transform_and_unroll_loop (class loop *loop, 
> unsigned factor,
>
>        /* Prepare the cfg and update the phi nodes.  Move the loop exit to the
>          loop latch (and make its condition dummy, for the moment).  */
> -      rest = loop_preheader_edge (new_loop)->src;
> +      basic_block rest = loop_preheader_edge (new_loop)->src;
>        edge precond_edge = single_pred_edge (rest);
>        split_edge (loop_latch_edge (loop));
>        basic_block exit_bb = single_pred (loop->latch);
> @@ -1373,10 +1375,7 @@ tree_transform_and_unroll_loop (class loop *loop, 
> unsigned factor,
>        remove_path (exit);
>      }
>    else
> -    {
> -      new_exit = exit;
> -      rest = exit->dest;
> -    }
> +    new_exit = single_dom_exit (loop);
>
>    /* Transform the loop.  */
>    if (transform)
> @@ -1401,6 +1400,7 @@ tree_transform_and_unroll_loop (class loop *loop, 
> unsigned factor,
>      }
>    update_ssa (TODO_update_ssa);
>
> +  new_exit = single_dom_exit (loop);
>    if (!single_loop_p)
>      {
>        /* Ensure that the frequencies in the loop match the new estimated
> @@ -1417,29 +1417,24 @@ tree_transform_and_unroll_loop (class loop *loop, 
> unsigned factor,
>             freq_e = freq_e.force_nonzero ();
>           scale_loop_frequencies (loop, freq_e.probability_in (freq_h));
>         }
> -    }
>
> -  basic_block exit_bb = single_pred (loop->latch);
> -  new_exit = find_edge (exit_bb, rest);
> -  new_exit->probability = profile_probability::always ()
> -                               .apply_scale (1, new_est_niter + 1);
> +      basic_block rest = new_exit->dest;
> +      new_exit->probability = profile_probability::always ()
> +       .apply_scale (1, new_est_niter + 1);
>
> -  if (!single_loop_p)
> -    rest->count += new_exit->count ();
> +      rest->count += new_exit->count ();
>
> -  edge new_nonexit = single_pred_edge (loop->latch);
> -  profile_probability prob = new_nonexit->probability;
> -  new_nonexit->probability = new_exit->probability.invert ();
> -  prob = new_nonexit->probability / prob;
> -  if (prob.initialized_p ())
> -    scale_bbs_frequencies (&loop->latch, 1, prob);
> +      edge new_nonexit = single_pred_edge (loop->latch);
> +      profile_probability prob = new_nonexit->probability;
> +      new_nonexit->probability = new_exit->probability.invert ();
> +      prob = new_nonexit->probability / prob;
> +      if (prob.initialized_p ())
> +       scale_bbs_frequencies (&loop->latch, 1, prob);
>
> -  if (!single_loop_p)
> -    {
>        /* Finally create the new counter for number of iterations and add
>          the new exit instruction.  */
>        tree ctr_before, ctr_after;
> -      gimple_stmt_iterator bsi = gsi_last_nondebug_bb (exit_bb);
> +      gimple_stmt_iterator bsi = gsi_last_nondebug_bb (new_exit->src);
>        exit_if = as_a <gcond *> (gsi_stmt (bsi));
>        create_iv (exit_base, exit_step, NULL_TREE, loop,
>                  &bsi, false, &ctr_before, &ctr_after);
> @@ -1448,6 +1443,67 @@ tree_transform_and_unroll_loop (class loop *loop, 
> unsigned factor,
>        gimple_cond_set_rhs (exit_if, exit_bound);
>        update_stmt (exit_if);
>      }
> +  else
> +    {
> +      /* gimple_duplicate_loop_to_header_edge has adjusted the loop body's
> +        original profile counts in line with the unroll factor.  However,
> +        the old counts might not have been consistent with the old
> +        iteration count.
> +
> +        Therefore, if the iteration count is known exactly, make sure that 
> the
> +        profile counts of the loop header (and any other blocks that might be
> +        executed in the final iteration) are consistent with the combination
> +        of (a) the incoming profile count and (b) the new iteration count.  
> */
> +      profile_count in_count = loop_preheader_edge (loop)->count ();
> +      profile_count old_header_count = loop->header->count;
> +      if (in_count.nonzero_p ()
> +         && old_header_count.nonzero_p ()
> +         && TREE_CODE (desc->niter) == INTEGER_CST)
> +       {
> +         /* The + 1 converts latch counts to iteration counts.  */
> +         profile_count new_header_count
> +           = (in_count.apply_scale (new_est_niter + 1, 1));
> +         basic_block *body = get_loop_body (loop);
> +         scale_bbs_frequencies_profile_count (body, loop->num_nodes,
> +                                              new_header_count,
> +                                              old_header_count);
> +         free (body);
> +       }
> +
> +      /* gimple_duplicate_loop_to_header_edge discarded FACTOR - 1
> +        exit edges and adjusted the loop body's profile counts for the
> +        new probabilities of the remaining non-exit edges.  However,
> +        the remaining exit edge still has the same probability as it
> +        did before, even though it is now more likely.
> +
> +        Therefore, all blocks executed after a failed exit test now have
> +        a profile count that is too high, and the sum of the profile counts
> +        for the header's incoming edges is greater than the profile count
> +        of the header itself.
> +
> +        Adjust the profile counts of all code in the loop body after
> +        the exit test so that the sum of the counts on entry to the
> +        header agree.  */
> +      profile_count old_latch_count = loop_latch_edge (loop)->count ();
> +      profile_count new_latch_count = loop->header->count - in_count;
> +      if (old_latch_count.nonzero_p () && new_latch_count.nonzero_p ())
> +       scale_dominated_blocks_in_loop (loop, new_exit->src, new_latch_count,
> +                                       old_latch_count);
> +
> +      /* Set the probability of the exit edge based on NEW_EST_NITER
> +        (which estimates latch counts rather than iteration counts).
> +        Update the probabilities of other edges to match.
> +
> +        If the profile counts are large enough to give the required
> +        precision, the updates above will have made
> +
> +           e->dest->count / e->src->count ~= new e->probability
> +
> +        for every outgoing edge e of NEW_EXIT->src.  */
> +      profile_probability new_exit_prob = profile_probability::always ()
> +       .apply_scale (1, new_est_niter + 1);
> +      change_edge_frequency (new_exit, new_exit_prob);
> +    }
>
>    checking_verify_flow_info ();
>    checking_verify_loop_structure ();
> @@ -1461,10 +1517,9 @@ tree_transform_and_unroll_loop (class loop *loop, 
> unsigned factor,
>
>  void
>  tree_unroll_loop (class loop *loop, unsigned factor,
> -                 edge exit, class tree_niter_desc *desc)
> +                 class tree_niter_desc *desc)
>  {
> -  tree_transform_and_unroll_loop (loop, factor, exit, desc,
> -                                 NULL, NULL);
> +  tree_transform_and_unroll_loop (loop, factor, desc, NULL, NULL);
>  }
>
>  /* Rewrite the phi node at position PSI in function of the main
> diff --git a/gcc/testsuite/gcc.dg/pr102385.c b/gcc/testsuite/gcc.dg/pr102385.c
> new file mode 100644
> index 00000000000..1339540c722
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr102385.c
> @@ -0,0 +1,14 @@
> +/* { dg-options "-Wall -Wextra -O2 -fno-toplevel-reorder -fno-tree-ch 
> -fno-tree-dce -fno-tree-dominator-opts -fno-tree-dse -fno-tree-loop-ivcanon 
> -fpredictive-commoning" } */
> +
> +short a, b;
> +int c[9];
> +void(d)() {}
> +void e() {
> +  a = 0;
> +  for (; a <= 4; a++) {
> +    short *f = &b;
> +    c[a] || (*f = 0);
> +    d(c[a + 2]);
> +  }
> +}
> +int main() {return 0;}

Reply via email to