[PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit loops [PR117790]

Alex Coplan Mon, 06 Jan 2025 03:35:21 -0800

As it stands, scale_loop_profile doesn't correctly handle loops with
multiple exits.  In particular, in the case where the expected niters
exceeds iteration_bound, scale_loop_profile attempts to reduce the
number of iterations with a call to scale_loop_frequencies, which
multiplies the count of each BB by a given probability.  This
transformation preserves the relationships between the counts of the BBs
within the loop (and thus the edge probabilities stay the same) but this
cannot possibly work for loops with multiple exits, since in order for
the expected niters to reduce (and counts along exit edges to remain the
same), the exit edge probabilities must increase, thus decreasing the
probabilities of the internal edges, meaning that the ratios of the
counts of the BBs inside the loop must change.  So we need a different
approach (not a straightforward multiplicative scaling) to adjust the
expected niters of a loop with multiple exits.


This patch introduces a new helper, flow_scale_loop_freqs, which can be
used to correctly scale the profile of a loop with multiple exits.  It
is parameterized by a probability (with which to scale the header and
therefore the expected niters) and a lambda which gives the desired
counts for the exit edges.  In this patch, to make things simpler,
flow_scale_loop_freqs only handles loop shapes without internal control
flow, and we introduce a predicate can_flow_scale_loop_freqs_p to test
whether a given loop meets these criteria.  This restriction is
reasonable since this patch is motivated by fixing the profile
consistency for early break vectorization, and we don't currently
vectorize loops with internal control flow.  We also fall back to a
multiplicative scaling (the status quo) for loops that
flow_scale_loop_freqs can't handle, so the patch should be a net
improvement.

We wrap the call to flow_scale_loop_freqs in a helper
scale_loop_freqs_with_exit_counts which handles the above-mentioned
fallback.  This wrapper is still generic in that it accepts a lambda to
allow overriding the desired exit edge counts.  We specialize this with
another wrapper, scale_loop_freqs_hold_exit_counts (keeping the
counts along exit edges fixed), which is then used to implement the
niters-scaling case of scale_loop_profile, thus fixing this path through
the function for loops with multiple exits.

Finally, we expose two new wrapper functions in cfgloopmanip.h for use
in subsequent vectorizer patches.  scale_loop_profile_hold_exit_counts
is a variant of scale_loop_profile which assumes we want to keep the
counts along exit edges of the loop fixed through both parts of the
transformation (including the initial probability scale).
scale_loop_freqs_with_new_exit_count is intended to be used in a
subsequent patch when adding a skip edge around the epilog, where the
reduction of count entering the loop is mirrored by a reduced count
along a given exit edge.

Bootstrapped/regtested as a series on aarch64-linux-gnu,
x86_64-linux-gnu, and arm-linux-gnueabihf.  OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

        PR tree-optimization/117790
        * cfgloopmanip.cc (can_flow_scale_loop_freqs_p): New.
        (flow_scale_loop_freqs): New.
        (scale_loop_freqs_with_exit_counts): New.
        (scale_loop_freqs_hold_exit_counts): New.
        (scale_loop_profile): Refactor to use the newly-added
        scale_loop_profile_1, and use scale_loop_freqs_hold_exit_counts to
        correctly handle reducing the expected niters for loops with multiple
        exits.
        (scale_loop_freqs_with_new_exit_count): New.
        (scale_loop_profile_1): New.
        (scale_loop_profile_hold_exit_counts): New.
        * cfgloopmanip.h (scale_loop_profile_hold_exit_counts): New.
        (scale_loop_freqs_with_new_exit_count): New.
---
 gcc/cfgloopmanip.cc | 309 ++++++++++++++++++++++++++++++++++++++++----
 gcc/cfgloopmanip.h  |   7 +
 2 files changed, 294 insertions(+), 22 deletions(-)

diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
index 534e556e1e4..1714f312d42 100644
--- a/gcc/cfgloopmanip.cc
+++ b/gcc/cfgloopmanip.cc
@@ -677,17 +677,243 @@ update_loop_exit_probability_scale_dom_bbs (class loop *loop,
   return exit_edge;
 }
 
-/* Scale profile in LOOP by P.
-   If ITERATION_BOUND is not -1, scale even further if loop is predicted
-   to iterate too many times.
-   Before caling this function, preheader block profile should be already
-   scaled to final count.  This is necessary because loop iterations are
-   determined by comparing header edge count to latch ege count and thus
-   they need to be scaled synchronously.  */
+/* Check if LOOP is a simple loop for scaling purposes; that is, it
+   has no internal control flow and at most one exit from each BB.
+
+   Also check if the sum of the new exit counts given by
+   GET_EXIT_COUNT will differ significantly from the count coming in to
+   the loop.  Return false if so, since flow_scale_loop_freqs relies on
+   loops having consistent profiles in this sense.  */
+template<typename ExitCountFn>
+static bool
+can_flow_scale_loop_freqs_p (class loop *loop,
+			     ExitCountFn get_exit_count)
+{
+  basic_block bb = loop->header;
+
+  const profile_count count_in = loop_count_in (loop);
+  profile_count exit_count = profile_count::zero ();
+
+  while (bb != loop->latch)
+    {
+      /* Punt if any of the BB counts are uninitialized.  */
+      if (!bb->count.initialized_p ())
+	return false;
+
+      bool found_exit = false;
+      edge internal_edge = nullptr;
+      for (auto e : bb->succs)
+	if (flow_bb_inside_loop_p (loop, e->dest))
+	  {
+	    if (internal_edge)
+	      return false;
+	    internal_edge = e;
+	  }
+	else
+	  {
+	    if (found_exit)
+	      return false;
+	    found_exit = true;
+	    exit_count += get_exit_count (e);
+	  }
+
+      bb = internal_edge->dest;
+    }
+
+  /* Punt if any exit edge had an uninitialized count.  */
+  if (!exit_count.initialized_p ())
+    return false;
+
+  /* In a consistent profile we must have count in = count out,
+     return false if this property doesn't hold for this loop,
+     as we rely on it in flow_scale_loop_freqs.  */
+  if (exit_count.differs_from_p (count_in))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file,
+		   ";; loop %d has inconsistent profile (count in = ",
+		   loop->num);
+	  count_in.dump (dump_file);
+	  fprintf (dump_file, ", count out = ");
+	  exit_count.dump (dump_file);
+	  fprintf (dump_file, "), cannot use count-flowing adjustment\n");
+	}
+
+      return false;
+    }
+
+  return true;
+}
+
+/* This function assumes LOOP satisfies can_flow_scale_loop_freqs_p.
+   For such a loop, adjust its profile so that the header now executes
+   PROB*N times, and adjust the exit probabilities such that the new exit
+   counts are those given by GET_EXIT_COUNT (E) for each exit edge E, flowing
+   the resulting counts along internal edges to make the profile consistent.  */
+
+template<typename ExitCountFn>
+static void
+flow_scale_loop_freqs (class loop *loop,
+		       profile_probability prob,
+		       ExitCountFn get_exit_count)
+{
+  basic_block bb = loop->header;
+  profile_count new_block_count = bb->count.apply_probability (prob);
+
+  while (1)
+    {
+      edge internal_edge = nullptr;
+      edge exit_edge = nullptr;
+      for (auto edge : bb->succs)
+	{
+	  if (flow_bb_inside_loop_p (loop, edge->dest))
+	    {
+	      gcc_assert (!internal_edge);
+	      internal_edge = edge;
+	    }
+	  else
+	    {
+	      gcc_assert (!exit_edge);
+	      exit_edge = edge;
+	    }
+	}
+
+      if (!exit_edge)
+	{
+	  /* If there is no exit edge, then there must be exactly one outgoing
+	     (loop-internal) edge for this bb.  No probabilities need
+	     updating.  */
+	  bb->count = new_block_count;
+	  if (bb == loop->latch)
+	    return;
+
+	  bb = internal_edge->dest;
+	  continue;
+	}
+
+      const profile_count new_exit_count = get_exit_count (exit_edge);
+      profile_probability new_exit_prob;
+      if (new_block_count.nonzero_p ())
+	new_exit_prob = new_exit_count.probability_in (new_block_count);
+      else
+	{
+	  /* NEW_BLOCK_COUNT is zero, so the only way we can make the profile
+	     consistent is if NEW_EXIT_COUNT is zero too.  */
+	  if (dump_file && new_exit_count.nonzero_p ())
+	    fprintf (dump_file,
+		     ";; flow_scale_loop_freqs wants non-zero exit count "
+		     "but bb count is zero/uninit: profile is inconsistent\n");
+
+	  /* Arbitrarily set the exit probability to 0.  */
+	  new_exit_prob = profile_probability::never ();
+	}
+
+      set_edge_probability_and_rescale_others (exit_edge, new_exit_prob);
+      bb->count = new_block_count;
+      new_block_count = internal_edge->count ();
+      bb = internal_edge->dest;
+    }
+}
+
+/* Attempt to adjust the profile of LOOP such that the header executes P*N
+   times where N is the current (unadjusted) expected iteration count.  Also
+   adjust the exit probabilities such that each exit edge E has count
+   GET_NEW_EXIT_COUNT (E).
+
+   Where possible, this function will use flow_scale_loop_freqs in order to
+   handle loops with multiple exits correctly.  Otherwise, we fall back to a
+   simple multiplicative scaling of the BB freqs.  */
+
+template<typename ExitCountFn>
+static void
+scale_loop_freqs_with_exit_counts (class loop *loop,
+				   profile_probability p,
+				   ExitCountFn get_new_exit_count)
+{
+  if (can_flow_scale_loop_freqs_p (loop, get_new_exit_count))
+    {
+      flow_scale_loop_freqs (loop, p, get_new_exit_count);
+      return;
+    }
+
+  scale_loop_frequencies (loop, p);
+  if (auto scale_e = loop_exit_for_scaling (loop))
+    {
+      if (scale_e->src->count.nonzero_p ())
+	scale_e->probability
+	  = get_new_exit_count (scale_e).probability_in (scale_e->src->count);
+    }
+  else
+    {
+      /* Multiplicative scaling of the loop frequencies for a loop with
+	 multiple exits works if we are simply reducing the
+	 number of times the loop is entered, but not to reduce the
+	 iteration count of the loop.  */
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file,
+		   ";; Multiplicative scale by ");
+	  p.dump (dump_file);
+	  fprintf (dump_file,
+		   " for loop %d with multiple exits: "
+		   "profile may become inconsistent\n",
+		   loop->num);
+	}
+
+      auto_vec<edge> exits = get_loop_exit_edges (loop);
+      for (auto edge : exits)
+	if (edge->src->count.nonzero_p ())
+	  edge->probability
+	    = get_new_exit_count (edge).probability_in (edge->src->count);
+    }
+}
+
+/* Adjust the profile of LOOP such that it is expected to iterate P*N times
+   where N is the (unadjusted) expected iteration count.  Adjust the exit
+   probabilities such that exit edge counts remain fixed before/after the
+   transformation.  */
+
+static void
+scale_loop_freqs_hold_exit_counts (class loop *loop,
+				   profile_probability p)
+{
+  auto id_counts = [](edge e) { return e->count (); };
+  scale_loop_freqs_with_exit_counts (loop, p, id_counts);
+}
+
+/* Adjust the profile of LOOP such that it is expected to iterate P*N times
+   where N is the (unadjusted) expected iteration count.  Adjust the exit
+   probabilities such that exit edge counts remain fixed before/after the
+   transformation, except for EXIT_E which is set to have count NEW_COUNT after
+   the transformation.  */
 
 void
-scale_loop_profile (class loop *loop, profile_probability p,
-		    gcov_type iteration_bound)
+scale_loop_freqs_with_new_exit_count (class loop *loop,
+				      profile_probability p,
+				      edge exit_e,
+				      profile_count new_count)
+{
+  auto get_new_counts = [=](edge e)
+    {
+      if (e == exit_e)
+	return new_count;
+      else
+	return e->count ();
+    };
+  scale_loop_freqs_with_exit_counts (loop, p, get_new_counts);
+}
+
+/* Outline of scale_loop_profile implementation.  Used by
+   scale_loop_profile{,_hold_exit_counts}.  */
+
+template<typename ScaleLoopFreqs, typename ScaleNiters>
+static void
+scale_loop_profile_1 (class loop *loop,
+		      profile_probability p,
+		      gcov_type iteration_bound,
+		      ScaleLoopFreqs freq_scaler,
+		      ScaleNiters niters_scaler)
 {
   if (!(p == profile_probability::always ()))
     {
@@ -700,7 +926,7 @@ scale_loop_profile (class loop *loop, profile_probability p,
 	}
 
       /* Scale the probabilities.  */
-      scale_loop_frequencies (loop, p);
+      freq_scaler (loop, p);
     }
 
   if (iteration_bound == -1)
@@ -737,18 +963,57 @@ scale_loop_profile (class loop *loop, profile_probability p,
       fprintf (dump_file, " to reach upper bound %i\n",
 	       (int)iteration_bound);
     }
-  /* Finally attempt to fix exit edge probability.  */
-  edge exit_edge = loop_exit_for_scaling (loop);
-
-  /* In a consistent profile unadjusted_exit_count should be same as count_in,
-     however to preserve as much of the original info, avoid recomputing
-     it.  */
-  profile_count unadjusted_exit_count = profile_count::uninitialized ();
-  if (exit_edge)
-    unadjusted_exit_count = exit_edge->count ();
-  scale_loop_frequencies (loop, scale_prob);
-  update_loop_exit_probability_scale_dom_bbs (loop, exit_edge,
-					      unadjusted_exit_count);
+
+  niters_scaler (loop, scale_prob);
+}
+
+/* Scale profile in LOOP by P.
+   If ITERATION_BOUND is not -1, scale even further if loop is predicted
+   to iterate too many times.
+   Before caling this function, preheader block profile should be already
+   scaled to final count.  This is necessary because loop iterations are
+   determined by comparing header edge count to latch ege count and thus
+   they need to be scaled synchronously.  */
+
+void
+scale_loop_profile (class loop *loop, profile_probability p,
+		    gcov_type iteration_bound)
+{
+  auto scale_niters = [](class loop *loop, profile_probability scale_prob)
+    {
+      edge exit_edge = loop_exit_for_scaling (loop);
+
+      profile_count unadjusted_exit_count = profile_count::uninitialized ();
+      if (exit_edge)
+	unadjusted_exit_count = exit_edge->count ();
+      else
+	{
+	  scale_loop_freqs_hold_exit_counts (loop, scale_prob);
+	  return;
+	}
+
+      scale_loop_frequencies (loop, scale_prob);
+      update_loop_exit_probability_scale_dom_bbs (loop, exit_edge,
+						  unadjusted_exit_count);
+    };
+
+  scale_loop_profile_1 (loop, p, iteration_bound,
+			&scale_loop_frequencies,
+			scale_niters);
+}
+
+/* This function is like scale_loop_profile but additionally takes care of
+   updating the exit edge probabilities in order to hold the exit edge counts
+   constant before/after any transformation.  */
+
+void
+scale_loop_profile_hold_exit_counts (class loop *loop,
+				     profile_probability p,
+				     gcov_type iteration_bound)
+{
+  scale_loop_profile_1 (loop, p, iteration_bound,
+			&scale_loop_freqs_hold_exit_counts,
+			&scale_loop_freqs_hold_exit_counts);
 }
 
 /* Recompute dominance information for basic blocks outside LOOP.  */
diff --git a/gcc/cfgloopmanip.h b/gcc/cfgloopmanip.h
index 42def2fe40d..faa874c56d5 100644
--- a/gcc/cfgloopmanip.h
+++ b/gcc/cfgloopmanip.h
@@ -41,6 +41,13 @@ extern void place_new_loop (struct function *, class loop *);
 extern void add_loop (class loop *, class loop *);
 extern void scale_loop_frequencies (class loop *, profile_probability);
 extern void scale_loop_profile (class loop *, profile_probability, gcov_type);
+extern void scale_loop_profile_hold_exit_counts (class loop *,
+						 profile_probability,
+						 gcov_type);
+extern void scale_loop_freqs_with_new_exit_count (class loop *,
+						  profile_probability,
+						  edge, profile_count);
+
 extern edge create_empty_if_region_on_edge (edge, tree);
 extern class loop *create_empty_loop_on_edge (edge, tree, tree, tree, tree,
 					       tree *, tree *, class loop *);

[PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit loops [PR117790]

Reply via email to