As it stands, scale_loop_profile doesn't correctly handle loops with multiple exits. In particular, in the case where the expected niters exceeds iteration_bound, scale_loop_profile attempts to reduce the number of iterations with a call to scale_loop_frequencies, which multiplies the count of each BB by a given probability. This transformation preserves the relationships between the counts of the BBs within the loop (and thus the edge probabilities stay the same) but this cannot possibly work for loops with multiple exits, since in order for the expected niters to reduce (and counts along exit edges to remain the same), the exit edge probabilities must increase, thus decreasing the probabilities of the internal edges, meaning that the ratios of the counts of the BBs inside the loop must change. So we need a different approach (not a straightforward multiplicative scaling) to adjust the expected niters of a loop with multiple exits.
This patch introduces a new helper, flow_scale_loop_freqs, which can be used to correctly scale the profile of a loop with multiple exits. It is parameterized by a probability (with which to scale the header and therefore the expected niters) and a lambda which gives the desired counts for the exit edges. In this patch, to make things simpler, flow_scale_loop_freqs only handles loop shapes without internal control flow, and we introduce a predicate can_flow_scale_loop_freqs_p to test whether a given loop meets these criteria. This restriction is reasonable since this patch is motivated by fixing the profile consistency for early break vectorization, and we don't currently vectorize loops with internal control flow. We also fall back to a multiplicative scaling (the status quo) for loops that flow_scale_loop_freqs can't handle, so the patch should be a net improvement. We wrap the call to flow_scale_loop_freqs in a helper scale_loop_freqs_with_exit_counts which handles the above-mentioned fallback. This wrapper is still generic in that it accepts a lambda to allow overriding the desired exit edge counts. We specialize this with another wrapper, scale_loop_freqs_hold_exit_counts (keeping the counts along exit edges fixed), which is then used to implement the niters-scaling case of scale_loop_profile, thus fixing this path through the function for loops with multiple exits. Finally, we expose two new wrapper functions in cfgloopmanip.h for use in subsequent vectorizer patches. scale_loop_profile_hold_exit_counts is a variant of scale_loop_profile which assumes we want to keep the counts along exit edges of the loop fixed through both parts of the transformation (including the initial probability scale). scale_loop_freqs_with_new_exit_count is intended to be used in a subsequent patch when adding a skip edge around the epilog, where the reduction of count entering the loop is mirrored by a reduced count along a given exit edge. Bootstrapped/regtested as a series on aarch64-linux-gnu, x86_64-linux-gnu, and arm-linux-gnueabihf. OK for trunk? Thanks, Alex gcc/ChangeLog: PR tree-optimization/117790 * cfgloopmanip.cc (can_flow_scale_loop_freqs_p): New. (flow_scale_loop_freqs): New. (scale_loop_freqs_with_exit_counts): New. (scale_loop_freqs_hold_exit_counts): New. (scale_loop_profile): Refactor to use the newly-added scale_loop_profile_1, and use scale_loop_freqs_hold_exit_counts to correctly handle reducing the expected niters for loops with multiple exits. (scale_loop_freqs_with_new_exit_count): New. (scale_loop_profile_1): New. (scale_loop_profile_hold_exit_counts): New. * cfgloopmanip.h (scale_loop_profile_hold_exit_counts): New. (scale_loop_freqs_with_new_exit_count): New. --- gcc/cfgloopmanip.cc | 309 ++++++++++++++++++++++++++++++++++++++++---- gcc/cfgloopmanip.h | 7 + 2 files changed, 294 insertions(+), 22 deletions(-)
diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc index 534e556e1e4..1714f312d42 100644 --- a/gcc/cfgloopmanip.cc +++ b/gcc/cfgloopmanip.cc @@ -677,17 +677,243 @@ update_loop_exit_probability_scale_dom_bbs (class loop *loop, return exit_edge; } -/* Scale profile in LOOP by P. - If ITERATION_BOUND is not -1, scale even further if loop is predicted - to iterate too many times. - Before caling this function, preheader block profile should be already - scaled to final count. This is necessary because loop iterations are - determined by comparing header edge count to latch ege count and thus - they need to be scaled synchronously. */ +/* Check if LOOP is a simple loop for scaling purposes; that is, it + has no internal control flow and at most one exit from each BB. + + Also check if the sum of the new exit counts given by + GET_EXIT_COUNT will differ significantly from the count coming in to + the loop. Return false if so, since flow_scale_loop_freqs relies on + loops having consistent profiles in this sense. */ +template<typename ExitCountFn> +static bool +can_flow_scale_loop_freqs_p (class loop *loop, + ExitCountFn get_exit_count) +{ + basic_block bb = loop->header; + + const profile_count count_in = loop_count_in (loop); + profile_count exit_count = profile_count::zero (); + + while (bb != loop->latch) + { + /* Punt if any of the BB counts are uninitialized. */ + if (!bb->count.initialized_p ()) + return false; + + bool found_exit = false; + edge internal_edge = nullptr; + for (auto e : bb->succs) + if (flow_bb_inside_loop_p (loop, e->dest)) + { + if (internal_edge) + return false; + internal_edge = e; + } + else + { + if (found_exit) + return false; + found_exit = true; + exit_count += get_exit_count (e); + } + + bb = internal_edge->dest; + } + + /* Punt if any exit edge had an uninitialized count. */ + if (!exit_count.initialized_p ()) + return false; + + /* In a consistent profile we must have count in = count out, + return false if this property doesn't hold for this loop, + as we rely on it in flow_scale_loop_freqs. */ + if (exit_count.differs_from_p (count_in)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, + ";; loop %d has inconsistent profile (count in = ", + loop->num); + count_in.dump (dump_file); + fprintf (dump_file, ", count out = "); + exit_count.dump (dump_file); + fprintf (dump_file, "), cannot use count-flowing adjustment\n"); + } + + return false; + } + + return true; +} + +/* This function assumes LOOP satisfies can_flow_scale_loop_freqs_p. + For such a loop, adjust its profile so that the header now executes + PROB*N times, and adjust the exit probabilities such that the new exit + counts are those given by GET_EXIT_COUNT (E) for each exit edge E, flowing + the resulting counts along internal edges to make the profile consistent. */ + +template<typename ExitCountFn> +static void +flow_scale_loop_freqs (class loop *loop, + profile_probability prob, + ExitCountFn get_exit_count) +{ + basic_block bb = loop->header; + profile_count new_block_count = bb->count.apply_probability (prob); + + while (1) + { + edge internal_edge = nullptr; + edge exit_edge = nullptr; + for (auto edge : bb->succs) + { + if (flow_bb_inside_loop_p (loop, edge->dest)) + { + gcc_assert (!internal_edge); + internal_edge = edge; + } + else + { + gcc_assert (!exit_edge); + exit_edge = edge; + } + } + + if (!exit_edge) + { + /* If there is no exit edge, then there must be exactly one outgoing + (loop-internal) edge for this bb. No probabilities need + updating. */ + bb->count = new_block_count; + if (bb == loop->latch) + return; + + bb = internal_edge->dest; + continue; + } + + const profile_count new_exit_count = get_exit_count (exit_edge); + profile_probability new_exit_prob; + if (new_block_count.nonzero_p ()) + new_exit_prob = new_exit_count.probability_in (new_block_count); + else + { + /* NEW_BLOCK_COUNT is zero, so the only way we can make the profile + consistent is if NEW_EXIT_COUNT is zero too. */ + if (dump_file && new_exit_count.nonzero_p ()) + fprintf (dump_file, + ";; flow_scale_loop_freqs wants non-zero exit count " + "but bb count is zero/uninit: profile is inconsistent\n"); + + /* Arbitrarily set the exit probability to 0. */ + new_exit_prob = profile_probability::never (); + } + + set_edge_probability_and_rescale_others (exit_edge, new_exit_prob); + bb->count = new_block_count; + new_block_count = internal_edge->count (); + bb = internal_edge->dest; + } +} + +/* Attempt to adjust the profile of LOOP such that the header executes P*N + times where N is the current (unadjusted) expected iteration count. Also + adjust the exit probabilities such that each exit edge E has count + GET_NEW_EXIT_COUNT (E). + + Where possible, this function will use flow_scale_loop_freqs in order to + handle loops with multiple exits correctly. Otherwise, we fall back to a + simple multiplicative scaling of the BB freqs. */ + +template<typename ExitCountFn> +static void +scale_loop_freqs_with_exit_counts (class loop *loop, + profile_probability p, + ExitCountFn get_new_exit_count) +{ + if (can_flow_scale_loop_freqs_p (loop, get_new_exit_count)) + { + flow_scale_loop_freqs (loop, p, get_new_exit_count); + return; + } + + scale_loop_frequencies (loop, p); + if (auto scale_e = loop_exit_for_scaling (loop)) + { + if (scale_e->src->count.nonzero_p ()) + scale_e->probability + = get_new_exit_count (scale_e).probability_in (scale_e->src->count); + } + else + { + /* Multiplicative scaling of the loop frequencies for a loop with + multiple exits works if we are simply reducing the + number of times the loop is entered, but not to reduce the + iteration count of the loop. */ + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, + ";; Multiplicative scale by "); + p.dump (dump_file); + fprintf (dump_file, + " for loop %d with multiple exits: " + "profile may become inconsistent\n", + loop->num); + } + + auto_vec<edge> exits = get_loop_exit_edges (loop); + for (auto edge : exits) + if (edge->src->count.nonzero_p ()) + edge->probability + = get_new_exit_count (edge).probability_in (edge->src->count); + } +} + +/* Adjust the profile of LOOP such that it is expected to iterate P*N times + where N is the (unadjusted) expected iteration count. Adjust the exit + probabilities such that exit edge counts remain fixed before/after the + transformation. */ + +static void +scale_loop_freqs_hold_exit_counts (class loop *loop, + profile_probability p) +{ + auto id_counts = [](edge e) { return e->count (); }; + scale_loop_freqs_with_exit_counts (loop, p, id_counts); +} + +/* Adjust the profile of LOOP such that it is expected to iterate P*N times + where N is the (unadjusted) expected iteration count. Adjust the exit + probabilities such that exit edge counts remain fixed before/after the + transformation, except for EXIT_E which is set to have count NEW_COUNT after + the transformation. */ void -scale_loop_profile (class loop *loop, profile_probability p, - gcov_type iteration_bound) +scale_loop_freqs_with_new_exit_count (class loop *loop, + profile_probability p, + edge exit_e, + profile_count new_count) +{ + auto get_new_counts = [=](edge e) + { + if (e == exit_e) + return new_count; + else + return e->count (); + }; + scale_loop_freqs_with_exit_counts (loop, p, get_new_counts); +} + +/* Outline of scale_loop_profile implementation. Used by + scale_loop_profile{,_hold_exit_counts}. */ + +template<typename ScaleLoopFreqs, typename ScaleNiters> +static void +scale_loop_profile_1 (class loop *loop, + profile_probability p, + gcov_type iteration_bound, + ScaleLoopFreqs freq_scaler, + ScaleNiters niters_scaler) { if (!(p == profile_probability::always ())) { @@ -700,7 +926,7 @@ scale_loop_profile (class loop *loop, profile_probability p, } /* Scale the probabilities. */ - scale_loop_frequencies (loop, p); + freq_scaler (loop, p); } if (iteration_bound == -1) @@ -737,18 +963,57 @@ scale_loop_profile (class loop *loop, profile_probability p, fprintf (dump_file, " to reach upper bound %i\n", (int)iteration_bound); } - /* Finally attempt to fix exit edge probability. */ - edge exit_edge = loop_exit_for_scaling (loop); - - /* In a consistent profile unadjusted_exit_count should be same as count_in, - however to preserve as much of the original info, avoid recomputing - it. */ - profile_count unadjusted_exit_count = profile_count::uninitialized (); - if (exit_edge) - unadjusted_exit_count = exit_edge->count (); - scale_loop_frequencies (loop, scale_prob); - update_loop_exit_probability_scale_dom_bbs (loop, exit_edge, - unadjusted_exit_count); + + niters_scaler (loop, scale_prob); +} + +/* Scale profile in LOOP by P. + If ITERATION_BOUND is not -1, scale even further if loop is predicted + to iterate too many times. + Before caling this function, preheader block profile should be already + scaled to final count. This is necessary because loop iterations are + determined by comparing header edge count to latch ege count and thus + they need to be scaled synchronously. */ + +void +scale_loop_profile (class loop *loop, profile_probability p, + gcov_type iteration_bound) +{ + auto scale_niters = [](class loop *loop, profile_probability scale_prob) + { + edge exit_edge = loop_exit_for_scaling (loop); + + profile_count unadjusted_exit_count = profile_count::uninitialized (); + if (exit_edge) + unadjusted_exit_count = exit_edge->count (); + else + { + scale_loop_freqs_hold_exit_counts (loop, scale_prob); + return; + } + + scale_loop_frequencies (loop, scale_prob); + update_loop_exit_probability_scale_dom_bbs (loop, exit_edge, + unadjusted_exit_count); + }; + + scale_loop_profile_1 (loop, p, iteration_bound, + &scale_loop_frequencies, + scale_niters); +} + +/* This function is like scale_loop_profile but additionally takes care of + updating the exit edge probabilities in order to hold the exit edge counts + constant before/after any transformation. */ + +void +scale_loop_profile_hold_exit_counts (class loop *loop, + profile_probability p, + gcov_type iteration_bound) +{ + scale_loop_profile_1 (loop, p, iteration_bound, + &scale_loop_freqs_hold_exit_counts, + &scale_loop_freqs_hold_exit_counts); } /* Recompute dominance information for basic blocks outside LOOP. */ diff --git a/gcc/cfgloopmanip.h b/gcc/cfgloopmanip.h index 42def2fe40d..faa874c56d5 100644 --- a/gcc/cfgloopmanip.h +++ b/gcc/cfgloopmanip.h @@ -41,6 +41,13 @@ extern void place_new_loop (struct function *, class loop *); extern void add_loop (class loop *, class loop *); extern void scale_loop_frequencies (class loop *, profile_probability); extern void scale_loop_profile (class loop *, profile_probability, gcov_type); +extern void scale_loop_profile_hold_exit_counts (class loop *, + profile_probability, + gcov_type); +extern void scale_loop_freqs_with_new_exit_count (class loop *, + profile_probability, + edge, profile_count); + extern edge create_empty_if_region_on_edge (edge, tree); extern class loop *create_empty_loop_on_edge (edge, tree, tree, tree, tree, tree *, tree *, class loop *);