On Wed, Aug 28, 2013 at 11:20 AM, Teresa Johnson <tejohn...@google.com> wrote:
> On Wed, Aug 28, 2013 at 9:58 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>> Hi,
>> with Martin we made a bit of progress on analyzing the problems. We now
>> have COMDAT profile merging for FDO
>
> Great! Is this the LTO merging you were talking about in an earlier
> message, or the gcov runtime fix (that would presumably not be
> lto-specific)?
>
>> and we also noticed that forks can make your
>> basic blocks appear never executed even though they are executed every run:
>> the fork is accounted as 3 independent runs of the program. The first run
>> is until the fork, the second two runs are the two variants.
>>
>> I have a patch to track this. Moreover, vforks seem to produce repeated
>> merging of results.
>
> Aha, does this explain the gimp issue as well?
>
>>
>> These two factors riddle the firefox profiles enough
>> that it took us weeks to understand what happens.
>>> +	  if (changed)
>>> +	    {
>>> +	      /* Edge forwarding in particular can cause hot blocks previously
>>> +		 reached by both hot and cold blocks to become dominated only
>>> +		 by cold blocks. This will cause the verification below to fail,
>>> +		 and lead to now cold code in the hot section. This is not easy
>>> +		 to detect and fix during edge forwarding, and in some cases
>>> +		 is only visible after newly unreachable blocks are deleted,
>>> +		 which will be done in fixup_partitions. */
>>> +	      fixup_partitions ();
>>
>> Is it really necessary to run this from the internal loop of cfgcleanup?
>
> The reason I added it here is that just below there is a call to
> verify_flow_info, and that will fail with the new verification.
>
>> It seems
>> you will play a back and forth game where edge forwarding will remove your
>> fallthru and you will be re-adding it?
>
> fixup_partitions will not add new fall through edges. (It may invoke
> force_nonfallthru to do the opposite.) So there shouldn't be any
> ping-ponging effect.
>
>>
>> I would wait for cfgcleanup to finish its job (I don't really think it
>> needs to be iterative) and then do the fixup, possibly cleaning up when
>> the blocks were repositioned
>> (I suppose that is the only case where the code above introduces new
>> cfgcleanup opportunities).
>
> As noted above, I can't do this due to the call to verify_flow_info
> for each iteration. One option is to move both down outside the loop.
>
>>
>>> +      /* Walk the preds/succs and check if there is at least one already
>>> +	 marked hot. Keep track of the most frequent pred/succ so that we
>>> +	 can mark it hot if we don't find one. */
>>> +      FOR_EACH_EDGE (e, ei, edges)
>>> +	{
>>> +	  basic_block reach_bb = walk_up ? e->src : e->dest;
>>> +
>>> +	  if (e->flags & EDGE_DFS_BACK)
>>> +	    continue;
>>> +
>>> +	  if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION)
>>> +	    {
>>> +	      found = true;
>>> +	      break;
>>> +	    }
>>> +	  if (e->probability > highest_probability)
>>> +	    highest_probability = e->probability;
>>
>> When doing the predecessor walk, if you have two predecessors, one
>> executing 100000 times and getting to the block with probability 1%, you
>> want to choose it over a block executing once and getting to you with
>> probability 100%.
>>
>> You probably want to look for the most likely predecessor here. You need
>> to look for the highest e->count, and if they are all 0 for the highest
>> EDGE_FREQUENCY, and for the maximal probability only if all else fails?
>
> Yes, thanks, let me do that.
New patch that addresses this is included below.

Thanks, Teresa

>
>>
>>> +	}
>>> +
>>> +      /* If bb is reached by (or reaches, in the case of !WALK_UP)
>>> +	 another hot block (or unpartitioned, e.g. the entry block) then
>>> +	 it is ok. If not, then the most frequent pred (or succ) needs
>>> +	 to be adjusted. In the case where multiple preds/succs have the
>>> +	 same probability (e.g. a 50-50 branch), then both will be
>>> +	 adjusted. */
>>> +      if (found)
>>> +	continue;
>>> +
>>> +      FOR_EACH_EDGE (e, ei, edges)
>>> +	{
>>> +	  if (e->flags & EDGE_DFS_BACK)
>>> +	    continue;
>>> +	  if (e->probability < highest_probability)
>>> +	    continue;
>>
>> Again, for the predecessor walk you need to apply the slightly crazy
>> logic described above.
>>> Index: predict.c
>>> ===================================================================
>>> --- predict.c	(revision 202021)
>>> +++ predict.c	(working copy)
>>> @@ -241,6 +241,22 @@ probably_never_executed_bb_p (struct function *fun
>>>    return false;
>>>  }
>>>
>>> +
>>> +/* Return true in case edge E is probably never executed.  */
>>> +
>>> +bool
>>> +probably_never_executed_edge_p (struct function *fun, edge e)
>>> +{
>>> +  gcc_checking_assert (fun);
>>> +  if (profile_info && flag_branch_probabilities)
>>> +    return ((e->count + profile_info->runs / 2) / profile_info->runs) == 0;
>>> +  if ((!profile_info || !flag_branch_probabilities)
>>> +      && (cgraph_get_node (fun->decl)->frequency
>>> +	  == NODE_FREQUENCY_UNLIKELY_EXECUTED))
>>> +    return true;
>>> +  return false;
>>
>> Instead of duplicating the conditional, break out the tests into
>> probably_never_executed_count_p, like we have for maybe_hot_count_p.
>
> ok
>
>>
>> It would be nice to extend this to work w/o profile:
>> probably_never_executed_edge_p can return true for EH edges and setjmp
>> edges, and we can then walk bodies ignoring EH/setjmp, flagging blocks
>> that are probably never executed, enabling the partitioning to do its
>> job w/o profile.
>
> agreed, although I would prefer to leave that for a follow-on patch so
> it can be tuned a bit.
>
>>
>> Otherwise the patch looks OK to me. Thanks for working on this. Do we
>> have agreement on the C++ way of mixing declarations and code?
>
> According to http://gcc.gnu.org/wiki/CppConventions:
>
> "In new code variables which are used in a small scope should be
> defined at the point of first use, rather than at the top of the
> function. Variables which are used throughout the function may be
> defined at the top of the function, as in C."
>
> I think I am following that, but let me know if you see something that
> needs to be fixed.
>
> Thanks,
> Teresa
>
>>

2013-08-29  Teresa Johnson  <tejohn...@google.com>
	    Steven Bosscher  <ste...@gcc.gnu.org>

	* cfgrtl.c (fixup_new_cold_bb): New routine.
	(commit_edge_insertions): Invoke fixup_partitions.
	(find_partition_fixes): New routine.
	(fixup_partitions): Ditto.
	(verify_hot_cold_block_grouping): Update comments.
	(rtl_verify_edges): Invoke find_partition_fixes.
	(rtl_verify_bb_pointers): Update comments.
	(rtl_verify_bb_layout): Ditto.
	* basic-block.h (probably_never_executed_edge_p): Declare.
	(fixup_partitions): Ditto.
	* cfgcleanup.c (try_optimize_cfg): Invoke fixup_partitions.
	* bb-reorder.c (sanitize_hot_paths): New function.
	(find_rarely_executed_basic_blocks_and_crossing_edges): Invoke
	sanitize_hot_paths.
	* predict.c (probably_never_executed_edge_p): New routine.
	* cfg.c (check_bb_profile): Add partition insanity warnings.

Index: cfgrtl.c
===================================================================
--- cfgrtl.c	(revision 202021)
+++ cfgrtl.c	(working copy)
@@ -1358,6 +1358,43 @@ fixup_partition_crossing (edge e)
     }
 }
 
+/* Called when block BB has been reassigned to the cold partition,
+   because it is now dominated by another cold block,
+   to ensure that the region crossing attributes are updated.
+   */
+
+static void
+fixup_new_cold_bb (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  /* This is called when a hot bb is found to now be dominated
+     by a cold bb and therefore needs to become cold. Therefore,
+     its preds will no longer be region crossing. Any non-dominating
+     preds that were previously hot would also have become cold
+     in the caller for the same region. Any preds that were previously
+     region-crossing will be adjusted in fixup_partition_crossing. */
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      fixup_partition_crossing (e);
+    }
+
+  /* Possibly need to make bb's successor edges region crossing,
+     or remove stale region crossing. */
+  FOR_EACH_EDGE (e, ei, bb->succs)
+    {
+      /* We can't have fall-through edges across partition boundaries.
+	 Note that force_nonfallthru will do any necessary partition
+	 boundary fixup by calling fixup_partition_crossing itself. */
+      if ((e->flags & EDGE_FALLTHRU)
+	  && BB_PARTITION (bb) != BB_PARTITION (e->dest)
+	  && e->dest != EXIT_BLOCK_PTR)
+	force_nonfallthru (e);
+      else
+	fixup_partition_crossing (e);
+    }
+}
+
 /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
    expense of adding new instructions or reordering basic blocks.
@@ -1996,6 +2033,14 @@ commit_edge_insertions (void)
 {
   basic_block bb;
 
+  /* Optimization passes that invoke this routine can cause hot blocks
+     previously reached by both hot and cold blocks to become dominated only
+     by cold blocks. This will cause the verification below to fail,
+     and lead to now cold code in the hot section. In some cases this
+     may only be visible after newly unreachable blocks are deleted,
+     which will be done by fixup_partitions. */
+  fixup_partitions ();
+
 #ifdef ENABLE_CHECKING
   verify_flow_info ();
 #endif
@@ -2190,6 +2235,101 @@ get_last_bb_insn (basic_block bb)
   return end;
 }
 
+/* Sanity check partition hotness to ensure that basic blocks in
+   the cold partition don't dominate basic blocks in the hot partition.
+   If FLAG_ONLY is true, report violations as errors. Otherwise
+   re-mark the dominated blocks as cold, since this is run after
+   cfg optimizations that may make hot blocks previously reached
+   by both hot and cold blocks now only reachable along cold paths. */
+
+static vec<basic_block>
+find_partition_fixes (bool flag_only)
+{
+  basic_block bb;
+  vec<basic_block> bbs_in_cold_partition = vNULL;
+  vec<basic_block> bbs_to_fix = vNULL;
+
+  /* Callers check this. */
+  gcc_checking_assert (crtl->has_bb_partition);
+
+  FOR_EACH_BB (bb)
+    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
+      bbs_in_cold_partition.safe_push (bb);
+
+  if (bbs_in_cold_partition.is_empty ())
+    return vNULL;
+
+  bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+
+  if (dom_calculated_here)
+    calculate_dominance_info (CDI_DOMINATORS);
+
+  while (! bbs_in_cold_partition.is_empty ())
+    {
+      bb = bbs_in_cold_partition.pop ();
+      /* Any blocks dominated by a block in the cold section
+	 must also be cold. */
+      basic_block son;
+      for (son = first_dom_son (CDI_DOMINATORS, bb);
+	   son;
+	   son = next_dom_son (CDI_DOMINATORS, son))
+	{
+	  /* If son is not yet cold, then mark it cold here and
+	     enqueue it for further processing. */
+	  if ((BB_PARTITION (son) != BB_COLD_PARTITION))
+	    {
+	      if (flag_only)
+		error ("non-cold basic block %d dominated "
+		       "by a block in the cold partition (%d)",
+		       son->index, bb->index);
+	      else
+		BB_SET_PARTITION (son, BB_COLD_PARTITION);
+	      bbs_to_fix.safe_push (son);
+	      bbs_in_cold_partition.safe_push (son);
+	    }
+	}
+    }
+
+  if (dom_calculated_here)
+    free_dominance_info (CDI_DOMINATORS);
+
+  return bbs_to_fix;
+}
+
+/* Perform cleanup on the hot/cold bb partitioning after optimization
+   passes that modify the cfg. */
+
+void
+fixup_partitions (void)
+{
+  basic_block bb;
+
+  if (!crtl->has_bb_partition)
+    return;
+
+  /* Delete any blocks that became unreachable and weren't
+     already cleaned up, for example during edge forwarding
+     and convert_jumps_to_returns.
+     This will expose more
+     opportunities for fixing the partition boundaries here.
+     Also, the calculation of the dominance graph during verification
+     will assert if there are unreachable nodes. */
+  delete_unreachable_blocks ();
+
+  /* If there are partitions, do a sanity check on them: A basic block in
+     a cold partition cannot dominate a basic block in a hot partition.
+     Fixup any that now violate this requirement, as a result of edge
+     forwarding and unreachable block deletion. */
+  vec<basic_block> bbs_to_fix = find_partition_fixes (false);
+
+  /* Do the partition fixup after all necessary blocks have been converted to
+     cold, so that we only update the region crossings the minimum number of
+     places, which can require forcing edges to be non fallthru. */
+  while (! bbs_to_fix.is_empty ())
+    {
+      bb = bbs_to_fix.pop ();
+      fixup_new_cold_bb (bb);
+    }
+}
+
 /* Verify, in the basic block chain, that there is at most one switch
    between hot/cold partitions. This condition will not be true until
    after reorder_basic_blocks is called. */
@@ -2236,7 +2376,8 @@ verify_hot_cold_block_grouping (void)
 /* Perform several checks on the edges out of each block, such as
    the consistency of the branch probabilities, the correctness
    of hot/cold partition crossing edges, and the number of expected
-   successor edges. */
+   successor edges. Also verify that the dominance relationship
+   between hot/cold blocks is sane. */
 
 static int
 rtl_verify_edges (void)
@@ -2399,6 +2540,14 @@ rtl_verify_edges (void)
 	}
     }
 
+  /* If there are partitions, do a sanity check on them: A basic block in
+     a cold partition cannot dominate a basic block in a hot partition. */
+  if (crtl->has_bb_partition && !err)
+    {
+      vec<basic_block> bbs_to_fix = find_partition_fixes (true);
+      err = !bbs_to_fix.is_empty ();
+    }
+
   /* Clean up.  */
   return err;
 }
@@ -2532,7 +2681,7 @@ rtl_verify_bb_pointers (void)
      and NOTE_INSN_BASIC_BLOCK
    - verify that no fall_thru edge crosses hot/cold partition boundaries
    - verify that there are no pending RTL branch predictions
-   - verify that there is a single hot/cold partition boundary after bbro
+   - verify that hot blocks are not dominated by cold blocks
 
    In future it can be extended check a lot of other stuff as well
    (reachability of basic blocks, life information, etc. etc.). */
@@ -2778,7 +2927,8 @@ rtl_verify_bb_layout (void)
    - check that all insns are in the basic blocks
      (except the switch handling code, barriers and notes)
    - check that all returns are followed by barriers
-   - check that all fallthru edge points to the adjacent blocks. */
+   - check that all fallthru edge points to the adjacent blocks
+   - verify that there is a single hot/cold partition boundary after bbro */
 
 static int
 rtl_verify_flow_info (void)
Index: basic-block.h
===================================================================
--- basic-block.h	(revision 202021)
+++ basic-block.h	(working copy)
@@ -726,6 +726,7 @@ extern void compute_available (sbitmap *, sbitmap
 extern bool maybe_hot_bb_p (struct function *, const_basic_block);
 extern bool maybe_hot_edge_p (edge);
 extern bool probably_never_executed_bb_p (struct function *, const_basic_block);
+extern bool probably_never_executed_edge_p (struct function *, edge);
 extern bool optimize_bb_for_size_p (const_basic_block);
 extern bool optimize_bb_for_speed_p (const_basic_block);
 extern bool optimize_edge_for_size_p (edge);
@@ -797,6 +798,7 @@ extern bool contains_no_active_insn_p (const_basic
 extern bool forwarder_block_p (const_basic_block);
 extern bool can_fallthru (basic_block, basic_block);
 extern void emit_barrier_after_bb (basic_block bb);
+extern void fixup_partitions (void);
 
 /* In cfgbuild.c.
  */
 extern void find_many_sub_basic_blocks (sbitmap);
Index: cfgcleanup.c
===================================================================
--- cfgcleanup.c	(revision 202021)
+++ cfgcleanup.c	(working copy)
@@ -2807,10 +2807,21 @@ try_optimize_cfg (int mode)
 	      df_analyze ();
 	    }
 
+	  if (changed)
+	    {
+	      /* Edge forwarding in particular can cause hot blocks previously
+		 reached by both hot and cold blocks to become dominated only
+		 by cold blocks. This will cause the verification below to fail,
+		 and lead to now cold code in the hot section. This is not easy
+		 to detect and fix during edge forwarding, and in some cases
+		 is only visible after newly unreachable blocks are deleted,
+		 which will be done in fixup_partitions. */
+	      fixup_partitions ();
+
 #ifdef ENABLE_CHECKING
-	  if (changed)
-	    verify_flow_info ();
+	      verify_flow_info ();
 #endif
+	    }
 
 	  changed_overall |= changed;
 	  first_pass = false;
Index: bb-reorder.c
===================================================================
--- bb-reorder.c	(revision 202021)
+++ bb-reorder.c	(working copy)
@@ -1444,27 +1444,157 @@ fix_up_crossing_landing_pad (eh_landing_pad old_lp
       ei_next (&ei);
     }
 
+
+/* Ensure that all hot bbs are included in a hot path through the
+   procedure. This is done by calling this function twice, once
+   with WALK_UP true (to look for paths from the entry to hot bbs) and
+   once with WALK_UP false (to look for paths from hot bbs to the exit).
+   Returns the updated value of COLD_BB_COUNT and adds newly-hot bbs
+   to BBS_IN_HOT_PARTITION. */
+
+static unsigned int
+sanitize_hot_paths (bool walk_up, unsigned int cold_bb_count,
+		    vec<basic_block> *bbs_in_hot_partition)
+{
+  /* Callers check this. */
+  gcc_checking_assert (cold_bb_count);
+
+  /* Keep examining hot bbs while we still have some left to check
+     and there are remaining cold bbs. */
+  vec<basic_block> hot_bbs_to_check = bbs_in_hot_partition->copy ();
+  while (! hot_bbs_to_check.is_empty ()
+	 && cold_bb_count)
+    {
+      basic_block bb = hot_bbs_to_check.pop ();
+      vec<edge, va_gc> *edges = walk_up ? bb->preds : bb->succs;
+      edge e;
+      edge_iterator ei;
+      int highest_probability = 0;
+      int highest_freq = 0;
+      gcov_type highest_count = 0;
+      bool found = false;
+
+      /* Walk the preds/succs and check if there is at least one already
+	 marked hot. Keep track of the most frequent pred/succ so that we
+	 can mark it hot if we don't find one. */
+      FOR_EACH_EDGE (e, ei, edges)
+	{
+	  basic_block reach_bb = walk_up ? e->src : e->dest;
+
+	  if (e->flags & EDGE_DFS_BACK)
+	    continue;
+
+	  if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION)
+	    {
+	      found = true;
+	      break;
+	    }
+	  /* The following loop will look for the hottest edge via
+	     the edge count, if it is non-zero, then fallback to the edge
+	     frequency and finally the edge probability. */
+	  if (e->count > highest_count)
+	    highest_count = e->count;
+	  int edge_freq = EDGE_FREQUENCY (e);
+	  if (edge_freq > highest_freq)
+	    highest_freq = edge_freq;
+	  if (e->probability > highest_probability)
+	    highest_probability = e->probability;
+	}
+
+      /* If bb is reached by (or reaches, in the case of !WALK_UP) another hot
+	 block (or unpartitioned, e.g. the entry block) then it is ok. If not,
+	 then the most frequent pred (or succ) needs to be adjusted. In the
+	 case where multiple preds/succs have the same frequency (e.g. a
+	 50-50 branch), then both will be adjusted. */
+      if (found)
+	continue;
+
+      FOR_EACH_EDGE (e, ei, edges)
+	{
+	  if (e->flags & EDGE_DFS_BACK)
+	    continue;
+	  /* Select the hottest edge using the edge count, if it is non-zero,
+	     then fallback to the edge frequency and finally the edge
+	     probability. */
+	  if (highest_count)
+	    {
+	      if (e->count < highest_count)
+		continue;
+	    }
+	  else if (highest_freq)
+	    {
+	      if (EDGE_FREQUENCY (e) < highest_freq)
+		continue;
+	    }
+	  else if (e->probability < highest_probability)
+	    continue;
+
+	  basic_block reach_bb = walk_up ? e->src : e->dest;
+
+	  /* We have a hot bb with an immediate dominator that is cold.
+	     The dominator needs to be re-marked hot. */
+	  BB_SET_PARTITION (reach_bb, BB_HOT_PARTITION);
+	  cold_bb_count--;
+
+	  /* Now we need to examine newly-hot reach_bb to see if it is also
+	     dominated by a cold bb. */
+	  bbs_in_hot_partition->safe_push (reach_bb);
+	  hot_bbs_to_check.safe_push (reach_bb);
+	}
+    }
+
+  return cold_bb_count;
+}
+
+
 /* Find the basic blocks that are rarely executed and need to be moved to
    a separate section of the .o file (to cut down on paging and improve
    cache locality).  Return a vector of all edges that cross. */
 
-static vec<edge>
+static vec<edge>
 find_rarely_executed_basic_blocks_and_crossing_edges (void)
 {
   vec<edge> crossing_edges = vNULL;
   basic_block bb;
   edge e;
   edge_iterator ei;
+  unsigned int cold_bb_count = 0;
+  vec<basic_block> bbs_in_hot_partition = vNULL;
 
   /* Mark which partition (hot/cold) each basic block belongs in. */
   FOR_EACH_BB (bb)
     {
       if (probably_never_executed_bb_p (cfun, bb))
-	BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+	{
+	  BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+	  cold_bb_count++;
+	}
       else
-	BB_SET_PARTITION (bb, BB_HOT_PARTITION);
+	{
+	  BB_SET_PARTITION (bb, BB_HOT_PARTITION);
+	  bbs_in_hot_partition.safe_push (bb);
+	}
     }
 
+  /* Ensure that hot bbs are included along a hot path from the entry to exit.
+     Several different possibilities may include cold bbs along all paths
+     to/from a hot bb. One is that there are edge weight insanities
+     due to optimization phases that do not properly update basic block profile
+     counts. The second is that the entry of the function may not be hot, because
+     it is entered fewer times than the number of profile training runs, but there
+     is a loop inside the function that causes blocks within the function to be
+     above the threshold for hotness. This is fixed by walking up from hot bbs
+     to the entry block, and then down from hot bbs to the exit, performing
+     partitioning fixups as necessary.
+   */
+  if (cold_bb_count)
+    {
+      mark_dfs_back_edges ();
+      cold_bb_count = sanitize_hot_paths (true, cold_bb_count,
+					  &bbs_in_hot_partition);
+      if (cold_bb_count)
+	sanitize_hot_paths (false, cold_bb_count, &bbs_in_hot_partition);
+    }
+
   /* The format of .gcc_except_table does not allow landing pads to
      be in a different partition as the throw. Fix this by either
      moving or duplicating the landing pads. */
Index: predict.c
===================================================================
--- predict.c	(revision 202021)
+++ predict.c	(working copy)
@@ -241,6 +241,22 @@ probably_never_executed_bb_p (struct function *fun
   return false;
 }
 
+
+/* Return true in case edge E is probably never executed. */
+
+bool
+probably_never_executed_edge_p (struct function *fun, edge e)
+{
+  gcc_checking_assert (fun);
+  if (profile_info && flag_branch_probabilities)
+    return ((e->count + profile_info->runs / 2) / profile_info->runs) == 0;
+  if ((!profile_info || !flag_branch_probabilities)
+      && (cgraph_get_node (fun->decl)->frequency
+	  == NODE_FREQUENCY_UNLIKELY_EXECUTED))
+    return true;
+  return false;
+}
+
 /* Return true if NODE should be optimized for size. */
 
 bool
Index: cfg.c
===================================================================
--- cfg.c	(revision 202021)
+++ cfg.c	(working copy)
@@ -446,6 +446,21 @@ check_bb_profile (basic_block bb, FILE * file, int
 	     (flags & TDF_COMMENT) ? ";; " : "", s_indent,
 	     (int) lsum, (int) bb->count);
 	}
+  if (BB_PARTITION (bb) == BB_COLD_PARTITION)
+    {
+      /* Warn about inconsistencies in the partitioning that are
+	 currently caused by profile insanities created via optimization. */
+      if (!probably_never_executed_bb_p (fun, bb))
+	fprintf (file, "%s%sBlock in cold partition with hot count\n",
+		 (flags & TDF_COMMENT) ? ";; " : "", s_indent);
+      FOR_EACH_EDGE (e, ei, bb->preds)
+	{
+	  if (!probably_never_executed_edge_p (fun, e))
+	    fprintf (file,
+		     "%s%sBlock in cold partition with incoming hot edge\n",
+		     (flags & TDF_COMMENT) ? ";; " : "", s_indent);
+	}
+    }
 }
 ^L
 void

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413