On Thu, Oct 3, 2013 at 6:41 AM, Teresa Johnson <tejohn...@google.com> wrote:
> On Wed, Oct 2, 2013 at 9:19 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>> 2013-09-29  Teresa Johnson  <tejohn...@google.com>
>>>
>>>         * bb-reorder.c
>>>         (find_rarely_executed_basic_blocks_and_crossing_edges):
>>>         Treat profile insanities conservatively.
>>>         * predict.c (probably_never_executed): New function. Treat profile
>>>         insanities conservatively.
>>>         (probably_never_executed_bb_p): Invoke probably_never_executed.
>>>         (probably_never_executed_edge_p): Invoke probably_never_executed.
>>>
>>> Index: bb-reorder.c
>>> ===================================================================
>>> --- bb-reorder.c        (revision 202947)
>>> +++ bb-reorder.c        (working copy)
>>> @@ -1564,8 +1564,25 @@ find_rarely_executed_basic_blocks_and_crossing_edg
>>>    /* Mark which partition (hot/cold) each basic block belongs in.  */
>>>    FOR_EACH_BB (bb)
>>>      {
>>> +      bool cold_bb = false;
>>
>> whitespace here
>>
>>>        if (probably_never_executed_bb_p (cfun, bb))
>>>          {
>>> +          /* Handle profile insanities created by upstream optimizations
>>> +             by also checking the incoming edge weights. If there is a non-cold
>>> +             incoming edge, conservatively prevent this block from being split
>>> +             into the cold section.  */
>>> +          cold_bb = true;
>>> +          FOR_EACH_EDGE (e, ei, bb->preds)
>>> +            {
>>> +              if (!probably_never_executed_edge_p (cfun, e))
>>> +                {
>>> +                  cold_bb = false;
>>> +                  break;
>>> +                }
>>> +            }
>>
>> You can probably eliminate the extra braces.
>> So we won't propagate deeper in the CFG, right?
>>
>> This change is OK.
>>
>>> +        }
>>> +      if (cold_bb)
>>> +        {
>>>            BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>>            cold_bb_count++;
>>>          }
>>> Index: predict.c
>>> ===================================================================
>>> --- predict.c   (revision 202947)
>>> +++ predict.c   (working copy)
>>> @@ -226,26 +226,26 @@ maybe_hot_edge_p (edge e)
>>>  }
>>>
>>>
>>> -/* Return true in case BB is probably never executed.
>>>    */
>>>
>>> -bool
>>> -probably_never_executed_bb_p (struct function *fun, const_basic_block bb)
>>> +/* Return true if profile COUNT and FREQUENCY, or function FUN static
>>> +   node frequency reflects never being executed.  */
>>> +
>>> +static bool
>>> +probably_never_executed (struct function *fun,
>>> +                         gcov_type count, int frequency)
>>>  {
>>>    gcc_checking_assert (fun);
>>>    if (profile_status_for_function (fun) == PROFILE_READ)
>>>      {
>>> -      if ((bb->count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
>>> +      if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
>>>          return false;
>>> -      if (!bb->frequency)
>>> -        return true;
>>> -      if (!ENTRY_BLOCK_PTR->frequency)
>>> -        return false;
>>> -      if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE)
>>> -        {
>>> -          return (RDIV (bb->frequency * ENTRY_BLOCK_PTR->count,
>>> -                        ENTRY_BLOCK_PTR->frequency)
>>> -                  < REG_BR_PROB_BASE / 4);
>>> -        }
>>> +      // If this is a profiled function (entry bb non-zero count), then base
>>> +      // the coldness decision on the frequency. This will handle cases where
>>> +      // counts are not updated properly during optimizations or expansion.
>>> +      if (ENTRY_BLOCK_PTR->count)
>>> +        return frequency == 0;
>>> +      // Unprofiled function, frequencies statically assigned. All bbs are
>>> +      // treated as cold.
>>
>> I would avoid combining C and C++ comments in the function.
>> Did you get some data on how many basic blocks we now consider hot?
>>
>> The previous implementation considered a block never executed when the
>> frequencies indicate that it is executed in at most 1/4th of invocations
>> of the program. You essentially change that to 1/10000. The first seems
>> a bit too high given the way we distribute probabilities in dojump and
>> friends; the second looks too low.
>
> Actually, I don't think the current code is detecting when the
> frequencies indicate it was executed 1/4 of the time.
> The current code takes
> a ratio of the entry block count, and compares it to
> REG_BR_PROB_BASE/4, which seems like the wrong comparison for profile
> counts. Shouldn't this be something like:
>
>   gcov_type computed_count = RDIV (frequency * ENTRY_BLOCK_PTR->count,
>                                    ENTRY_BLOCK_PTR->frequency);
>   if ((computed_count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
>     return false;
>   return true;
>
> i.e. do the same check we do for bb->count above. And the check
> guarding this is looking for ENTRY_BLOCK_PTR->count <
> REG_BR_PROB_BASE, which also doesn't seem right, although we need to
> ensure that we don't overflow when multiplying by frequency.
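[Editor's note: a minimal standalone sketch of the check proposed above, scaling the block frequency to an estimated count via the entry block's count/frequency ratio and then applying the same rounded count-per-run test used for real counts. The function name and the exact overflow guard are illustrative, not the actual GCC implementation.]

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t gcov_type;

/* Sketch: estimate a profile count for a block from its FREQUENCY,
   using the entry block's count/frequency ratio, then decide coldness
   with the same rounded "at least once per 1/4 of the training runs"
   test applied to real counts.  Guards the multiplication against
   64-bit overflow by dividing first when the product would not fit.  */
static int
probably_never_executed_by_freq (gcov_type entry_count, int entry_freq,
                                 int frequency, gcov_type runs)
{
  gcov_type computed_count;

  if (entry_freq == 0 || runs == 0)
    return 0;  /* No basis for a decision; assume possibly executed.  */

  if (frequency > 0 && entry_count > INT64_MAX / frequency)
    /* The product would overflow; divide first, at some rounding cost.  */
    computed_count = entry_count / entry_freq * frequency;
  else
    /* RDIV-style rounded scaling: count ~= frequency * entry_count
       / entry_freq.  */
    computed_count = (frequency * entry_count + entry_freq / 2) / entry_freq;

  /* Same rounded count-per-run test used for real profile counts.  */
  return (computed_count * 4 + runs / 2) / runs == 0;
}
```

For example, with an entry count of 1000 over 1 run, a block with frequency 0 is classified cold, while one with the same frequency as the entry block is not.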
I have a variant of this change that works well and has proper overflow
checking. It gets rid of 13 failures when there is 1 train run. Changing
the required ratio to 1/100 training runs instead of 1/4 reduces the
single-train-run failures by another 5, since smaller frequencies are
handled even when the count has been truncated to 0. I found that a
couple more failures with 1 train run were due to inlining profile
update issues, which I have a fix for. At that point the failures are
the same between the 1 train run and 100 train run cases. After that,
there are 2 remaining failures (with both 1 and 100 train runs) that go
away when I disable loop unrolling; I haven't looked at those yet. I'll
send a patch with the above changes, hopefully tonight.

What do you think of changing the required execution count to profile
run ratio to 1/100?

Teresa

>
> Teresa
>
>>
>> The change introducing probably_never_executed with the current logic is OK.
>> We may want to fine tune the ratio.
>>
>> Honza
>>>        return true;
>>>      }
>>>    if ((!profile_info || !flag_branch_probabilities)
>>> @@ -256,19 +256,21 @@ maybe_hot_edge_p (edge e)
>>>  }
>>>
>>>
>>> +/* Return true in case BB is probably never executed.  */
>>> +
>>> +bool
>>> +probably_never_executed_bb_p (struct function *fun, const_basic_block bb)
>>> +{
>>> +  return probably_never_executed (fun, bb->count, bb->frequency);
>>> +}
>>> +
>>> +
>>>  /* Return true in case edge E is probably never executed.
>>>     */
>>>
>>>  bool
>>>  probably_never_executed_edge_p (struct function *fun, edge e)
>>>  {
>>> -  gcc_checking_assert (fun);
>>> -  if (profile_info && flag_branch_probabilities)
>>> -    return ((e->count + profile_info->runs / 2) / profile_info->runs) == 0;
>>> -  if ((!profile_info || !flag_branch_probabilities)
>>> -      && (cgraph_get_node (fun->decl)->frequency
>>> -          == NODE_FREQUENCY_UNLIKELY_EXECUTED))
>>> -    return true;
>>> -  return false;
>>> +  return probably_never_executed (fun, e->count, EDGE_FREQUENCY (e));
>>>  }
>>>
>>>  /* Return true if NODE should be optimized for size.  */
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
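[Editor's note: to make the 1/4-versus-1/100 ratio discussion concrete, here is a small standalone sketch of the rounded count-per-run test the patch uses, with a configurable multiplier. The helper name is hypothetical, not GCC code.]

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t gcov_type;

/* Rounded test from the patch: a block counts as "executed" if its
   profile COUNT is at least one per RUNS/MULT training runs, rounding
   to nearest.  MULT == 4 is the historical threshold being discussed;
   MULT == 100 is the proposed relaxation, under which a block that ran
   even rarely during training stays out of the cold partition.  */
static int
count_nonzero_at_ratio (gcov_type count, gcov_type runs, int mult)
{
  return (count * mult + runs / 2) / runs > 0;
}
```

With runs = 100: mult = 4 still treats a count of 12 as never executed (12 * 4 + 50 = 98, which rounds down to 0 per run) and flips at 13, whereas mult = 100 already treats a single execution as hot (1 * 100 + 50 = 150, rounding up to 1).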