> Great! Is this the LTO merging you were talking about in an earlier > message, or the gcov runtime fix (that would presumably not be > lto-specific)?
It is the LTO path - we need to merge profiles there anyway for his code unification work. > > I have patch to track this. Moreover vforks seems to produce repeated > > merging of results. > > Aha, does this explain the gimp issue as well? Not really - we still need to debug why we hit cold section so many times with partitioning. I sitll think easier approach will be to lock the cold section and then start probably with testsuite (i.e. write script to compile the small testcases with FDO + partitioning and see what crash by hitting cold section). > > > > Is it really necessary to run this from internal loop of the cfgcleanup? > > The reason I added it here is that just below there is a call to > verify_flow_info, and that will fail with the new verification. Hmm, OK, I suppose we run the cleanup after partitioning just once or twice, right? We can track this incrementally - I am not sure if we put it from the internal iteration loop we would get anything substantial either. Removing unreachable blocks twice is however ugly. > +/* Ensure that all hot bbs are included in a hot path through the > + procedure. This is done by calling this function twice, once > + with WALK_UP true (to look for paths from the entry to hot bbs) and > + once with WALK_UP false (to look for paths from hot bbs to the exit). > + Returns the updated value of COLD_BB_COUNT and adds newly-hot bbs > + to BBS_IN_HOT_PARTITION. */ > + > +static unsigned int > +sanitize_hot_paths (bool walk_up, unsigned int cold_bb_count, > + vec<basic_block> *bbs_in_hot_partition) > +{ > + /* Callers check this. */ > + gcc_checking_assert (cold_bb_count); > + > + /* Keep examining hot bbs while we still have some left to check > + and there are remaining cold bbs. */ > + vec<basic_block> hot_bbs_to_check = bbs_in_hot_partition->copy (); > + while (! hot_bbs_to_check.is_empty () > + && cold_bb_count) > + { > + basic_block bb = hot_bbs_to_check.pop (); > + vec<edge, va_gc> *edges = walk_up ? bb->preds : bb->succs; > + edge e; > + edge_iterator ei; > + int highest_probability = 0; > + int highest_freq = 0; > + gcov_type highest_count = 0; > + bool found = false; > + > + /* Walk the preds/succs and check if there is at least one already > + marked hot. Keep track of the most frequent pred/succ so that we > + can mark it hot if we don't find one. */ > + FOR_EACH_EDGE (e, ei, edges) > + { > + basic_block reach_bb = walk_up ? e->src : e->dest; > + > + if (e->flags & EDGE_DFS_BACK) > + continue; > + > + if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION) > + { > + found = true; > + break; > + } > + /* The following loop will look for the hottest edge via > + the edge count, if it is non-zero, then fallback to the edge > + frequency and finally the edge probability. */ > + if (e->count > highest_count) > + highest_count = e->count; > + int edge_freq = EDGE_FREQUENCY (e); > + if (edge_freq > highest_freq) > + highest_freq = edge_freq; > + if (e->probability > highest_probability) > + highest_probability = e->probability; > + } > + > + /* If bb is reached by (or reaches, in the case of !WALK_UP) another > hot > + block (or unpartitioned, e.g. the entry block) then it is ok. If > not, > + then the most frequent pred (or succ) needs to be adjusted. In the > + case where multiple preds/succs have the same frequency (e.g. a > + 50-50 branch), then both will be adjusted. */ > + if (found) > + continue; > + > + FOR_EACH_EDGE (e, ei, edges) > + { > + if (e->flags & EDGE_DFS_BACK) > + continue; > + /* Select the hottest edge using the edge count, if it is non-zero, > + then fallback to the edge frequency and finally the edge > + probability. */ > + if (highest_count) > + { > + if (e->count < highest_count) > + continue; > + } > + else if (highest_freq) The frequency condition needs to be done only when you walk predecestors - when you walk down the edge probabilities are just fine. The patch seems OK to me now. I will make our FDO tester to use partitioning so we get this benchmarked a bit. Honza