On Sat, Aug 18, 2012 at 1:19 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>
> > +     {
> > +       cs_prg->num = cs_tprg->num;
> > +       /* Allocate the working set array for the merged summary.  */
> > +       if (ws_cnt)
> > +         {
> > +           cs_prg->working_set_count = ws_cnt;
> > +           cs_prg->working_sets = (gcov_ws_info_t *) malloc (
> > +               ws_cnt * sizeof (gcov_ws_info_t));
> > +         }
> > +     }
> > +   else if (cs_prg->num != cs_tprg->num
> > +            || ws_cnt != cs_prg->working_set_count)
> > +     goto read_mismatch;
> > +   /* Copy over this run's working set information if either this is
> > +      the first run, the total size of the profile (sum_all) is much
> > +      (50%) larger for this run (need to check this before updating
> > +      cs_prg->sum_all below), or this run has a larger working
> > +      set in the largest (99.99% of sum_all) bucket.  */
> > +   if (ws_cnt
> > +       && (cs_prg->runs == 1
> > +           || (cs_tprg->sum_all
> > +               > (cs_prg->sum_all + cs_prg->sum_all / 2))
> > +           || (cs_tprg->working_sets[ws_cnt - 1].num_counters
> > +               > cs_prg->working_sets[ws_cnt - 1].num_counters)))
> > +     memcpy (cs_prg->working_sets,
> > +             cs_tprg->working_sets,
> > +             ws_cnt * sizeof (gcov_ws_info_t));
> >      cs_prg->sum_all += cs_tprg->sum_all;
>
> Hmm, when you run i.e. gcc bootstrap where the program is run couple hundred
> times, this will end up really inaccurate, since it will probably just store
> the histogram of insn-attrtab compilation, right?
Well, it should store the largest working set in BBs, or one that came
from a much longer run. But I see the issue you are pointing out. The
num_counters (the number of hot bbs) should be reasonable, as the total
number of BBs is the same between all runs, and I want to show the
largest or most dominant working set in terms of the number of hot bbs.
But the min_bb_counter will become more and more inaccurate as the
number of runs increases, since the counter values are accumulated.

I typed up a detailed email below on why getting this right would be
difficult, but then I realized there might be a fairly simple accurate
solution, which I'll describe first:

The only way I see to do this completely accurately is to take two
passes through the existing gcda files that we are merging into: one to
read in all the counter values and merge them into the counter arrays
from the current run (after which we can do the histogramming/working
set computation accurately from the merged counters), and a second to
rewrite them. In libgcov this doesn't seem like it would be too
difficult to do, although it requires a little restructuring of the
main merging loop and some special handling for buffered functions
(which could increase the memory footprint a bit if there are many of
these, since they will all need to stay buffered across the iteration
over all the gcda files).

The summary merging in coverage.c confuses me a bit, as it seems to
handle the case where there are multiple program summaries in a single
gcda file. When would this happen? It looks like the merge handling in
libgcov should always produce a single program summary per gcda file.

>
> Why you don't simply write the histogram into gcov file and don't merge the
> values here (i.e. doing the cummulation loop in GCC instead of within
> libgcov)?

That doesn't completely solve the problem, unfortunately. The reason is
that you don't know which histogram entry contains a particular block
in each run, and therefore you don't know how to compute the new
combined histogram value and index for that bb counter. For example, a
particular histogram index may have 5 counters (bbs) in it for one run
and 2 counters (bbs) in it for a second run, so the question is how to
compute the new entry for all of those bb counters, as the 5 bbs from
the first run may or may not be a superset of the 2 from the second
run. You could assume that the bbs have the same relative order of
hotness between runs, and combine the bb counters accordingly, but
there is still some trickiness because you have to apportion the
cumulative counter sum stored in the histogram entry between the new
entries. For example, assume the highest indexed non-zero entries (the
histogram buckets containing the largest counter values) in the two
incoming histograms are:

histogram 1:
  index 100: 4 counters, cumulative value X, min counter value minx
  ...

histogram 2:
  index 100: 2 counters, cumulative value Y, min counter value miny
  index 99:  3 counters, cumulative value W, min counter value minw
  ...

To merge, you could take the 2 hottest counters from each side and give
them a new cumulative value of (Y + X*2/4) and a new min counter value
of minx+miny; they would go into the merged histogram at the index
corresponding to counter value minx+miny. The next 2 counters would get
a new cumulative value of (W*2/3 + X*2/4) and a new min counter value
of minx+minw, and go into the merged histogram at the index
corresponding to counter value minx+minw. Etc...
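Roughly, in C (the names below -- hist_entry, histogram_index,
HIST_SIZE -- are made up just for illustration, not the actual gcov
data structures), that merge would look something like:

#include <string.h>

typedef long long gcov_type;   /* assumed 64-bit counter type */
#define HIST_SIZE 256          /* hypothetical number of buckets */

typedef struct
{
  unsigned num_counters;       /* counters (bbs) in this bucket */
  gcov_type cum_value;         /* sum of their counter values */
  gcov_type min_value;         /* smallest counter value in bucket */
} hist_entry;

/* Hypothetical: map a counter value back to its bucket index.  */
extern unsigned histogram_index (gcov_type value);

static void
merge_histograms (hist_entry *h1, hist_entry *h2, hist_entry *out)
{
  int i1 = HIST_SIZE - 1, i2 = HIST_SIZE - 1;

  memset (out, 0, HIST_SIZE * sizeof (hist_entry));

  /* Walk both histograms from the hottest bucket down, assuming the
     bbs have the same relative order of hotness in both runs.  */
  while (i1 >= 0 && !h1[i1].num_counters) i1--;
  while (i2 >= 0 && !h2[i2].num_counters) i2--;

  while (i1 >= 0 && i2 >= 0)
    {
      /* Pair up as many counters as both current buckets can supply.  */
      unsigned n = (h1[i1].num_counters < h2[i2].num_counters
                    ? h1[i1].num_counters : h2[i2].num_counters);

      /* Apportion each bucket's cumulative value by the fraction of
         its counters consumed -- these are the divisions mentioned
         below.  */
      gcov_type c1 = h1[i1].cum_value * n / h1[i1].num_counters;
      gcov_type c2 = h2[i2].cum_value * n / h2[i2].num_counters;
      gcov_type min = h1[i1].min_value + h2[i2].min_value;
      unsigned idx = histogram_index (min);

      out[idx].num_counters += n;
      out[idx].cum_value += c1 + c2;
      if (!out[idx].min_value || min < out[idx].min_value)
        out[idx].min_value = min;

      /* Consume the paired counters and advance to the next non-empty
         bucket on whichever side ran out.  */
      h1[i1].num_counters -= n; h1[i1].cum_value -= c1;
      h2[i2].num_counters -= n; h2[i2].cum_value -= c2;
      while (i1 >= 0 && !h1[i1].num_counters) i1--;
      while (i2 >= 0 && !h2[i2].num_counters) i2--;
    }

  /* Whatever is left over in the longer histogram would just be
     copied into OUT here.  */
}

(The sketch consumes the input buckets destructively just to keep it
short; with the example above, the first iteration produces the
(Y + X*2/4) entry and the second the (W*2/3 + X*2/4) entry.)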
This is not entirely accurate, although it might be a reasonable
approximation; it also requires a number of division operations during
the merge in libgcov.

Another possibility, which might also provide a reasonable
approximation, would be to scale the min_bb_counter values in the
working sets by sum_all_merged/sum_all_orig, where sum_all_merged is
the new sum_all and sum_all_orig is the sum_all from the summary whose
working_set was chosen to be propagated to the new merged summary. This
also requires some divisions at libgcov merge time, unless we save the
original sum_all along with the working set in the summary and do the
scaling at profile-use time.

> By default you are streaming 128 values that is the same as needed to stream
> the histogram.
> I suppose we can have environment variable to reduce the histogram size - I
> guess in smaller setups smaller histogram will run just fine...

It is a bit more, as the histogram will have 64*4 = 256 entries for
64-bit counters, and each entry contains two gcov_type counter values
and one unsigned int. The working set entries only have one gcov_type
counter and one unsigned, so the histogram could be close to 4x the
size.

What do you think?

Thanks,
Teresa

>
> Honza

--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413