On 9/21/18 6:05 AM, Bin.Cheng wrote: > On Thu, Sep 20, 2018 at 6:43 PM Jan Hubicka <hubi...@ucw.cz> wrote: >> >>> On Thu, Sep 20, 2018 at 5:26 PM Jan Hubicka <hubi...@ucw.cz> wrote: >>>> >>>>> On Thu, Sep 20, 2018 at 2:11 AM Martin Liška <mli...@suse.cz> wrote: >>>>>> >>>>>> Hello. >>>>>> >>>>>> I've been working for some time on a patch that simplifies how we set >>>>>> the hotness threshold of basic blocks. Currently, we calculate so called >>>>>> arc profile histograms that should identify edges that cover 99.9% of all >>>>>> branching. These edges are then identified as hot. Disadvantage of the >>>>>> approach >>>>>> is that it comes with significant overhead in run-time and GCC related >>>>>> code >>>>>> is also not trivial. Moreover, anytime a histogram is merged after an >>>>>> instrumented >>>>>> run, the resulting histogram is misleading. >>>>>> >>>>>> That said, I decided to simplify it again, remove usage of the histogram >>>>>> and return >>>>>> to what we have before (--param hot-bb-count-fraction). That basically >>>>>> says that >>>>>> we consider hot each edge that has execution count bigger than sum_max / >>>>>> 10.000. >>>>>> >>>>>> Note that LTO+PGO remains untouched as it still uses histogram that is >>>>>> dynamically >>>>>> calculated by read arc counts. >>>>> Hi, >>>>> Does this affect AutoFDO stuff? AutoFDO is broken and I am fixing it >>>>> now, on the basis of current code. >>>> >>>> This is indpendent of Auto-FDO. There we probably can define cutoffs for >>>> hot-cold >>>> partitions in the tool translating global data into per-file data read by >>>> GCC. >>>> It is great you will take a deper look at autoFDO. it indeed needs work! >>>> >>>> The patch is OK, thank for working on it! Histograms was added by google >>>> as >>>> bit of experiment, but I do not think they turned out to be useful. The >>>> data >>> I did some experiments showing it is somehow useful, for autoFDO. To >>> which extend it is useful remains a question I need to investigate >>> later. >> >> Indeed auto-FDO has better idea about whole program behaviour. We could >> revive >> the patch for streaming histograms and reading them to compiler if that turns >> out to be a good idea. I can see that auto-FDO profile data tells you pretty >> clearly where the hot spots are and it is not as easy to recover this >> information >> from profile annotated CFG becuase of all the transforms we do. >> Lets fix and benchmark auto-FDO first and then we could decide what is best >> option. >> Putting the stream-in code back should not be hard if it turns out to be >> useful. >> >> Main problem with current historams with normal FDO is the fact that you need >> to merge them between runs which is technically impossible job to do, so they >> work for programs run once, but not for programs run many times in train runs >> like gcc itself. It seems to me that for those relaly interested in >> performance it is good idea to switch to LTO and that makes it possible to >> calculate histograms during the linking stage. > > honza, thanks very much for detailed explanation.
Thanks Honza for review. Bin: Do not hesitate and ask me about what you'll need. As Honza mentioned I don't see any problem with propagating hotness information from AutoFDO into *.gcda files. Format can be discussed. Martin > > Thanks, > bin >> >> Honza >>> >>> Thanks, >>> bin >>>> produced by them was not very related to what the IPA profile generation >>>> produces >>>> and thus it did not seem to match reality very well. >>>> >>>> Honza >>>>> >>>>> Thanks, >>>>> bin >>>>>> >>>>>> Note the statistics of the patch: >>>>>> 19 files changed, 101 insertions(+), 1216 deletions(-) >>>>>> >>>>>> I'm attaching file sizes of SPEC2006 int benchmark. >>>>>> >>>>>> Patch survives testing on x86_64-linux-gnu machine. >>>>>> Ready to be installed? >>>>>> >>>>>> Martin >>>>>> >>>>>> gcc/ChangeLog: >>>>>> >>>>>> 2018-09-19 Martin Liska <mli...@suse.cz> >>>>>> >>>>>> * auto-profile.c (autofdo_source_profile::read): Do not >>>>>> set sum_all. >>>>>> (read_profile): Do not add working sets. >>>>>> (read_autofdo_file): Remove sum_all. >>>>>> (afdo_callsite_hot_enough_for_early_inline): Remove const >>>>>> qualifier. >>>>>> * coverage.c (struct counts_entry): Remove gcov_summary. >>>>>> (read_counts_file): Read new GCOV_TAG_OBJECT_SUMMARY, >>>>>> do not support GCOV_TAG_PROGRAM_SUMMARY. >>>>>> (get_coverage_counts): Remove summary and expected >>>>>> arguments. >>>>>> * coverage.h (get_coverage_counts): Likewise. >>>>>> * doc/gcov-dump.texi: Remove -w option. >>>>>> * gcov-dump.c (dump_working_sets): Remove. >>>>>> (main): Do not support '-w' option. >>>>>> (print_usage): Likewise. >>>>>> (tag_summary): Likewise. >>>>>> * gcov-io.c (gcov_write_summary): Do not dump >>>>>> histogram. >>>>>> (gcov_read_summary): Likewise. >>>>>> (gcov_histo_index): Remove. >>>>>> (gcov_histogram_merge): Likewise. >>>>>> (compute_working_sets): Likewise. >>>>>> * gcov-io.h (GCOV_TAG_OBJECT_SUMMARY): Mark >>>>>> it not obsolete. >>>>>> (GCOV_TAG_PROGRAM_SUMMARY): Mark it obsolete. >>>>>> (GCOV_TAG_SUMMARY_LENGTH): Adjust. >>>>>> (GCOV_HISTOGRAM_SIZE): Remove. >>>>>> (GCOV_HISTOGRAM_BITVECTOR_SIZE): Likewise. >>>>>> (struct gcov_summary): Simplify rapidly just >>>>>> to runs and sum_max fields. >>>>>> (gcov_histo_index): Remove. >>>>>> (NUM_GCOV_WORKING_SETS): Likewise. >>>>>> (compute_working_sets): Likewise. >>>>>> * gcov-tool.c (print_overlap_usage_message): Remove >>>>>> trailing empty line. >>>>>> * gcov.c (read_count_file): Read GCOV_TAG_OBJECT_SUMMARY. >>>>>> (output_lines): Remove program related line. >>>>>> * ipa-profile.c (ipa_profile): Do not consider GCOV histogram. >>>>>> * lto-cgraph.c (output_profile_summary): Do not stream GCOV >>>>>> histogram. >>>>>> (input_profile_summary): Do not read it. >>>>>> (merge_profile_summaries): And do not merge it. >>>>>> (input_symtab): Do not call removed function. >>>>>> * modulo-sched.c (sms_schedule): Do not print sum_max. >>>>>> * params.def (HOT_BB_COUNT_FRACTION): Reincarnate param that was >>>>>> removed when histogram method was invented. >>>>>> (HOT_BB_COUNT_WS_PERMILLE): Mention that it's used only in LTO >>>>>> mode. >>>>>> * postreload-gcse.c (eliminate_partially_redundant_load): Fix >>>>>> GCOV coding style. >>>>>> * predict.c (get_hot_bb_threshold): Use HOT_BB_COUNT_FRACTION >>>>>> and dump selected value. >>>>>> * profile.c (add_working_set): Remove. >>>>>> (get_working_sets): Likewise. >>>>>> (find_working_set): Likewise. >>>>>> (get_exec_counts): Do not work with working sets. >>>>>> (read_profile_edge_counts): Do not inform as sum_max is removed. >>>>>> (compute_branch_probabilities): Likewise. >>>>>> (compute_value_histograms): Remove argument for call of >>>>>> get_coverage_counts. >>>>>> * profile.h: Do not make gcov_summary const. >>>>>> >>>>>> libgcc/ChangeLog: >>>>>> >>>>>> 2018-09-19 Martin Liska <mli...@suse.cz> >>>>>> >>>>>> * libgcov-driver.c (crc32_unsigned): Remove. >>>>>> (gcov_histogram_insert): Likewise. >>>>>> (gcov_compute_histogram): Likewise. >>>>>> (compute_summary): Simplify rapidly. >>>>>> (merge_one_data): Do not handle PROGRAM_SUMMARY tag. >>>>>> (merge_summary): Rapidly simplify. >>>>>> (dump_one_gcov): Ignore gcov_summary. >>>>>> (gcov_do_dump): Do not handle program summary, it's not >>>>>> used. >>>>>> * libgcov-util.c (tag_summary): Remove. >>>>>> (read_gcda_finalize): Fix coding style. >>>>>> (read_gcda_file): Initialize curr_object_summary. >>>>>> (compute_summary): Remove. >>>>>> (calculate_overlap): Remove settings of run_max. >>>>>> --- >>>>>> gcc/auto-profile.c | 21 +-- >>>>>> gcc/coverage.c | 59 +----- >>>>>> gcc/coverage.h | 4 +- >>>>>> gcc/doc/gcov-dump.texi | 6 +- >>>>>> gcc/gcov-dump.c | 81 +------- >>>>>> gcc/gcov-io.c | 398 +--------------------------------------- >>>>>> gcc/gcov-io.h | 71 +------ >>>>>> gcc/gcov-tool.c | 1 - >>>>>> gcc/gcov.c | 7 +- >>>>>> gcc/ipa-profile.c | 26 +-- >>>>>> gcc/lto-cgraph.c | 136 +------------- >>>>>> gcc/modulo-sched.c | 8 - >>>>>> gcc/params.def | 7 +- >>>>>> gcc/postreload-gcse.c | 2 +- >>>>>> gcc/predict.c | 9 +- >>>>>> gcc/profile.c | 116 +----------- >>>>>> gcc/profile.h | 2 +- >>>>>> libgcc/libgcov-driver.c | 324 ++++---------------------------- >>>>>> libgcc/libgcov-util.c | 39 +--- >>>>>> 19 files changed, 101 insertions(+), 1216 deletions(-) >>>>>> >>>>>>