On 9/21/18 6:05 AM, Bin.Cheng wrote:
> On Thu, Sep 20, 2018 at 6:43 PM Jan Hubicka <hubi...@ucw.cz> wrote:
>>
>>> On Thu, Sep 20, 2018 at 5:26 PM Jan Hubicka <hubi...@ucw.cz> wrote:
>>>>
>>>>> On Thu, Sep 20, 2018 at 2:11 AM Martin Liška <mli...@suse.cz> wrote:
>>>>>>
>>>>>> Hello.
>>>>>>
>>>>>> I've been working for some time on a patch that simplifies how we set
>>>>>> the hotness threshold of basic blocks. Currently, we calculate so called
>>>>>> arc profile histograms that should identify edges that cover 99.9% of all
>>>>>> branching. These edges are then identified as hot. Disadvantage of the 
>>>>>> approach
>>>>>> is that it comes with significant overhead in run-time and GCC related 
>>>>>> code
>>>>>> is also not trivial. Moreover, anytime a histogram is merged after an 
>>>>>> instrumented
>>>>>> run, the resulting histogram is misleading.
>>>>>>
>>>>>> That said, I decided to simplify it again, remove usage of the histogram 
>>>>>> and return
>>>>>> to what we have before (--param hot-bb-count-fraction). That basically 
>>>>>> says that
>>>>>> we consider hot each edge that has execution count bigger than sum_max / 
>>>>>> 10.000.
>>>>>>
>>>>>> Note that LTO+PGO remains untouched as it still uses histogram that is 
>>>>>> dynamically
>>>>>> calculated by read arc counts.
>>>>> Hi,
>>>>> Does this affect AutoFDO stuff?  AutoFDO is broken and I am fixing it
>>>>> now, on the basis of current code.
>>>>
>>>> This is indpendent of Auto-FDO. There we probably can define cutoffs for 
>>>> hot-cold
>>>> partitions in the tool translating global data into per-file data read by 
>>>> GCC.
>>>> It is great you will take a deper look at autoFDO. it indeed needs work!
>>>>
>>>> The patch is OK, thank for working on it!  Histograms was added by google 
>>>> as
>>>> bit of experiment, but I do not think they turned out to be useful. The 
>>>> data
>>> I did some experiments showing it is somehow useful, for autoFDO.  To
>>> which extend it is useful remains a question I need to investigate
>>> later.
>>
>> Indeed auto-FDO has better idea about whole program behaviour. We could 
>> revive
>> the patch for streaming histograms and reading them to compiler if that turns
>> out to be a good idea. I can see that auto-FDO profile data tells you pretty
>> clearly where the hot spots are and it is not as easy to recover this 
>> information
>> from profile annotated CFG becuase of all the transforms we do.
>> Lets fix and benchmark auto-FDO first and then we could decide what is best 
>> option.
>> Putting the stream-in code back should not be hard if it turns out to be 
>> useful.
>>
>> Main problem with current historams with normal FDO is the fact that you need
>> to merge them between runs which is technically impossible job to do, so they
>> work for programs run once, but not for programs run many times in train runs
>> like gcc itself.  It seems to me that for those relaly interested in
>> performance it is good idea to switch to LTO and that makes it possible to
>> calculate histograms during the linking stage.
> 
> honza, thanks very much for detailed explanation.

Thanks Honza for review.

Bin: Do not hesitate and ask me about what you'll need. As Honza mentioned
I don't see any problem with propagating hotness information from AutoFDO into
*.gcda files. Format can be discussed.

Martin

> 
> Thanks,
> bin
>>
>> Honza
>>>
>>> Thanks,
>>> bin
>>>> produced by them was not very related to what the IPA profile generation 
>>>> produces
>>>> and thus it did not seem to match reality very well.
>>>>
>>>> Honza
>>>>>
>>>>> Thanks,
>>>>> bin
>>>>>>
>>>>>> Note the statistics of the patch:
>>>>>>   19 files changed, 101 insertions(+), 1216 deletions(-)
>>>>>>
>>>>>> I'm attaching file sizes of SPEC2006 int benchmark.
>>>>>>
>>>>>> Patch survives testing on x86_64-linux-gnu machine.
>>>>>> Ready to be installed?
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>> 2018-09-19  Martin Liska  <mli...@suse.cz>
>>>>>>
>>>>>>         * auto-profile.c (autofdo_source_profile::read): Do not
>>>>>>         set sum_all.
>>>>>>         (read_profile): Do not add working sets.
>>>>>>         (read_autofdo_file): Remove sum_all.
>>>>>>         (afdo_callsite_hot_enough_for_early_inline): Remove const
>>>>>>         qualifier.
>>>>>>         * coverage.c (struct counts_entry): Remove gcov_summary.
>>>>>>         (read_counts_file): Read new GCOV_TAG_OBJECT_SUMMARY,
>>>>>>         do not support GCOV_TAG_PROGRAM_SUMMARY.
>>>>>>         (get_coverage_counts): Remove summary and expected
>>>>>>         arguments.
>>>>>>         * coverage.h (get_coverage_counts): Likewise.
>>>>>>         * doc/gcov-dump.texi: Remove -w option.
>>>>>>         * gcov-dump.c (dump_working_sets): Remove.
>>>>>>         (main): Do not support '-w' option.
>>>>>>         (print_usage): Likewise.
>>>>>>         (tag_summary): Likewise.
>>>>>>         * gcov-io.c (gcov_write_summary): Do not dump
>>>>>>         histogram.
>>>>>>         (gcov_read_summary): Likewise.
>>>>>>         (gcov_histo_index): Remove.
>>>>>>         (gcov_histogram_merge): Likewise.
>>>>>>         (compute_working_sets): Likewise.
>>>>>>         * gcov-io.h (GCOV_TAG_OBJECT_SUMMARY): Mark
>>>>>>         it not obsolete.
>>>>>>         (GCOV_TAG_PROGRAM_SUMMARY): Mark it obsolete.
>>>>>>         (GCOV_TAG_SUMMARY_LENGTH): Adjust.
>>>>>>         (GCOV_HISTOGRAM_SIZE): Remove.
>>>>>>         (GCOV_HISTOGRAM_BITVECTOR_SIZE): Likewise.
>>>>>>         (struct gcov_summary): Simplify rapidly just
>>>>>>         to runs and sum_max fields.
>>>>>>         (gcov_histo_index): Remove.
>>>>>>         (NUM_GCOV_WORKING_SETS): Likewise.
>>>>>>         (compute_working_sets): Likewise.
>>>>>>         * gcov-tool.c (print_overlap_usage_message): Remove
>>>>>>         trailing empty line.
>>>>>>         * gcov.c (read_count_file): Read GCOV_TAG_OBJECT_SUMMARY.
>>>>>>         (output_lines): Remove program related line.
>>>>>>         * ipa-profile.c (ipa_profile): Do not consider GCOV histogram.
>>>>>>         * lto-cgraph.c (output_profile_summary): Do not stream GCOV
>>>>>>         histogram.
>>>>>>         (input_profile_summary): Do not read it.
>>>>>>         (merge_profile_summaries): And do not merge it.
>>>>>>         (input_symtab): Do not call removed function.
>>>>>>         * modulo-sched.c (sms_schedule): Do not print sum_max.
>>>>>>         * params.def (HOT_BB_COUNT_FRACTION): Reincarnate param that was
>>>>>>         removed when histogram method was invented.
>>>>>>         (HOT_BB_COUNT_WS_PERMILLE): Mention that it's used only in LTO
>>>>>>         mode.
>>>>>>         * postreload-gcse.c (eliminate_partially_redundant_load): Fix
>>>>>>         GCOV coding style.
>>>>>>         * predict.c (get_hot_bb_threshold): Use HOT_BB_COUNT_FRACTION
>>>>>>         and dump selected value.
>>>>>>         * profile.c (add_working_set): Remove.
>>>>>>         (get_working_sets): Likewise.
>>>>>>         (find_working_set): Likewise.
>>>>>>         (get_exec_counts): Do not work with working sets.
>>>>>>         (read_profile_edge_counts): Do not inform as sum_max is removed.
>>>>>>         (compute_branch_probabilities): Likewise.
>>>>>>         (compute_value_histograms): Remove argument for call of
>>>>>>         get_coverage_counts.
>>>>>>         * profile.h: Do not make gcov_summary const.
>>>>>>
>>>>>> libgcc/ChangeLog:
>>>>>>
>>>>>> 2018-09-19  Martin Liska  <mli...@suse.cz>
>>>>>>
>>>>>>         * libgcov-driver.c (crc32_unsigned): Remove.
>>>>>>         (gcov_histogram_insert): Likewise.
>>>>>>         (gcov_compute_histogram): Likewise.
>>>>>>         (compute_summary): Simplify rapidly.
>>>>>>         (merge_one_data): Do not handle PROGRAM_SUMMARY tag.
>>>>>>         (merge_summary): Rapidly simplify.
>>>>>>         (dump_one_gcov): Ignore gcov_summary.
>>>>>>         (gcov_do_dump): Do not handle program summary, it's not
>>>>>>         used.
>>>>>>         * libgcov-util.c (tag_summary): Remove.
>>>>>>         (read_gcda_finalize): Fix coding style.
>>>>>>         (read_gcda_file): Initialize curr_object_summary.
>>>>>>         (compute_summary): Remove.
>>>>>>         (calculate_overlap): Remove settings of run_max.
>>>>>> ---
>>>>>>   gcc/auto-profile.c      |  21 +--
>>>>>>   gcc/coverage.c          |  59 +-----
>>>>>>   gcc/coverage.h          |   4 +-
>>>>>>   gcc/doc/gcov-dump.texi  |   6 +-
>>>>>>   gcc/gcov-dump.c         |  81 +-------
>>>>>>   gcc/gcov-io.c           | 398 +---------------------------------------
>>>>>>   gcc/gcov-io.h           |  71 +------
>>>>>>   gcc/gcov-tool.c         |   1 -
>>>>>>   gcc/gcov.c              |   7 +-
>>>>>>   gcc/ipa-profile.c       |  26 +--
>>>>>>   gcc/lto-cgraph.c        | 136 +-------------
>>>>>>   gcc/modulo-sched.c      |   8 -
>>>>>>   gcc/params.def          |   7 +-
>>>>>>   gcc/postreload-gcse.c   |   2 +-
>>>>>>   gcc/predict.c           |   9 +-
>>>>>>   gcc/profile.c           | 116 +-----------
>>>>>>   gcc/profile.h           |   2 +-
>>>>>>   libgcc/libgcov-driver.c | 324 ++++----------------------------
>>>>>>   libgcc/libgcov-util.c   |  39 +---
>>>>>>   19 files changed, 101 insertions(+), 1216 deletions(-)
>>>>>>
>>>>>>

Reply via email to