Re: gcov: reduce code quality loss by reproducible topn merging [PR92924]

Jan Hubicka Thu, 13 Feb 2020 05:36:12 -0800

> On 1/30/20 5:13 PM, Jan Hubicka wrote:
> > Martin, I welcome your opinion on the patch
> 
> Ok, recently we've made quite some experiments with Honza about
> TOP N counters. Based on the numbers, it shows that tracking a negative
> value per-counter value does not help much. So that I suggest to use
> only one global flag (a negative total) in order to distinguish in between
> reproducible and non-reproducible profile.
> 
> I'm sending patch that simplifies the merging and introduces a new option.
> 
> For the future, we may (similarly to LLVM) come up with a dynamic allocation
> for TOPN counters. I've git a working patch for that.
> 
> Martin


> From acf77e7ebba2a4f4370388f83901d67cc71e5a0c Mon Sep 17 00:00:00 2001
> From: Martin Liska <mli...@suse.cz>
> Date: Wed, 5 Feb 2020 14:44:43 +0100
> Subject: [PATCH] Introduce -fprofile-reproducible and support it with TOP N.
> 
> gcc/ChangeLog:
> 
> 2020-02-05  Martin Liska  <mli...@suse.cz>
> 
>       PR ipa/92924
>       * common.opt: Add -fprofile-reproducible.
>       * doc/invoke.texi: Document it.
>       * value-prof.c (dump_histogram_value):
>       Document and support behavior for counters[0]
>       being a negative value.
>       (get_nth_most_common_value): Handle negative
>       counters[0] in respect to flag_profile_reproducible.
> 
> libgcc/ChangeLog:
> 
> 2020-02-05  Martin Liska  <mli...@suse.cz>
> 
>       PR ipa/92924
>       * libgcov-merge.c (merge_topn_values_set): Record
>       when a TOP N counter becomes invalid.  When merging
>       remove a smallest value if the space is needed.

I think the patch is fine except for the command line option name.
First the profiling is reproducible as long as you do not run streaming
in parallel and second we will probably eventually run into need of some
kind of reproducibility for multi-threade dprograms.

So what about
 -fprofile-reproducibility=serial
 -fprofile-reproducibility=parallel-runs (or parallel-merging)
and in future
 -fprofile-reproducibility=multithreaded

It will make users to read manual what it odes really mean.
> ---
>  gcc/common.opt         |  4 ++++
>  gcc/doc/invoke.texi    | 12 +++++++++-
>  gcc/value-prof.c       | 26 ++++++++++++++++------
>  libgcc/libgcov-merge.c | 50 +++++++++++++++++++++++++++++-------------
>  4 files changed, 69 insertions(+), 23 deletions(-)
> 
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 5692cd04374..7f643516a62 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2168,6 +2168,10 @@ fprofile-exclude-files=
>  Common Joined RejectNegative Var(flag_profile_exclude_files)
>  Instrument only functions from files where names do not match all the 
> regular expressions (separated by a semi-colon).
>  
> +fprofile-reproducible
> +Common Report Var(flag_profile_reproducible)
> +Enable only profile based optimizations that do not depend on order of 
> training runs.
> +
>  Enum
>  Name(profile_update) Type(enum profile_update) UnknownError(unknown profile 
> update method %qs)
>  
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 35b341e759f..6725c543c3b 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -562,7 +562,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fprofile-abs-path @gol
>  -fprofile-dir=@var{path}  -fprofile-generate  -fprofile-generate=@var{path} 
> @gol
>  -fprofile-note=@var{path}  -fprofile-update=@var{method} @gol
> --fprofile-filter-files=@var{regex}  -fprofile-exclude-files=@var{regex} @gol
> +-fprofile-filter-files=@var{regex}  -fprofile-exclude-files=@var{regex} 
> -fprofile-reproducibly @gol
>  -fsanitize=@var{style}  -fsanitize-recover  -fsanitize-recover=@var{style} 
> @gol
>  -fasan-shadow-offset=@var{number}  -fsanitize-sections=@var{s1},@var{s2},... 
> @gol
>  -fsanitize-undefined-trap-on-error  -fbounds-check @gol
> @@ -13324,6 +13324,16 @@ all the regular expressions (separated by a 
> semi-colon).
>  For example, @option{-fprofile-exclude-files=/usr/*} will prevent 
> instrumentation
>  of all files that are located in @file{/usr/} folder.
>  
> +Enable only profile based optimizations (PGO) that do not depend on order of 
> training runs.
> +
> +The PGO optimizations depend on a run-time profile that is always merged 
> after
> +each training run.  The merged profile can end up being different based on
> +sequence of these training runs.  Using the option, the compiler will use
> +only the profile information which cannot be changed by order of training 
> runs.
> +
> +@item -fprofile-reproducible
> +@opindex fprofile-reproducible

Isn't this backwards (i.e. @item first and descrition later).
I think this is bit hard to understand feature so we should explain it
bit more including the surprises one can run into WRT reproducibility.

-fprofile-reproducibility=
Control level of reproducibility of profile gathered by
@code{-fprofile-generate}.  This makes it possible to rebuild program
with same outcome which is useful, for example, for distribution
packages.

With @option{-fprofile-reproducibility=serial} the profile gathered by
@option{-fprofile-generate} is reproducible provided the trained program
behaves the same at each invocation of the train run, it is not
multi-threaded and profile data streaming is always done in the same
order. Note that profile streaming happens at the end of program run but
also before @code{fork} function is invoked.  

Note that it is quite common that execution counts of some part of
programs depends, for example, on length of temporary file names or
memory space randomization (that may affect hash-table collision rate).
Such non-reproducible part of programs may be annotated by
@code{no_instrument_function} function attribute. @code{gcov-dump} with
@option{-l} can be used to dump gathered data and verify that they are
indeed reproducible.

With @option{-fprofile-reproducibility=parallel-runs} collected profile
stays reproducible regardless the order of streaming of the data into
gcda files.  This setting makes it possible to run multiple instances of
instrumente program in parallel (such as with @code{make -j}). This
reduces quality of gathered data, in particular of indirect call
profiling.

Otherwise the patch is OK and thanks for working on this!
Honza
> +
>  @item -fsanitize=address
>  @opindex fsanitize=address
>  Enable AddressSanitizer, a fast memory error detector.
> diff --git a/gcc/value-prof.c b/gcc/value-prof.c
> index f0456c8e93d..64b9c9de6dd 100644
> --- a/gcc/value-prof.c
> +++ b/gcc/value-prof.c
> @@ -265,8 +265,10 @@ dump_histogram_value (FILE *dump_file, histogram_value 
> hist)
>                   ? "Top N value counter" : "Indirect call counter"));
>         if (hist->hvalue.counters)
>           {
> -           fprintf (dump_file, " all: %" PRId64 ", values: ",
> -                    (int64_t) hist->hvalue.counters[0]);
> +           fprintf (dump_file, " all: %" PRId64 "%s, values: ",
> +                    abs ((int64_t) hist->hvalue.counters[0]),
> +                    hist->hvalue.counters[0] < 0
> +                    ? " (values missing)": "");
>             for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
>               {
>                 fprintf (dump_file, "[%" PRId64 ":%" PRId64 "]",
> @@ -719,26 +721,36 @@ gimple_divmod_fixed_value (gassign *stmt, tree value, 
> profile_probability prob,
>  
>  /* Return the n-th value count of TOPN_VALUE histogram.  If
>     there's a value, return true and set VALUE and COUNT
> -   arguments.  */
> +   arguments.
> +
> +   Counters have the following meaning.
> +
> +   abs (counters[0]) is the number of executions
> +   for i in 0 ... TOPN-1
> +     counters[2 * i + 1] is target
> +     abs (counters[2 * i + 2]) is corresponding hitrate counter.
> +
> +   Value of counters[0] negative when counter became
> +   full during merging and some values are lost.  */
>  
>  bool
>  get_nth_most_common_value (gimple *stmt, const char *counter_type,
>                          histogram_value hist, gcov_type *value,
>                          gcov_type *count, gcov_type *all, unsigned n)
>  {
> -  if (hist->hvalue.counters[2] == -1)
> -    return false;
> -
>    gcc_assert (n < GCOV_TOPN_VALUES);
>  
>    *count = 0;
>    *value = 0;
>  
> -  gcov_type read_all = hist->hvalue.counters[0];
> +  gcov_type read_all = abs (hist->hvalue.counters[0]);
>  
>    gcov_type v = hist->hvalue.counters[2 * n + 1];
>    gcov_type c = hist->hvalue.counters[2 * n + 2];
>  
> +  if (flag_profile_reproducible && hist->hvalue.counters[0] < 0)
> +    return false;
> +
>    /* Indirect calls can't be verified.  */
>    if (stmt
>        && check_counter (stmt, counter_type, &c, &read_all,
> diff --git a/libgcc/libgcov-merge.c b/libgcc/libgcov-merge.c
> index 19b8ee72ae9..c0785b0bf10 100644
> --- a/libgcc/libgcov-merge.c
> +++ b/libgcc/libgcov-merge.c
> @@ -86,36 +86,47 @@ __gcov_merge_time_profile (gcov_type *counters, unsigned 
> n_counters)
>  
>  #ifdef L_gcov_merge_topn
>  
> +/* To merging of TOPN profiles.
> +   counters[0] is the number of executions
> +   for i in 0 ... TOPN-1
> +     counters[2 * i + 1] is target
> +     counters[2 * i + 2] is corresponding hitrate counter.
> +
> +   Because we prune counters only those with probability >= 1/TOPN are
> +   present now.
> +
> +   We use sign of counters[0] to track whether the number of different
> +   targets exceeds TOPN.  */
> +
>  static void
>  merge_topn_values_set (gcov_type *counters)
>  {
>    /* First value is number of total executions of the profiler.  */
> -  gcov_type all = gcov_get_counter_ignore_scaling (-1);
> -  counters[0] += all;
> +  gcov_type all = gcov_get_counter ();
> +  gcov_type *total = &counters[0];
>    ++counters;
>  
> +  /* Negative value means that counter is missing some of values.  */
> +  if (all < 0)
> +    *total = -(*total);
> +
> +  *total += all;
> +
>    /* Read all part values.  */
>    gcov_type read_counters[2 * GCOV_TOPN_VALUES];
> -
>    for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
>      {
>        read_counters[2 * i] = gcov_get_counter_target ();
>        read_counters[2 * i + 1] = gcov_get_counter_ignore_scaling (-1);
>      }
>  
> -  if (read_counters[1] == -1)
> -    {
> -      counters[1] = -1;
> -      return;
> -    }
> -
>    for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
>      {
>        if (read_counters[2 * i + 1] == 0)
>       continue;
>  
>        unsigned j;
> -      int slot = -1;
> +      int slot = 0;
>  
>        for (j = 0; j < GCOV_TOPN_VALUES; j++)
>       {
> @@ -124,13 +135,15 @@ merge_topn_values_set (gcov_type *counters)
>             counters[2 * j + 1] += read_counters[2 * i + 1];
>             break;
>           }
> -       else if (counters[2 * j + 1] == 0)
> +       else if (counters[2 * j + 1] < counters[2 * slot + 1])
>           slot = j;
>       }
>  
>        if (j == GCOV_TOPN_VALUES)
>       {
> -       if (slot > 0)
> +       gcov_type slot_count = counters[2 * slot + 1];
> +       /* We found an empty slot.  */
> +       if (slot_count == 0)
>           {
>             /* If we found empty slot, add the value.  */
>             counters[2 * slot] = read_counters[2 * i];
> @@ -138,9 +151,16 @@ merge_topn_values_set (gcov_type *counters)
>           }
>         else
>           {
> -           /* We haven't found a slot, bail out.  */
> -           counters[1] = -1;
> -           return;
> +           /* Here we are loosing some values.  */
> +           if (*total >= 0)
> +             *total = -(*total);
> +           if (read_counters[2 * i + 1] > slot_count)
> +             {
> +               counters[2 * slot] = read_counters[2 * i];
> +               counters[2 * slot + 1] = read_counters[2 * i + 1];
> +             }
> +           else
> +             counters[2 * slot + 1] -= read_counters[2 * i + 1];
>           }
>       }
>      }
> -- 
> 2.25.0
>

Re: gcov: reduce code quality loss by reproducible topn merging [PR92924]

Reply via email to