> 
> Like Wahlen et al., this implementation records coverage in fixed-size
> bitsets which gcov knows how to interpret.  This is very fast, but
> introduces a limit on the number of terms in a single boolean
> expression: the number of bits in a gcov_unsigned_type (which is
> typedef'd to uint64_t).  For most practical purposes this should be
> acceptable.  The limitation is in the implementation, not the
> algorithm, so support for more conditions can be added by also
> introducing arbitrary-sized bitsets.

This should not be too hard to do - if the conditional is more complex you
simply introduce more than one counter for it, right?
How many times does this trigger on the GCC sources?
> 
> For space overhead, the instrumentation needs two accumulators
> (gcov_unsigned_type) per condition in the program, which will be written
> to the gcov file.  In addition, every function gets a pair of local
> accumulators, but these accumulators are reused between conditions in the
> same function.
> 
> For time overhead, there is a zeroing of the local accumulators for
> every condition and one or two bitwise operations on every edge taken in
> an expression.
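For readers following along, the per-decision bookkeeping described above can be sketched in plain C.  This is a hedged illustration only: the identifier names (global_true, accu_true, etc.) and the exact placement of the masking step are my assumptions, not the patch's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Global accumulators, one pair per condition in the program; they are
   merged into the profile with bitwise OR.  */
static uint64_t global_true, global_false;

/* Instrumented evaluation of (a && b).  The local pair is zeroed on
   entry to the decision; bit i records the observed outcome of term i;
   a masking step cancels outcomes that did not independently affect
   the decision.  */
static int
eval_and (int a, int b)
{
  uint64_t accu_true = 0, accu_false = 0;  /* zeroed per decision */
  int result = 0;

  if (a)
    {
      accu_true |= (uint64_t) 1 << 0;      /* term 0 was true */
      if (b)
        {
          accu_true |= (uint64_t) 1 << 1;
          result = 1;
        }
      else
        {
          accu_false |= (uint64_t) 1 << 1;
          /* Masking: with b false the decision is false no matter what
             a was, so a's true outcome is cancelled.  */
          accu_true &= ~((uint64_t) 1 << 0);
        }
    }
  else
    accu_false |= (uint64_t) 1 << 0;       /* term 0 was false, b skipped */

  global_true |= accu_true;
  global_false |= accu_false;
  return result;
}
```

Running eval_and (1, 1) and then eval_and (1, 0) leaves bits 0 and 1 set in the true accumulator and bit 1 in the false accumulator - the masked a-true outcome from the second call never reaches the global pair.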
> 
> In action it looks pretty similar to branch coverage.  The -g short
> opt carries no significance, but was chosen because it was an available
> option with the upper-case letter free too.
> 
> gcov --conditions:
> 
>         3:   17:void fn (int a, int b, int c, int d) {
>         3:   18:    if ((a && (b || c)) && d)
> conditions covered 3/8
> condition  0 not covered (true)
> condition  0 not covered (false)
> condition  1 not covered (true)
> condition  2 not covered (true)
> condition  3 not covered (true)
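The "conditions covered 3/8" summary is just the population count of the two accumulators.  A small sketch (the accumulator values below are chosen to be consistent with the report above - only the false outcomes of terms 1, 2 and 3 covered - and are not taken from an actual run):

```c
#include <assert.h>
#include <stdint.h>

/* Portable popcount; real code would use __builtin_popcountll.  */
static unsigned
popcount64 (uint64_t v)
{
  unsigned n = 0;
  for (; v; v &= v - 1)
    n++;
  return n;
}

/* Covered outcomes for one decision: one bit per term in each of the
   true/false accumulators, out of 2 * nterms possible outcomes.  */
static unsigned
conditions_covered (uint64_t truev, uint64_t falsev)
{
  return popcount64 (truev) + popcount64 (falsev);
}
```

With truev = 0x0 and falsev = 0xE (terms 1, 2, 3 seen false), conditions_covered yields 3 of the 8 outcomes for the 4-term decision.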
It seems understandable, but for bigger conditionals I guess it will be a
bit hard to make sense of the mapping between condition numbers and the
actual source code.  We could probably also show the conditions as ranges
in the conditional?  I am adding David Malcolm to CC, he may have some
ideas.

I wonder how much this information is confused by early optimizations
happening before coverage profiling?
> 
> Some expressions, mostly those without else-blocks, are effectively
> "rewritten" in the CFG construction making the algorithm unable to
> distinguish them:
> 
> and.c:
> 
>     if (a && b && c)
>         x = 1;
> 
> ifs.c:
> 
>     if (a)
>         if (b)
>             if (c)
>                 x = 1;
> 
> gcc will build the same graph for both these programs, and gcov will
> report both as 3-term expressions. It is vital that it is not
> interpreted the other way around (which is consistent with the shape of
> the graph) because otherwise the masking would be wrong for the and.c
> program which is a more severe error. While surprising, users would
> probably expect some minor rewriting of semantically-identical
> expressions.
> 
> and.c.gcov:
>     #####:    2:    if (a && b && c)
> conditions covered 6/6
>     #####:    3:        x = 1;
> 
> ifs.c.gcov:
>     #####:    2:    if (a)
>     #####:    3:        if (b)
>     #####:    4:            if (c)
>     #####:    5:                x = 1;
> conditions covered 6/6

Maybe one can use location information to distinguish those cases?
Don't we store discriminator info about individual statements that is
also used for auto-FDO?
> 
> gcc/ChangeLog:
> 
>       * builtins.cc (expand_builtin_fork_or_exec): Check
>       profile_condition_flag.
>         * collect2.cc (main): Add -fno-profile-conditions to OBSTACK.
>       * common.opt: Add new options -fprofile-conditions and
>       * doc/gcov.texi: Add --conditions documentation.
>       * doc/invoke.texi: Add -fprofile-conditions documentation.
>       * gcc.cc: Link gcov on -fprofile-conditions.
>       * gcov-counter.def (GCOV_COUNTER_CONDS): New.
>       * gcov-dump.cc (tag_conditions): New.
>       * gcov-io.h (GCOV_TAG_CONDS): New.
>       (GCOV_TAG_CONDS_LENGTH): Likewise.
>       (GCOV_TAG_CONDS_NUM): Likewise.
>       * gcov.cc (class condition_info): New.
>       (condition_info::condition_info): New.
>       (condition_info::popcount): New.
>       (struct coverage_info): New.
>       (add_condition_counts): New.
>       (output_conditions): New.
>       (print_usage): Add -g, --conditions.
>       (process_args): Likewise.
>       (output_intermediate_json_line): Output conditions.
>       (read_graph_file): Read conditions counters.
>       (read_count_file): Read conditions counters.
>       (file_summary): Print conditions.
>       (accumulate_line_info): Accumulate conditions.
>       (output_line_details): Print conditions.
>       * ipa-inline.cc (can_early_inline_edge_p): Check
>       profile_condition_flag.
>       * ipa-split.cc (pass_split_functions::gate): Likewise.
>       * passes.cc (finish_optimization_passes): Likewise.
>       * profile.cc (find_conditions): New declaration.
>       (cov_length): Likewise.
>       (cov_blocks): Likewise.
>       (cov_masks): Likewise.
>       (cov_free): Likewise.
>       (instrument_decisions): New.
>       (read_thunk_profile): Control output to file.
>       (branch_prob): Call find_conditions, instrument_decisions.
>       (init_branch_prob): Add total_num_conds.
>       (end_branch_prob): Likewise.
>       * tree-profile.cc (struct conds_ctx): New.
>       (CONDITIONS_MAX_TERMS): New.
>       (EDGE_CONDITION): New.
>       (cmp_index_map): New.
>       (index_of): New.
>       (block_conditional_p): New.
>       (edge_conditional_p): New.
>       (single): New.
>       (single_edge): New.
>       (contract_edge): New.
>       (contract_edge_up): New.
>       (ancestors_of): New.
>       (struct outcomes): New.
>       (conditional_succs): New.
>       (condition_index): New.
>       (masking_vectors): New.
>       (cond_reachable_from): New.
>       (neighborhood): New.
>       (isolate_expression): New.
>       (emit_bitwise_op): New.
>       (make_index_map_visit): New.
>       (make_index_map): New.
>       (collect_conditions): New.
>       (yes): New.
>       (struct condcov): New.
>       (cov_length): New.
>       (cov_blocks): New.
>       (cov_masks): New.
>       (cov_free): New.
>       (find_conditions): New.
>       (instrument_decisions): New.
>       (tree_profiling): Check profile_condition_flag.
>       (pass_ipa_tree_profile::gate): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>       * lib/gcov.exp: Add condition coverage test function.
>       * g++.dg/gcov/gcov-18.C: New test.
>       * gcc.misc-tests/gcov-19.c: New test.
>       * gcc.misc-tests/gcov-20.c: New test.
>       * gcc.misc-tests/gcov-21.c: New test.
> ---
>  gcc/builtins.cc                        |    2 +-
>  gcc/collect2.cc                        |    7 +-
>  gcc/common.opt                         |    8 +
>  gcc/doc/gcov.texi                      |   37 +
>  gcc/doc/invoke.texi                    |   19 +
>  gcc/gcc.cc                             |    4 +-
>  gcc/gcov-counter.def                   |    3 +
>  gcc/gcov-dump.cc                       |   24 +
>  gcc/gcov-io.h                          |    3 +
>  gcc/gcov.cc                            |  197 +++-
>  gcc/ipa-inline.cc                      |    2 +-
>  gcc/ipa-split.cc                       |    3 +-
>  gcc/passes.cc                          |    3 +-
>  gcc/profile.cc                         |   84 +-
>  gcc/testsuite/g++.dg/gcov/gcov-18.C    |  234 +++++
>  gcc/testsuite/gcc.misc-tests/gcov-19.c | 1249 ++++++++++++++++++++++++
>  gcc/testsuite/gcc.misc-tests/gcov-20.c |   22 +
>  gcc/testsuite/gcc.misc-tests/gcov-21.c |   16 +
>  gcc/testsuite/lib/gcov.exp             |  191 +++-
>  gcc/tree-profile.cc                    | 1048 +++++++++++++++++++-
>  libgcc/libgcov-merge.c                 |    5 +
>  21 files changed, 3135 insertions(+), 26 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-18.C
>  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-19.c
>  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-20.c
>  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-21.c
> 
> @@ -392,6 +394,28 @@ tag_arcs (const char *filename ATTRIBUTE_UNUSED,
>      }
>  }
>  
> +static void
> +tag_conditions (const char *filename ATTRIBUTE_UNUSED,
> +             unsigned tag ATTRIBUTE_UNUSED, int length ATTRIBUTE_UNUSED,
> +             unsigned depth)
There should be block comments before functions
> @@ -134,6 +135,28 @@ public:
>    vector<unsigned> lines;
>  };
>  
> +class condition_info
... and data structures.
> +/* Output conditions (modified condition/decision coverage) */
> +
> +static int flag_conditions = 0;

This is really a bool.  Gcov has not been updated for a while; I think
you can simply convert the other flag_* variables to bools as well.
> diff --git a/gcc/ipa-split.cc b/gcc/ipa-split.cc
> index 6730f4f9d0e..276ead617c9 100644
> --- a/gcc/ipa-split.cc
> +++ b/gcc/ipa-split.cc
> @@ -1930,7 +1930,8 @@ pass_split_functions::gate (function *)
>    /* When doing profile feedback, we want to execute the pass after profiling
>       is read.  So disable one in early optimization.  */
>    return (flag_partial_inlining
> -       && !profile_arc_flag && !flag_branch_probabilities);
> +      && !profile_arc_flag && !flag_branch_probabilities
> +      && !profile_condition_flag);

This should be unnecessary - profile_condition is not used for profile
feedback.
>  }
>  
>  } // anon namespace
> diff --git a/gcc/passes.cc b/gcc/passes.cc
> index 6f894a41d22..02194fe286f 100644
> --- a/gcc/passes.cc
> +++ b/gcc/passes.cc
> @@ -352,7 +352,8 @@ finish_optimization_passes (void)
>    gcc::dump_manager *dumps = m_ctxt->get_dumps ();
>  
>    timevar_push (TV_DUMP);
> -  if (profile_arc_flag || flag_test_coverage || flag_branch_probabilities)
> +  if (profile_arc_flag || profile_condition_flag || flag_test_coverage
> +      || flag_branch_probabilities)
>      {
>        dumps->dump_start (pass_profile_1->static_pass_number, NULL);
>        end_branch_prob ();

I think this whole code is unnecessary since branch probability is now an
IPA pass and runs on the whole program at once, so the stats done by
end_branch_prob can be output from the pass itself.

(Originally the pass was done as part of the optimization queue and we
needed a place to output stats at the end of compilation.)
> @@ -1397,10 +1415,18 @@ branch_prob (bool thunk)
>  
>    /* Write the data from which gcov can reconstruct the basic block
>       graph and function line numbers (the gcno file).  */
> +  output_to_file = false;
>    if (coverage_begin_function (lineno_checksum, cfg_checksum))
>      {
>        gcov_position_t offset;
>  
> +      /* The condition coverage needs a deeper analysis to identify expressions
> +       * of conditions, which means it is not yet ready to write to the gcno
> +       * file.  It will write its entries later, but needs to know if it do it
> +       * in the first place, which is controlled by the return value of
> +       * coverage_begin_function.  */

No * on every line.
> +      output_to_file = true;
> +
>        /* Basic block flags */
>        offset = gcov_write_tag (GCOV_TAG_BLOCKS);
>        gcov_write_unsigned (n_basic_blocks_for_fn (cfun));
> +      if (coverage_counter_alloc (GCOV_COUNTER_CONDS, 2 * nconds))
> +     {
> +       /* Add two extra variables to the function for the local
> +          accumulators, which are zero'd on the entry of a new conditional.
> +          The local accumulators are shared between decisions in order to
> +          use less stack space.  */
Shouldn't we be able to allocate the stack space?
> +       tree accu[2] = {
> +         build_decl (UNKNOWN_LOCATION, VAR_DECL,
> +                     get_identifier ("__accu_t"), get_gcov_type ()),
> +         build_decl (UNKNOWN_LOCATION, VAR_DECL,
> +                     get_identifier ("__accu_f"), get_gcov_type ()),

Can't those be just SSA names?
> +       };
> +
> +       gcov_position_t offset {};
> +       if (output_to_file)
> +           offset = gcov_write_tag (GCOV_TAG_CONDS);
> +
> +       for (unsigned i = 0; i < nconds; ++i)
> +         {
> +           array_slice<basic_block> expr = cov_blocks (cov, i);
> +           array_slice<gcov_type_unsigned> masks = cov_masks (cov, i);
> +           gcc_assert (expr.is_valid ());
> +           gcc_assert (masks.is_valid ());
> +
> +           int terms = instrument_decisions (expr, i, accu, masks.begin ());
> +           if (output_to_file)
> +             {
> +               gcov_write_unsigned (expr.front ()->index);
> +               gcov_write_unsigned (terms);
> +             }
> +         }
> +       if (output_to_file)
> +           gcov_write_length (offset);
> +     }
> +      cov_free (cov);
> +    }
> +
>    /* For each edge not on the spanning tree, add counting code.  */
>    if (profile_arc_flag
>        && coverage_counter_alloc (GCOV_COUNTER_ARCS, num_instrumented))
>      {
>        unsigned n_instrumented;
>  
> -      gimple_init_gcov_profiler ();
> -
>        n_instrumented = instrument_edges (el);
>  
>        gcc_assert (n_instrumented == num_instrumented);
>  
>        if (flag_profile_values)
>       instrument_values (values);
> -
> -      /* Commit changes done by instrumentation.  */
> -      gsi_commit_edge_inserts ();
>      }
>  
>    free_aux_for_edges ();
>  
>    values.release ();
>    free_edge_list (el);
> +  /* Commit changes done by instrumentation.  */
> +  gsi_commit_edge_inserts ();
> +
>    coverage_end_function (lineno_checksum, cfg_checksum);
>    if (flag_branch_probabilities
>        && (profile_status_for_fn (cfun) == PROFILE_READ))
> @@ -1669,6 +1740,7 @@ init_branch_prob (void)
>    total_num_passes = 0;
>    total_num_times_called = 0;
>    total_num_branches = 0;
> +  total_num_conds = 0;
>    for (i = 0; i < 20; i++)
>      total_hist_br_prob[i] = 0;
>  }
> @@ -1708,5 +1780,7 @@ end_branch_prob (void)
>                    (total_hist_br_prob[i] + total_hist_br_prob[19-i]) * 100
>                    / total_num_branches, 5*i, 5*i+5);
>       }
> +      fprintf (dump_file, "Total number of conditions: %d\n",
> +            total_num_conds);
>      }
>  }

> diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> index da300d5f9e8..c8b917afb9a 100644
> --- a/gcc/tree-profile.cc
> +++ b/gcc/tree-profile.cc
> @@ -58,6 +58,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "alloc-pool.h"
>  #include "symbol-summary.h"
>  #include "symtab-thunks.h"
> +#include "cfganal.h"
> +#include "cfgloop.h"
>  
>  static GTY(()) tree gcov_type_node;
>  static GTY(()) tree tree_interval_profiler_fn;
> @@ -73,6 +75,1046 @@ static GTY(()) tree ic_tuple_var;
>  static GTY(()) tree ic_tuple_counters_field;
>  static GTY(()) tree ic_tuple_callee_field;
>  
> +namespace
> +{

Maybe some overall comment what this code is doing (which you describe
in the email).  Also it may make sense to place it in a new source file:
it is long enough.
> +/* Some context and reused instances between function calls.  Large embedded
> +   buffers are used to up-front request enough memory for most programs and
> +   merge them into a single allocation at the cost of using more memory in the
> +   average case.  Some numbers from linux v5.13 which is assumed to be a
> +   reasonably diverse code base: 75% of the functions in linux have less than
> +   16 nodes in the CFG and approx 2.5% have more than 64 nodes.  The functions
> +   that go beyond a few dozen nodes tend to be very large (>100) and so 64
> +   seems like a good balance.
> +
> +   This is really just a performance balance of the cost of allocation and
> +   wasted memory.  */
> +struct conds_ctx
> +{
> +    /* Bitmap of the processed blocks.  Bit n set means basic_block->index has
> +       been processed either explicitly or as a part of an expression.  */
> +    auto_sbitmap marks;
> +
> +    /* This is both a reusable shared allocation which is also used to return
> +       single expressions, which means it for most code should only hold a
> +       couple of elements.  */
> +    auto_vec<basic_block, 32> blocks;
> +
> +    /* Map from basic_block->index to an ordering so that for a single
> +       expression (a || b && c) => index_map[a] < index_map[b] < index_map[c].
> +       The values do not have to be consecutive and can be interleaved by
> +       values from other expressions, so comparisons only make sense for blocks
> +       that belong to the same expression.  */
> +    auto_vec<int, 64> index_map;
> +
> +    /* Pre-allocate bitmaps and vectors for per-function book keeping.  This is
> +       pure instance reuse and the bitmaps carry no data between function
> +       calls.  */
> +    auto_vec<basic_block, 64> B1;
> +    auto_vec<basic_block, 64> B2;
> +    auto_sbitmap G1;
> +    auto_sbitmap G2;
> +    auto_sbitmap G3;
> +
> +    explicit conds_ctx (unsigned size) noexcept (true) : marks (size),
> +    G1 (size), G2 (size), G3 (size)
> +    {
> +     bitmap_clear (marks);
> +    }
> +
> +    /* Mark a node as processed so nodes are not processed twice for example in
> +       loops, gotos.  */
> +    void mark (const basic_block b) noexcept (true)
> +    {
> +     gcc_assert (!bitmap_bit_p (marks, b->index));
> +     bitmap_set_bit (marks, b->index);
> +    }
> +
> +    /* Mark nodes as processed so they are not processed twice.  */
> +    void mark (const vec<basic_block>& bs) noexcept (true)
> +    {
> +     for (const basic_block b : bs)
> +         mark (b);
> +    }
> +
> +    /* Check if all nodes are marked.  A successful run should visit & mark
> +       every reachable node exactly once.  */
> +    bool all_marked (const vec<basic_block>& reachable) const noexcept (true)
> +    {
> +     for (const basic_block b : reachable)
> +         if (!bitmap_bit_p (marks, b->index))
> +             return false;
> +     return true;
> +    }
> +};
> +
> +/* Only instrument terms with fewer than number of bits in a (wide) gcov
> +   integer, which is probably 64.  The algorithm itself does not impose this
> +   limitation, but it makes for a simpler implementation.
> +
> +   * Allocating the output data structure (coverage_counter_alloc ()) can
> +     assume pairs of gcov_type_unsigned and not use a separate length field.
> +   * A pair gcov_type_unsigned can be used as accumulators.
> +   * Updating accumulators can use the bitwise operations |=, &= and not
> +     custom operators that work for arbitrary-sized bit-sets.
> +
> +   Most real-world code should be unaffected by this, but it is possible
> +   (especially for generated code) to exceed this limit.  */
> +#define CONDITIONS_MAX_TERMS (sizeof (gcov_type_unsigned) * BITS_PER_UNIT)

You also need to consider the target's number of bits in the gcov type:
targets can change the 64-bit default to something else.
There is gcov_type_node from which you can get the number of bits.
> +
> +/* Special cases of the single_*_p and single_*_edge functions in basic-block.h
> +   that don't consider exception handling or other complex edges.  This helps
> +   create a view of the CFG with only normal edges - if a basic block has both
> +   an outgoing fallthrough and exceptional edge [1], it should be considered a
> +   single-successor.
> +
> +   [1] if this is not possible, these functions can be removed and replaced by
> +       their basic-block.h cousins.  */

This should be achievable with -fnon-call-exceptions.  For example,
division can then throw an exception.
> +bool
> +single (const vec<edge, va_gc> *edges)
Maybe single_p to keep naming similar to basic-block?
> +{
> +    int n = EDGE_COUNT (edges);
> +    if (n == 0)
> +     return false;
> +
> +    for (edge e : edges)
> +     if (e->flags & EDGE_COMPLEX)
> +         n -= 1;
> +
> +    return n == 1;
> +}
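The subtlety in the predicate above - a block whose only edge is complex is not "single" - can be shown with a standalone C stand-in.  The struct, the EDGE_COMPLEX value, and the function name single_p are all placeholders for illustration, not GCC's actual types:

```c
#include <assert.h>

#define EDGE_COMPLEX 0x1  /* stand-in for GCC's EDGE_COMPLEX flag mask */

struct edge_s
{
  int flags;
};

/* Mirror the patch's single (): true when exactly one non-complex edge
   remains after discounting complex ones.  */
static int
single_p (const struct edge_s *edges, int count)
{
  int n = count;
  if (n == 0)
    return 0;

  for (int i = 0; i < count; i++)
    if (edges[i].flags & EDGE_COMPLEX)
      n -= 1;

  return n == 1;
}
```

So a block with one normal and one exceptional edge counts as single-successor, while a block with two normal edges, or only an exceptional edge, does not.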
> +
> +/* Get the single, non-complex edge.  Behavior is undefined if there is
> +   more than one non-complex edge.  */
> +edge
> +single_edge (const vec<edge, va_gc> *edges)
> +{
Perhaps to keep behaviour consistent with basic-block.h you want to add
   gcc_checking_assert (single_p (edges));
> +/* Find the set {ancestors (p) intersect G} where ancestors is the recursive
> +   set of predecessors for p.  Limiting to the ancestors that are also in G
> +   (see cond_reachable_from) and by q is an optimization as ancestors outside G
> +   have no effect when isolating expressions.
> +
> +   dfs_enumerate_from () does not work as the filter function needs edge
> +   information and dfs_enumerate_from () only considers blocks.  */
> +void
> +ancestors_of (basic_block p, basic_block q, const sbitmap G, sbitmap ancestors)

coding style does not really like using uppercase letters for variables.
I see it was a set in the paper, but one can give it a bit more
descriptive name in the source code.
> +{
> +    if (!bitmap_bit_p (G, p->index))
> +     return;
> +
> +    bitmap_set_bit (ancestors, p->index);
> +    bitmap_set_bit (ancestors, q->index);
> +    if (p == q)
> +     return;
> +
> +    auto_vec<basic_block, 16> stack;
> +    stack.safe_push (p);
> +
> +    while (!stack.is_empty ())
> +    {
> +     basic_block b = stack.pop ();
> +     if (single (b->preds))
> +     {
> +         edge e = single_edge (b->preds);
> +         e = contract_edge_up (e);
> +         b = e->dest;
> +     }
So this walks the chain of single-pred edges until something non-single is
found, and only those are added to the ancestor bitmap?
Why are the single-edge basic blocks not inserted into the ancestor bitmap?
> +
> +     for (edge e : b->preds)
> +     {
> +         basic_block src = e->src;
> +         if (bitmap_bit_p (ancestors, e->src->index))
> +             continue;
> +         if (!bitmap_bit_p (G, e->src->index))
> +             continue;
> +         bitmap_set_bit (ancestors, src->index);
bitmap_set_bit returns a boolean value indicating whether the bit was newly
set or previously 1, so you can avoid the bitmap_bit_p above.
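The suggested simplification can be sketched with a plain bitset stand-in; set_bit here mimics bitmap_set_bit's documented contract (set and report whether the bit was newly set), and the names are illustrative, not GCC's:

```c
#include <assert.h>
#include <stdint.h>

/* Mimic bitmap_set_bit: set the bit and return nonzero only if it was
   not already set, so callers need no separate membership test.  */
static int
set_bit (uint64_t *bits, unsigned index)
{
  uint64_t mask = (uint64_t) 1 << index;
  if (*bits & mask)
    return 0;
  *bits |= mask;
  return 1;
}

/* Deduplicate a stream of node indices, as the DFS worklist in
   ancestors_of would: push (here: count) a node only when its bit was
   newly set.  */
static unsigned
count_unique (const unsigned *nodes, unsigned n)
{
  uint64_t seen = 0;
  unsigned unique = 0;
  for (unsigned i = 0; i < n; i++)
    if (set_bit (&seen, nodes[i]))
      unique++;  /* newly set: this is where the push would happen */
  return unique;
}
```

The test-then-set pair in the quoted loop then collapses to a single `if (bitmap_set_bit (...))` guard.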
> diff --git a/libgcc/libgcov-merge.c b/libgcc/libgcov-merge.c
> index 5d6e17d1483..eed3556373b 100644
> --- a/libgcc/libgcov-merge.c
> +++ b/libgcc/libgcov-merge.c
> @@ -33,6 +33,11 @@ void __gcov_merge_add (gcov_type *counters  __attribute__ ((unused)),
>                         unsigned n_counters __attribute__ ((unused))) {}
>  #endif
>  
> +#ifdef L_gcov_merge_ior
> +void __gcov_merge_ior (gcov_type *counters  __attribute__ ((unused)),
> +                    unsigned n_counters __attribute__ ((unused))) {}
> +#endif

Why this is necessary?

The profiling part looks good to me.  I am worried about the pattern
matching of gimple to recognize conditionals, especially after the early
optimizations were run.  Does it work reasonably with -O2?
Perhaps it would make sense to schedule this kind of profiling early,
shortly after build_ssa, which would make it incompatible with profile
feedback, but I think that would be OK.

Richi, can you please look at the gimple matching part?
Honza
