On 05/10/2023 22:39, Jørgen Kvalsvik wrote:
On 05/10/2023 21:59, Jan Hubicka wrote:

Like Wahlen et al this implementation records coverage in fixed-size
bitsets which gcov knows how to interpret. This is very fast, but
introduces a limit on the number of terms in a single boolean
expression, the number of bits in a gcov_unsigned_type (which is
typedef'd to uint64_t), so for most practical purposes this would be
acceptable. This limitation is in the implementation and not the
algorithm, so support for more conditions can be added by also
introducing arbitrary-sized bitsets.

This should not be too hard to do - if the conditional is more complex you
simply introduce more than one counter for it, right?
How many times does this trigger on GCC sources?

It shouldn't be, no. But once dynamic bitsets are on the table it would be much better to length-encode in smaller multiples than the 64-bit counters. Most expressions are small (<4 terms), so the savings would be substantial. I opted for fixed-size bitsets to start with because they are much simpler and do not introduce any branching or decisions in the instrumentation.

Oh, and I forgot - I have never seen a real-world case with more than 64 conditions, but I suppose it may happen with generated code.



For space overhead, the instrumentation needs two accumulators
(gcov_unsigned_type) per condition in the program, which will be written
to the gcov file. In addition, every function gets a pair of local
accumulators, but these accumulators are reused between conditions in the
same function.

For time overhead, there is a zeroing of the local accumulators for
every condition and one or two bitwise operations on every edge taken in
an expression.

In action it looks quite similar to branch coverage. The -g short
option carries no particular significance; it was chosen because it was
still available, with the upper-case letter free too.

gcov --conditions:

         3:   17:void fn (int a, int b, int c, int d) {
         3:   18:    if ((a && (b || c)) && d)
conditions covered 3/8
condition  0 not covered (true)
condition  0 not covered (false)
condition  1 not covered (true)
condition  2 not covered (true)
condition  3 not covered (true)
It seems understandable, but for bigger conditionals I guess it will be a
bit hard to make sense of the mapping between condition numbers and the
actual source code.  We could probably also show the conditions as ranges
in the conditional?  I am adding David Malcolm to CC, he may have some ideas.

I wonder how much this information is confused by early optimizations
happening before coverage profiling?

Some expressions, mostly those without else-blocks, are effectively
"rewritten" during CFG construction, making the algorithm unable to
distinguish them:

and.c:

     if (a && b && c)
         x = 1;

ifs.c:

     if (a)
         if (b)
             if (c)
                 x = 1;

gcc will build the same graph for both these programs, and gcov will
report both as 3-term expressions. It is vital that it is not
interpreted the other way around (which is consistent with the shape of
the graph), because otherwise the masking would be wrong for the and.c
program, which is a more severe error. While surprising, users would
probably expect some minor rewriting of semantically-identical
expressions.

and.c.gcov:
     #####:    2:    if (a && b && c)
conditions covered 6/6
     #####:    3:        x = 1;

ifs.c.gcov:
     #####:    2:    if (a)
     #####:    3:        if (b)
     #####:    4:            if (c)
     #####:    5:                x = 1;
conditions covered 6/6

Maybe one can use location information to distinguish those cases?
Don't we store discriminator info about individual statements that is also used for
auto-FDO?

That is one possibility, which I tried for a bit, but abandoned to focus on getting the rest of the algorithm right. I am sure it can be revisited (possibly as a future improvement) and weighed against always emitting an else block (see https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631254.html)


gcc/ChangeLog:

    * builtins.cc (expand_builtin_fork_or_exec): Check
    profile_condition_flag.
         * collect2.cc (main): Add -fno-profile-conditions to OBSTACK.
    * common.opt: Add new options -fprofile-conditions and
    * doc/gcov.texi: Add --conditions documentation.
    * doc/invoke.texi: Add -fprofile-conditions documentation.
    * gcc.cc: Link gcov on -fprofile-conditions.
    * gcov-counter.def (GCOV_COUNTER_CONDS): New.
    * gcov-dump.cc (tag_conditions): New.
    * gcov-io.h (GCOV_TAG_CONDS): New.
    (GCOV_TAG_CONDS_LENGTH): Likewise.
    (GCOV_TAG_CONDS_NUM): Likewise.
    * gcov.cc (class condition_info): New.
    (condition_info::condition_info): New.
    (condition_info::popcount): New.
    (struct coverage_info): New.
    (add_condition_counts): New.
    (output_conditions): New.
    (print_usage): Add -g, --conditions.
    (process_args): Likewise.
    (output_intermediate_json_line): Output conditions.
    (read_graph_file): Read conditions counters.
    (read_count_file): Read conditions counters.
    (file_summary): Print conditions.
    (accumulate_line_info): Accumulate conditions.
    (output_line_details): Print conditions.
    * ipa-inline.cc (can_early_inline_edge_p): Check
    profile_condition_flag.
    * ipa-split.cc (pass_split_functions::gate): Likewise.
    * passes.cc (finish_optimization_passes): Likewise.
    * profile.cc (find_conditions): New declaration.
    (cov_length): Likewise.
    (cov_blocks): Likewise.
    (cov_masks): Likewise.
    (cov_free): Likewise.
    (instrument_decisions): New.
    (read_thunk_profile): Control output to file.
    (branch_prob): Call find_conditions, instrument_decisions.
    (init_branch_prob): Add total_num_conds.
    (end_branch_prob): Likewise.
    * tree-profile.cc (struct conds_ctx): New.
    (CONDITIONS_MAX_TERMS): New.
    (EDGE_CONDITION): New.
    (cmp_index_map): New.
    (index_of): New.
    (block_conditional_p): New.
    (edge_conditional_p): New.
    (single): New.
    (single_edge): New.
    (contract_edge): New.
    (contract_edge_up): New.
    (ancestors_of): New.
    (struct outcomes): New.
    (conditional_succs): New.
    (condition_index): New.
    (masking_vectors): New.
    (cond_reachable_from): New.
    (neighborhood): New.
    (isolate_expression): New.
    (emit_bitwise_op): New.
    (make_index_map_visit): New.
    (make_index_map): New.
    (collect_conditions): New.
    (yes): New.
    (struct condcov): New.
    (cov_length): New.
    (cov_blocks): New.
    (cov_masks): New.
    (cov_free): New.
    (find_conditions): New.
    (instrument_decisions): New.
    (tree_profiling): Check profile_condition_flag.
    (pass_ipa_tree_profile::gate): Likewise.

gcc/testsuite/ChangeLog:

    * lib/gcov.exp: Add condition coverage test function.
    * g++.dg/gcov/gcov-18.C: New test.
    * gcc.misc-tests/gcov-19.c: New test.
    * gcc.misc-tests/gcov-20.c: New test.
    * gcc.misc-tests/gcov-21.c: New test.
---
  gcc/builtins.cc                        |    2 +-
  gcc/collect2.cc                        |    7 +-
  gcc/common.opt                         |    8 +
  gcc/doc/gcov.texi                      |   37 +
  gcc/doc/invoke.texi                    |   19 +
  gcc/gcc.cc                             |    4 +-
  gcc/gcov-counter.def                   |    3 +
  gcc/gcov-dump.cc                       |   24 +
  gcc/gcov-io.h                          |    3 +
  gcc/gcov.cc                            |  197 +++-
  gcc/ipa-inline.cc                      |    2 +-
  gcc/ipa-split.cc                       |    3 +-
  gcc/passes.cc                          |    3 +-
  gcc/profile.cc                         |   84 +-
  gcc/testsuite/g++.dg/gcov/gcov-18.C    |  234 +++++
  gcc/testsuite/gcc.misc-tests/gcov-19.c | 1249 ++++++++++++++++++++++++
  gcc/testsuite/gcc.misc-tests/gcov-20.c |   22 +
  gcc/testsuite/gcc.misc-tests/gcov-21.c |   16 +
  gcc/testsuite/lib/gcov.exp             |  191 +++-
  gcc/tree-profile.cc                    | 1048 +++++++++++++++++++-
  libgcc/libgcov-merge.c                 |    5 +
  21 files changed, 3135 insertions(+), 26 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-18.C
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-19.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-20.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-21.c

@@ -392,6 +394,28 @@ tag_arcs (const char *filename ATTRIBUTE_UNUSED,
      }
  }
+static void
+tag_conditions (const char *filename ATTRIBUTE_UNUSED,
+        unsigned tag ATTRIBUTE_UNUSED, int length ATTRIBUTE_UNUSED,
+        unsigned depth)
There should be block comments before functions
@@ -134,6 +135,28 @@ public:
    vector<unsigned> lines;
  };
+class condition_info
... and data structures.
+/* Output conditions (modified condition/decision coverage) */
+
+static int flag_conditions = 0;

This is really a bool.  Gcov was not updated for a while, I think
you can simply convert other flag_* to bools as well.
diff --git a/gcc/ipa-split.cc b/gcc/ipa-split.cc
index 6730f4f9d0e..276ead617c9 100644
--- a/gcc/ipa-split.cc
+++ b/gcc/ipa-split.cc
@@ -1930,7 +1930,8 @@ pass_split_functions::gate (function *)
    /* When doing profile feedback, we want to execute the pass after profiling
       is read.  So disable one in early optimization.  */
    return (flag_partial_inlining
-      && !profile_arc_flag && !flag_branch_probabilities);
+      && !profile_arc_flag && !flag_branch_probabilities
+      && !profile_condition_flag);

This should be unnecessary - profile_condition is not used for profile
feedback.
  }
  } // anon namespace
diff --git a/gcc/passes.cc b/gcc/passes.cc
index 6f894a41d22..02194fe286f 100644
--- a/gcc/passes.cc
+++ b/gcc/passes.cc
@@ -352,7 +352,8 @@ finish_optimization_passes (void)
    gcc::dump_manager *dumps = m_ctxt->get_dumps ();
    timevar_push (TV_DUMP);
-  if (profile_arc_flag || flag_test_coverage || flag_branch_probabilities)
+  if (profile_arc_flag || profile_condition_flag || flag_test_coverage
+      || flag_branch_probabilities)
      {
        dumps->dump_start (pass_profile_1->static_pass_number, NULL);
        end_branch_prob ();

I think this whole code is unnecessary since branch probability is now an
IPA pass and runs on the whole program at once, so the stats done by
end_branch_prob can be output from the pass itself.

(Originally the pass was done as part of the optimization queue and we
needed a place to output stats at the end of compilation.)
@@ -1397,10 +1415,18 @@ branch_prob (bool thunk)
    /* Write the data from which gcov can reconstruct the basic block
       graph and function line numbers (the gcno file).  */
+  output_to_file = false;
    if (coverage_begin_function (lineno_checksum, cfg_checksum))
      {
        gcov_position_t offset;
+      /* The condition coverage needs a deeper analysis to identify expressions
+       * of conditions, which means it is not yet ready to write to the gcno
+       * file.  It will write its entries later, but needs to know if it do it
+       * in the first place, which is controlled by the return value of
+       * coverage_begin_function.  */

No * on every line.
+      output_to_file = true;
+
        /* Basic block flags */
        offset = gcov_write_tag (GCOV_TAG_BLOCKS);
        gcov_write_unsigned (n_basic_blocks_for_fn (cfun));
+      if (coverage_counter_alloc (GCOV_COUNTER_CONDS, 2 * nconds))
+    {
+      /* Add two extra variables to the function for the local
+         accumulators, which are zero'd on the entry of a new conditional.
+         The local accumulators are shared between decisions in order to
+         use less stack space.  */
Shouldn't we be able to allocate the stack space?
+      tree accu[2] = {
+        build_decl (UNKNOWN_LOCATION, VAR_DECL,
+            get_identifier ("__accu_t"), get_gcov_type ()),
+        build_decl (UNKNOWN_LOCATION, VAR_DECL,
+            get_identifier ("__accu_f"), get_gcov_type ()),

Can't those be just SSA names?
+      };
+
+      gcov_position_t offset {};
+      if (output_to_file)
+          offset = gcov_write_tag (GCOV_TAG_CONDS);
+
+      for (unsigned i = 0; i < nconds; ++i)
+        {
+          array_slice<basic_block> expr = cov_blocks (cov, i);
+          array_slice<gcov_type_unsigned> masks = cov_masks (cov, i);
+          gcc_assert (expr.is_valid ());
+          gcc_assert (masks.is_valid ());
+
+          int terms = instrument_decisions (expr, i, accu, masks.begin ());
+          if (output_to_file)
+        {
+          gcov_write_unsigned (expr.front ()->index);
+          gcov_write_unsigned (terms);
+        }
+        }
+      if (output_to_file)
+          gcov_write_length (offset);
+    }
+      cov_free (cov);
+    }
+
    /* For each edge not on the spanning tree, add counting code.  */
    if (profile_arc_flag
        && coverage_counter_alloc (GCOV_COUNTER_ARCS, num_instrumented))
      {
        unsigned n_instrumented;
-      gimple_init_gcov_profiler ();
-
        n_instrumented = instrument_edges (el);
        gcc_assert (n_instrumented == num_instrumented);
        if (flag_profile_values)
      instrument_values (values);
-
-      /* Commit changes done by instrumentation.  */
-      gsi_commit_edge_inserts ();
      }
    free_aux_for_edges ();
    values.release ();
    free_edge_list (el);
+  /* Commit changes done by instrumentation.  */
+  gsi_commit_edge_inserts ();
+
    coverage_end_function (lineno_checksum, cfg_checksum);
    if (flag_branch_probabilities
        && (profile_status_for_fn (cfun) == PROFILE_READ))
@@ -1669,6 +1740,7 @@ init_branch_prob (void)
    total_num_passes = 0;
    total_num_times_called = 0;
    total_num_branches = 0;
+  total_num_conds = 0;
    for (i = 0; i < 20; i++)
      total_hist_br_prob[i] = 0;
  }
@@ -1708,5 +1780,7 @@ end_branch_prob (void)
               (total_hist_br_prob[i] + total_hist_br_prob[19-i]) * 100
               / total_num_branches, 5*i, 5*i+5);
      }
+      fprintf (dump_file, "Total number of conditions: %d\n",
+           total_num_conds);
      }
  }

diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
index da300d5f9e8..c8b917afb9a 100644
--- a/gcc/tree-profile.cc
+++ b/gcc/tree-profile.cc
@@ -58,6 +58,8 @@ along with GCC; see the file COPYING3.  If not see
  #include "alloc-pool.h"
  #include "symbol-summary.h"
  #include "symtab-thunks.h"
+#include "cfganal.h"
+#include "cfgloop.h"
  static GTY(()) tree gcov_type_node;
  static GTY(()) tree tree_interval_profiler_fn;
@@ -73,6 +75,1046 @@ static GTY(()) tree ic_tuple_var;
  static GTY(()) tree ic_tuple_counters_field;
  static GTY(()) tree ic_tuple_callee_field;
+namespace
+{

Maybe add some overall comment on what this code is doing (which you describe
in the email).  Also it may make sense to place it in a new source file: it is long enough.
+/* Some context and reused instances between function calls.  Large embedded
+   buffers are used to up-front request enough memory for most programs and
+   merge them into a single allocation at the cost of using more memory in the
+   average case.  Some numbers from linux v5.13 which is assumed to be a
+   reasonably diverse code base: 75% of the functions in linux have less than
+   16 nodes in the CFG and approx 2.5% have more than 64 nodes.  The functions
+   that go beyond a few dozen nodes tend to be very large (>100) and so 64
+   seems like a good balance.
+
+   This is really just a performance balance of the cost of allocation and
+   wasted memory.  */
+struct conds_ctx
+{
+    /* Bitmap of the processed blocks.  Bit n set means basic_block->index has
+       been processed either explicitly or as a part of an expression.  */
+    auto_sbitmap marks;
+
+    /* This is both a reusable shared allocation which is also used to return
+       single expressions, which means it for most code should only hold a
+       couple of elements.  */
+    auto_vec<basic_block, 32> blocks;
+
+    /* Map from basic_block->index to an ordering so that for a single
+       expression (a || b && c) => index_map[a] < index_map[b] < index_map[c].
+       The values do not have to be consecutive and can be interleaved by
+       values from other expressions, so comparisons only make sense for blocks
+       that belong to the same expression.  */
+    auto_vec<int, 64> index_map;
+
+    /* Pre-allocate bitmaps and vectors for per-function book keeping.  This is
+       pure instance reuse and the bitmaps carry no data between function
+       calls.  */
+    auto_vec<basic_block, 64> B1;
+    auto_vec<basic_block, 64> B2;
+    auto_sbitmap G1;
+    auto_sbitmap G2;
+    auto_sbitmap G3;
+
+    explicit conds_ctx (unsigned size) noexcept (true) : marks (size),
+    G1 (size), G2 (size), G3 (size)
+    {
+    bitmap_clear (marks);
+    }
+
+    /* Mark a node as processed so nodes are not processed twice for example in
+       loops, gotos.  */
+    void mark (const basic_block b) noexcept (true)
+    {
+    gcc_assert (!bitmap_bit_p (marks, b->index));
+    bitmap_set_bit (marks, b->index);
+    }
+
+    /* Mark nodes as processed so they are not processed twice.  */
+    void mark (const vec<basic_block>& bs) noexcept (true)
+    {
+    for (const basic_block b : bs)
+        mark (b);
+    }
+
+    /* Check if all nodes are marked.  A successful run should visit & mark
+       every reachable node exactly once.  */
+    bool all_marked (const vec<basic_block>& reachable) const noexcept (true)
+    {
+    for (const basic_block b : reachable)
+        if (!bitmap_bit_p (marks, b->index))
+        return false;
+    return true;
+    }
+};
+
+/* Only instrument terms with fewer than number of bits in a (wide) gcov
+   integer, which is probably 64.  The algorithm itself does not impose this
+   limitation, but it makes for a simpler implementation.
+
+   * Allocating the output data structure (coverage_counter_alloc ()) can
+     assume pairs of gcov_type_unsigned and not use a separate length field.
+   * A pair gcov_type_unsigned can be used as accumulators.
+   * Updating accumulators can use the bitwise operations |=, &= and not
+     custom operators that work for arbitrary-sized bit-sets.
+
+   Most real-world code should be unaffected by this, but it is possible
+   (especially for generated code) to exceed this limit.  */
+#define CONDITIONS_MAX_TERMS (sizeof (gcov_type_unsigned) * BITS_PER_UNIT)

You also need to consider the target number of bits in the gcov type:
targets can change the 64-bit default to something else.
There is gcov_type_node from which you can get the number of bits.
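For instance (a sketch only, assuming gcov_type_node has already been set up by the coverage machinery at the point this is queried), the limit could be derived from the tree type rather than the host typedef:

```
  /* Host-independent: read the width off the target's gcov type
     instead of sizeof (gcov_type_unsigned) * BITS_PER_UNIT.  */
  unsigned max_terms = TYPE_PRECISION (gcov_type_node);
```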
+
+/* Special cases of the single_*_p and single_*_edge functions in basic-block.h
+   that don't consider exception handling or other complex edges. This helps
+   create a view of the CFG with only normal edges - if a basic block has both
+   an outgoing fallthrough and exceptional edge [1], it should be considered a
+   single-successor.
+
+   [1] if this is not possible, these functions can be removed and replaced by
+       their basic-block.h cousins.  */

This should be achievable with -fnon-call-exceptions.  For example
division can throw an exception then.
+bool
+single (const vec<edge, va_gc> *edges)
Maybe single_p to keep naming similar to basic-block?
+{
+    int n = EDGE_COUNT (edges);
+    if (n == 0)
+    return false;
+
+    for (edge e : edges)
+    if (e->flags & EDGE_COMPLEX)
+        n -= 1;
+
+    return n == 1;
+}
+
+/* Get the single, non-complex edge.  Behavior is undefined if edges have
+   more than 1 non-complex edge.  */
+edge
+single_edge (const vec<edge, va_gc> *edges)
+{
Perhaps to keep behaviour consistent with basic-block.h you want to add
    gcc_checking_assert (single_p (edges));
+/* Find the set {ancestors (p) intersect G} where ancestors is the recursive
+   set of predecessors for p.  Limiting to the ancestors that are also in G
+   (see cond_reachable_from) and by q is an optimization as ancestors outside G
+   have no effect when isolating expressions.
+
+   dfs_enumerate_from () does not work as the filter function needs edge
+   information and dfs_enumerate_from () only considers blocks.  */
+void
+ancestors_of (basic_block p, basic_block q, const sbitmap G, sbitmap ancestors)

coding style does not really like using uppercase letter for variables.
I see it was a set in the paper, but one can give it bit more
descriptive name in the source code.
+{
+    if (!bitmap_bit_p (G, p->index))
+    return;
+
+    bitmap_set_bit (ancestors, p->index);
+    bitmap_set_bit (ancestors, q->index);
+    if (p == q)
+    return;
+
+    auto_vec<basic_block, 16> stack;
+    stack.safe_push (p);
+
+    while (!stack.is_empty ())
+    {
+    basic_block b = stack.pop ();
+    if (single (b->preds))
+    {
+        edge e = single_edge (b->preds);
+        e = contract_edge_up (e);
+        b = e->dest;
+    }
So this walks the chain of single-pred edges until something non-single is
found and only those are added to the ancestors bitmap?
Why are the single_edge basic blocks not inserted into the ancestors bitmap?
+
+    for (edge e : b->preds)
+    {
+        basic_block src = e->src;
+        if (bitmap_bit_p (ancestors, e->src->index))
+        continue;
+        if (!bitmap_bit_p (G, e->src->index))
+        continue;
+        bitmap_set_bit (ancestors, src->index);
bitmap_set_bit returns a boolean value saying whether the bit was newly set
or was previously 1.  So you can avoid the bitmap_bit_p above.
diff --git a/libgcc/libgcov-merge.c b/libgcc/libgcov-merge.c
index 5d6e17d1483..eed3556373b 100644
--- a/libgcc/libgcov-merge.c
+++ b/libgcc/libgcov-merge.c
@@ -33,6 +33,11 @@ void __gcov_merge_add (gcov_type *counters __attribute__ ((unused)),
                       unsigned n_counters __attribute__ ((unused))) {}
  #endif
+#ifdef L_gcov_merge_ior
+void __gcov_merge_ior (gcov_type *counters  __attribute__ ((unused)),
+               unsigned n_counters __attribute__ ((unused))) {}
+#endif

Why is this necessary?

The profiling part looks good to me.  I am worried about the pattern
matching of gimple to recognize conditionals, especially after the early
optimizations have run.  Does it work reasonably with -O2?
Perhaps it would make sense to schedule this kind of profiling early,
shortly after build_ssa, which would make it incompatible with profile
feedback, but I think that would be OK.

Richi, can you please look at the gimple matching part?
Honza

