> Like Wahlen et al this implementation records coverage in fixed-size
> bitsets which gcov knows how to interpret.  This is very fast, but
> introduces a limit on the number of terms in a single boolean
> expression: the number of bits in a gcov_unsigned_type (which is
> typedef'd to uint64_t).  For most practical purposes this should be
> acceptable.  This limitation is in the implementation and not the
> algorithm, so support for more conditions can be added by also
> introducing arbitrary-sized bitsets.
This should not be too hard to do - if the conditional is more complex you
simply introduce more than one counter for it, right?  How often does this
trigger on the GCC sources?

> For space overhead, the instrumentation needs two accumulators
> (gcov_unsigned_type) per condition in the program which will be written
> to the gcov file.  In addition, every function gets a pair of local
> accumulators, but these accumulators are reused between conditions in the
> same function.
>
> For time overhead, there is a zeroing of the local accumulators for
> every condition and one or two bitwise operations on every edge taken in
> the expression.
>
> In action it looks pretty similar to the branch coverage.  The -g short
> opt carries no significance, but was chosen because it was an available
> option with the upper-case variant free too.
>
> gcov --conditions:
>
>         3:   17:void fn (int a, int b, int c, int d) {
>         3:   18:    if ((a && (b || c)) && d)
> conditions covered 3/8
> condition  0 not covered (true)
> condition  0 not covered (false)
> condition  1 not covered (true)
> condition  2 not covered (true)
> condition  3 not covered (true)

It seems understandable, but for bigger conditionals I guess it will be a
bit hard to map the condition numbers back to the actual source code.  We
could probably also show the conditions as ranges in the conditional?  I am
adding David Malcolm to CC, he may have some ideas.

I wonder how much this information is confused by early optimizations
happening before coverage profiling?

> Some expressions, mostly those without else-blocks, are effectively
> "rewritten" in the CFG construction making the algorithm unable to
> distinguish them:
>
> and.c:
>
>     if (a && b && c)
>         x = 1;
>
> ifs.c:
>
>     if (a)
>         if (b)
>             if (c)
>                 x = 1;
>
> gcc will build the same graph for both these programs, and gcov will
> report both as 3-term expressions.
> It is vital that it is not
> interpreted the other way around (which is consistent with the shape of
> the graph) because otherwise the masking would be wrong for the and.c
> program, which is a more severe error.  While surprising, users would
> probably expect some minor rewriting of semantically-identical
> expressions.
>
> and.c.gcov:
>     #####:    2:    if (a && b && c)
> conditions covered 6/6
>     #####:    3:        x = 1;
>
> ifs.c.gcov:
>     #####:    2:    if (a)
>     #####:    3:        if (b)
>     #####:    4:            if (c)
>     #####:    5:                x = 1;
> conditions covered 6/6

Maybe one can use location information to distinguish those cases?  Don't
we store discriminator info about individual statements that is also used
for auto-FDO?

> gcc/ChangeLog:
>
> 	* builtins.cc (expand_builtin_fork_or_exec): Check
> 	profile_condition_flag.
> 	* collect2.cc (main): Add -fno-profile-conditions to OBSTACK.
> 	* common.opt: Add new options -fprofile-conditions and
> 	* doc/gcov.texi: Add --conditions documentation.
> 	* doc/invoke.texi: Add -fprofile-conditions documentation.
> 	* gcc.cc: Link gcov on -fprofile-conditions.
> 	* gcov-counter.def (GCOV_COUNTER_CONDS): New.
> 	* gcov-dump.cc (tag_conditions): New.
> 	* gcov-io.h (GCOV_TAG_CONDS): New.
> 	(GCOV_TAG_CONDS_LENGTH): Likewise.
> 	(GCOV_TAG_CONDS_NUM): Likewise.
> 	* gcov.cc (class condition_info): New.
> 	(condition_info::condition_info): New.
> 	(condition_info::popcount): New.
> 	(struct coverage_info): New.
> 	(add_condition_counts): New.
> 	(output_conditions): New.
> 	(print_usage): Add -g, --conditions.
> 	(process_args): Likewise.
> 	(output_intermediate_json_line): Output conditions.
> 	(read_graph_file): Read conditions counters.
> 	(read_count_file): Read conditions counters.
> 	(file_summary): Print conditions.
> 	(accumulate_line_info): Accumulate conditions.
> 	(output_line_details): Print conditions.
> 	* ipa-inline.cc (can_early_inline_edge_p): Check
> 	profile_condition_flag.
> 	* ipa-split.cc (pass_split_functions::gate): Likewise.
> 	* passes.cc (finish_optimization_passes): Likewise.
> 	* profile.cc (find_conditions): New declaration.
> 	(cov_length): Likewise.
> 	(cov_blocks): Likewise.
> 	(cov_masks): Likewise.
> 	(cov_free): Likewise.
> 	(instrument_decisions): New.
> 	(read_thunk_profile): Control output to file.
> 	(branch_prob): Call find_conditions, instrument_decisions.
> 	(init_branch_prob): Add total_num_conds.
> 	(end_branch_prob): Likewise.
> 	* tree-profile.cc (struct conds_ctx): New.
> 	(CONDITIONS_MAX_TERMS): New.
> 	(EDGE_CONDITION): New.
> 	(cmp_index_map): New.
> 	(index_of): New.
> 	(block_conditional_p): New.
> 	(edge_conditional_p): New.
> 	(single): New.
> 	(single_edge): New.
> 	(contract_edge): New.
> 	(contract_edge_up): New.
> 	(ancestors_of): New.
> 	(struct outcomes): New.
> 	(conditional_succs): New.
> 	(condition_index): New.
> 	(masking_vectors): New.
> 	(cond_reachable_from): New.
> 	(neighborhood): New.
> 	(isolate_expression): New.
> 	(emit_bitwise_op): New.
> 	(make_index_map_visit): New.
> 	(make_index_map): New.
> 	(collect_conditions): New.
> 	(yes): New.
> 	(struct condcov): New.
> 	(cov_length): New.
> 	(cov_blocks): New.
> 	(cov_masks): New.
> 	(cov_free): New.
> 	(find_conditions): New.
> 	(instrument_decisions): New.
> 	(tree_profiling): Check profile_condition_flag.
> 	(pass_ipa_tree_profile::gate): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 	* lib/gcov.exp: Add condition coverage test function.
> 	* g++.dg/gcov/gcov-18.C: New test.
> 	* gcc.misc-tests/gcov-19.c: New test.
> 	* gcc.misc-tests/gcov-20.c: New test.
> 	* gcc.misc-tests/gcov-21.c: New test.
> ---
>  gcc/builtins.cc                        |    2 +-
>  gcc/collect2.cc                        |    7 +-
>  gcc/common.opt                         |    8 +
>  gcc/doc/gcov.texi                      |   37 +
>  gcc/doc/invoke.texi                    |   19 +
>  gcc/gcc.cc                             |    4 +-
>  gcc/gcov-counter.def                   |    3 +
>  gcc/gcov-dump.cc                       |   24 +
>  gcc/gcov-io.h                          |    3 +
>  gcc/gcov.cc                            |  197 +++-
>  gcc/ipa-inline.cc                      |    2 +-
>  gcc/ipa-split.cc                       |    3 +-
>  gcc/passes.cc                          |    3 +-
>  gcc/profile.cc                         |   84 +-
>  gcc/testsuite/g++.dg/gcov/gcov-18.C    |  234 +++++
>  gcc/testsuite/gcc.misc-tests/gcov-19.c | 1249 ++++++++++++++++++++++++
>  gcc/testsuite/gcc.misc-tests/gcov-20.c |   22 +
>  gcc/testsuite/gcc.misc-tests/gcov-21.c |   16 +
>  gcc/testsuite/lib/gcov.exp             |  191 +++-
>  gcc/tree-profile.cc                    | 1048 +++++++++++++++++++-
>  libgcc/libgcov-merge.c                 |    5 +
>  21 files changed, 3135 insertions(+), 26 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-18.C
>  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-19.c
>  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-20.c
>  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-21.c
>
> @@ -392,6 +394,28 @@ tag_arcs (const char *filename ATTRIBUTE_UNUSED,
>      }
>  }
>
> +static void
> +tag_conditions (const char *filename ATTRIBUTE_UNUSED,
> +		unsigned tag ATTRIBUTE_UNUSED, int length ATTRIBUTE_UNUSED,
> +		unsigned depth)

There should be block comments before functions

> @@ -134,6 +135,28 @@ public:
>    vector<unsigned> lines;
>  };
>
> +class condition_info

... and data structures.

> +/* Output conditions (modified condition/decision coverage) */
> +
> +static int flag_conditions = 0;

This is really a bool.  Gcov has not been updated for a while; I think you
can simply convert the other flag_* to bools as well.

> diff --git a/gcc/ipa-split.cc b/gcc/ipa-split.cc
> index 6730f4f9d0e..276ead617c9 100644
> --- a/gcc/ipa-split.cc
> +++ b/gcc/ipa-split.cc
> @@ -1930,7 +1930,8 @@ pass_split_functions::gate (function *)
>    /* When doing profile feedback, we want to execute the pass after profiling
>       is read.  So disable one in early optimization.
>       */
>    return (flag_partial_inlining
> -	  && !profile_arc_flag && !flag_branch_probabilities);
> +	  && !profile_arc_flag && !flag_branch_probabilities
> +	  && !profile_condition_flag);

This should be unnecessary - profile_condition is not used for profile
feedback.

>  }
>
>  } // anon namespace
> diff --git a/gcc/passes.cc b/gcc/passes.cc
> index 6f894a41d22..02194fe286f 100644
> --- a/gcc/passes.cc
> +++ b/gcc/passes.cc
> @@ -352,7 +352,8 @@ finish_optimization_passes (void)
>    gcc::dump_manager *dumps = m_ctxt->get_dumps ();
>
>    timevar_push (TV_DUMP);
> -  if (profile_arc_flag || flag_test_coverage || flag_branch_probabilities)
> +  if (profile_arc_flag || profile_condition_flag || flag_test_coverage
> +      || flag_branch_probabilities)
>      {
>        dumps->dump_start (pass_profile_1->static_pass_number, NULL);
>        end_branch_prob ();

I think this whole code is unnecessary since branch probability is now an
IPA pass and runs on the whole program at once, so the stats done by
end_branch_prob can be output from the pass itself.  (Originally the pass
was done as part of the optimization queue and we needed a place to output
stats at the end of compilation.)

> @@ -1397,10 +1415,18 @@ branch_prob (bool thunk)
>
>    /* Write the data from which gcov can reconstruct the basic block
>       graph and function line numbers (the gcno file).  */
> +  output_to_file = false;
>    if (coverage_begin_function (lineno_checksum, cfg_checksum))
>      {
>        gcov_position_t offset;
>
> +      /* The condition coverage needs a deeper analysis to identify expressions
> +       * of conditions, which means it is not yet ready to write to the gcno
> +       * file.  It will write its entries later, but needs to know if it do it
> +       * in the first place, which is controlled by the return value of
> +       * coverage_begin_function.  */

No * on every line.
> +      output_to_file = true;
> +
>        /* Basic block flags */
>        offset = gcov_write_tag (GCOV_TAG_BLOCKS);
>        gcov_write_unsigned (n_basic_blocks_for_fn (cfun));

> +  if (coverage_counter_alloc (GCOV_COUNTER_CONDS, 2 * nconds))
> +    {
> +      /* Add two extra variables to the function for the local
> +	 accumulators, which are zero'd on the entry of a new conditional.
> +	 The local accumulators are shared between decisions in order to
> +	 use less stack space.  */

Shouldn't we be able to allocate the stack space?

> +      tree accu[2] = {
> +	build_decl (UNKNOWN_LOCATION, VAR_DECL,
> +		    get_identifier ("__accu_t"), get_gcov_type ()),
> +	build_decl (UNKNOWN_LOCATION, VAR_DECL,
> +		    get_identifier ("__accu_f"), get_gcov_type ()),

Can't those be just SSA names?

> +      };
> +
> +      gcov_position_t offset {};
> +      if (output_to_file)
> +	offset = gcov_write_tag (GCOV_TAG_CONDS);
> +
> +      for (unsigned i = 0; i < nconds; ++i)
> +	{
> +	  array_slice<basic_block> expr = cov_blocks (cov, i);
> +	  array_slice<gcov_type_unsigned> masks = cov_masks (cov, i);
> +	  gcc_assert (expr.is_valid ());
> +	  gcc_assert (masks.is_valid ());
> +
> +	  int terms = instrument_decisions (expr, i, accu, masks.begin ());
> +	  if (output_to_file)
> +	    {
> +	      gcov_write_unsigned (expr.front ()->index);
> +	      gcov_write_unsigned (terms);
> +	    }
> +	}
> +      if (output_to_file)
> +	gcov_write_length (offset);
> +    }
> +  cov_free (cov);
> +  }
> +
>    /* For each edge not on the spanning tree, add counting code. */
>    if (profile_arc_flag
>        && coverage_counter_alloc (GCOV_COUNTER_ARCS, num_instrumented))
>      {
>        unsigned n_instrumented;
>
> -      gimple_init_gcov_profiler ();
> -
>        n_instrumented = instrument_edges (el);
>
>        gcc_assert (n_instrumented == num_instrumented);
>
>        if (flag_profile_values)
>	  instrument_values (values);
> -
> -      /* Commit changes done by instrumentation. */
> -      gsi_commit_edge_inserts ();
>      }
>
>    free_aux_for_edges ();
>
>    values.release ();
>    free_edge_list (el);
> +  /* Commit changes done by instrumentation.
> +     */
> +  gsi_commit_edge_inserts ();
> +
>    coverage_end_function (lineno_checksum, cfg_checksum);
>    if (flag_branch_probabilities
>        && (profile_status_for_fn (cfun) == PROFILE_READ))
> @@ -1669,6 +1740,7 @@ init_branch_prob (void)
>    total_num_passes = 0;
>    total_num_times_called = 0;
>    total_num_branches = 0;
> +  total_num_conds = 0;
>    for (i = 0; i < 20; i++)
>      total_hist_br_prob[i] = 0;
>  }
> @@ -1708,5 +1780,7 @@ end_branch_prob (void)
>		   (total_hist_br_prob[i] + total_hist_br_prob[19-i]) * 100
>		   / total_num_branches, 5*i, 5*i+5);
>	 }
> +      fprintf (dump_file, "Total number of conditions: %d\n",
> +	       total_num_conds);
>      }
>  }
> diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> index da300d5f9e8..c8b917afb9a 100644
> --- a/gcc/tree-profile.cc
> +++ b/gcc/tree-profile.cc
> @@ -58,6 +58,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "alloc-pool.h"
>  #include "symbol-summary.h"
>  #include "symtab-thunks.h"
> +#include "cfganal.h"
> +#include "cfgloop.h"
>
>  static GTY(()) tree gcov_type_node;
>  static GTY(()) tree tree_interval_profiler_fn;
> @@ -73,6 +75,1046 @@ static GTY(()) tree ic_tuple_var;
>  static GTY(()) tree ic_tuple_counters_field;
>  static GTY(()) tree ic_tuple_callee_field;
>
> +namespace
> +{

Maybe add some overall comment on what this code is doing (which you
describe in the email).  Also it may make sense to place it in a new
source file: it is long enough.

> +/* Some context and reused instances between function calls.  Large embedded
> +   buffers are used to up-front request enough memory for most programs and
> +   merge them into a single allocation at the cost of using more memory in the
> +   average case.  Some numbers from linux v5.13 which is assumed to be a
> +   reasonably diverse code base: 75% of the functions in linux have less than
> +   16 nodes in the CFG and approx 2.5% have more than 64 nodes.  The functions
> +   that go beyond a few dozen nodes tend to be very large (>100) and so 64
> +   seems like a good balance.
> +
> +   This is really just a performance balance of the cost of allocation and
> +   wasted memory.  */
> +struct conds_ctx
> +{
> +  /* Bitmap of the processed blocks.  Bit n set means basic_block->index has
> +     been processed either explicitly or as a part of an expression.  */
> +  auto_sbitmap marks;
> +
> +  /* This is both a reusable shared allocation which is also used to return
> +     single expressions, which means it for most code should only hold a
> +     couple of elements.  */
> +  auto_vec<basic_block, 32> blocks;
> +
> +  /* Map from basic_block->index to an ordering so that for a single
> +     expression (a || b && c) => index_map[a] < index_map[b] < index_map[c].
> +     The values do not have to be consecutive and can be interleaved by
> +     values from other expressions, so comparisons only make sense for blocks
> +     that belong to the same expression.  */
> +  auto_vec<int, 64> index_map;
> +
> +  /* Pre-allocate bitmaps and vectors for per-function book keeping.  This is
> +     pure instance reuse and the bitmaps carry no data between function
> +     calls.  */
> +  auto_vec<basic_block, 64> B1;
> +  auto_vec<basic_block, 64> B2;
> +  auto_sbitmap G1;
> +  auto_sbitmap G2;
> +  auto_sbitmap G3;
> +
> +  explicit conds_ctx (unsigned size) noexcept (true) : marks (size),
> +    G1 (size), G2 (size), G3 (size)
> +  {
> +    bitmap_clear (marks);
> +  }
> +
> +  /* Mark a node as processed so nodes are not processed twice for example in
> +     loops, gotos.  */
> +  void mark (const basic_block b) noexcept (true)
> +  {
> +    gcc_assert (!bitmap_bit_p (marks, b->index));
> +    bitmap_set_bit (marks, b->index);
> +  }
> +
> +  /* Mark nodes as processed so they are not processed twice.  */
> +  void mark (const vec<basic_block>& bs) noexcept (true)
> +  {
> +    for (const basic_block b : bs)
> +      mark (b);
> +  }
> +
> +  /* Check if all nodes are marked.  A successful run should visit & mark
> +     every reachable node exactly once.
>      */
> +  bool all_marked (const vec<basic_block>& reachable) const noexcept (true)
> +  {
> +    for (const basic_block b : reachable)
> +      if (!bitmap_bit_p (marks, b->index))
> +	return false;
> +    return true;
> +  }
> +};
> +
> +/* Only instrument terms with fewer than number of bits in a (wide) gcov
> +   integer, which is probably 64.  The algorithm itself does not impose this
> +   limitation, but it makes for a simpler implementation.
> +
> +   * Allocating the output data structure (coverage_counter_alloc ()) can
> +     assume pairs of gcov_type_unsigned and not use a separate length field.
> +   * A pair gcov_type_unsigned can be used as accumulators.
> +   * Updating accumulators can use the bitwise operations |=, &= and not
> +     custom operators that work for arbitrary-sized bit-sets.
> +
> +   Most real-world code should be unaffected by this, but it is possible
> +   (especially for generated code) to exceed this limit.  */
> +#define CONDITIONS_MAX_TERMS (sizeof (gcov_type_unsigned) * BITS_PER_UNIT)

You also need to consider the target number of bits in the gcov type:
targets can change the 64-bit default to something else.  There is
gcov_type_node from which you can get the number of bits.

> +
> +/* Special cases of the single_*_p and single_*_edge functions in basic-block.h
> +   that don't consider exception handling or other complex edges.  This helps
> +   create a view of the CFG with only normal edges - if a basic block has both
> +   an outgoing fallthrough and exceptional edge [1], it should be considered a
> +   single-successor.
> +
> +   [1] if this is not possible, these functions can be removed and replaced by
> +   their basic-block.h cousins.  */

This should be achievable with -fnon-call-exceptions.  For example,
division can throw an exception then.

> +bool
> +single (const vec<edge, va_gc> *edges)

Maybe single_p to keep the naming similar to basic-block.h?
> +{
> +  int n = EDGE_COUNT (edges);
> +  if (n == 0)
> +    return false;
> +
> +  for (edge e : edges)
> +    if (e->flags & EDGE_COMPLEX)
> +      n -= 1;
> +
> +  return n == 1;
> +}
> +
> +/* Get the single, non-complex edge.  Behavior is undefined if edges have more
> +   than 1 non-complex edges.  */
> +edge
> +single_edge (const vec<edge, va_gc> *edges)
> +{

Perhaps to keep the behaviour consistent with basic-block.h you want to add
gcc_checking_assert (single_p (edges));

> +/* Find the set {ancestors (p) intersect G} where ancestors is the recursive
> +   set of predecessors for p.  Limiting to the ancestors that are also in G
> +   (see cond_reachable_from) and by q is an optimization as ancestors outside G
> +   have no effect when isolating expressions.
> +
> +   dfs_enumerate_from () does not work as the filter function needs edge
> +   information and dfs_enumerate_from () only considers blocks.  */
> +void
> +ancestors_of (basic_block p, basic_block q, const sbitmap G, sbitmap ancestors)

The coding style does not really like using uppercase letters for
variables.  I see it was a set in the paper, but one can give it a bit
more descriptive name in the source code.

> +{
> +  if (!bitmap_bit_p (G, p->index))
> +    return;
> +
> +  bitmap_set_bit (ancestors, p->index);
> +  bitmap_set_bit (ancestors, q->index);
> +  if (p == q)
> +    return;
> +
> +  auto_vec<basic_block, 16> stack;
> +  stack.safe_push (p);
> +
> +  while (!stack.is_empty ())
> +    {
> +      basic_block b = stack.pop ();
> +      if (single (b->preds))
> +	{
> +	  edge e = single_edge (b->preds);
> +	  e = contract_edge_up (e);
> +	  b = e->dest;
> +	}

So this walks a chain of single-pred edges until something non-single is
found, and only those are added to the ancestor bitmap?  Why are the
single_edge basic blocks not inserted into the ancestor bitmap?
> +
> +      for (edge e : b->preds)
> +	{
> +	  basic_block src = e->src;
> +	  if (bitmap_bit_p (ancestors, e->src->index))
> +	    continue;
> +	  if (!bitmap_bit_p (G, e->src->index))
> +	    continue;
> +	  bitmap_set_bit (ancestors, src->index);

bitmap_set_bit returns a boolean value saying whether the bit was newly
set or was previously 1, so you can avoid the bitmap_bit_p above.

> diff --git a/libgcc/libgcov-merge.c b/libgcc/libgcov-merge.c
> index 5d6e17d1483..eed3556373b 100644
> --- a/libgcc/libgcov-merge.c
> +++ b/libgcc/libgcov-merge.c
> @@ -33,6 +33,11 @@ void __gcov_merge_add (gcov_type *counters __attribute__ ((unused)),
>			 unsigned n_counters __attribute__ ((unused))) {}
>  #endif
>
> +#ifdef L_gcov_merge_ior
> +void __gcov_merge_ior (gcov_type *counters __attribute__ ((unused)),
> +		       unsigned n_counters __attribute__ ((unused))) {}
> +#endif

Why is this necessary?

The profiling part looks good to me.  I am worried about the pattern
matching of gimple to recognize conditionals, especially after the early
optimizations have run.  Does it work reasonably with -O2?  Perhaps it
would make sense to schedule this kind of profiling early, shortly after
build_ssa, which would make it incompatible with profile feedback, but I
think that would be OK.

Richi, can you please look at the gimple matching part?

Honza