On Sat, 21 Oct 2023, Jørgen Kvalsvik wrote:

> On 05/10/2023 22:39, Jørgen Kvalsvik wrote:
> > On 05/10/2023 21:59, Jan Hubicka wrote:
> >>>
> >>> Like Wahlen et al this implementation records coverage in fixed-size
> >>> bitsets which gcov knows how to interpret. This is very fast, but
> >>> introduces a limit on the number of terms in a single boolean
> >>> expression, the number of bits in a gcov_unsigned_type (which is
> >>> typedef'd to uint64_t), so for most practical purposes this would be
> >>> acceptable. This limitation is in the implementation and not the
> >>> algorithm, so support for more conditions can be added by also
> >>> introducing arbitrary-sized bitsets.
> >>
> >> This should not be too hard to do - if the conditional is more complex
> >> you simply introduce more than one counter for it, right?
> >> How many times does this trigger on the GCC sources?
> >
> > It shouldn't be, no. But when dynamic bitsets are on the table it would be
> > much better to length-encode in smaller multiples than the 64-bit counters.
> > Most expressions are small (<4 terms), so the savings would be substantial.
> > I opted for the simpler fixed-size to start with because it is much simpler
> > and would not introduce any branching or decisions in the instrumentation.
>
> I just posted v6 of this patch, and the bitsets are still fixed size. I
> consider dynamic bitsets out of scope for this particular effort, although I
> think it might be worth pursuing later.
>
> >
> >>>
> >>> For space overhead, the instrumentation needs two accumulators
> >>> (gcov_unsigned_type) per condition in the program which will be written
> >>> to the gcov file. In addition, every function gets a pair of local
> >>> accumulators, but these accumulators are reused between conditions in
> >>> the same function.
> >>>
> >>> For time overhead, there is a zeroing of the local accumulators for
> >>> every condition and one or two bitwise operations on every edge taken
> >>> in an expression.
> >>>
> >>> In action it looks pretty similar to the branch coverage. The -g short
> >>> opt carries no significance, but was chosen because it was an available
> >>> option whose upper-case counterpart was free too.
> >>>
> >>> gcov --conditions:
> >>>
> >>>         3:   17:void fn (int a, int b, int c, int d) {
> >>>         3:   18:    if ((a && (b || c)) && d)
> >>> conditions covered 3/8
> >>> condition  0 not covered (true)
> >>> condition  0 not covered (false)
> >>> condition  1 not covered (true)
> >>> condition  2 not covered (true)
> >>> condition  3 not covered (true)
> >>
> >> It seems understandable, but for bigger conditionals I guess it will be
> >> a bit hard to make the connection between condition numbers and the
> >> actual source code. We could probably also show the conditions as
> >> ranges in the conditional? I am adding David Malcolm to CC, he may have
> >> some ideas.
> >>
> >> I wonder how much this information is confused by early optimizations
> >> happening before coverage profiling?
>
> Yes, but I could not figure out strong enough gcov mechanisms to make a
> truly better one, and the json + extra tooling is more in line with what
> you want for large conditionals, I think. I also specifically did one unit
> of information per line to play nicer with grep, so it currently looks like:
>
> conditions covered 3/8
> condition  0 not covered (true false)
> ...
>
> I think improving gcov's printing abilities would do wonders, but I couldn't
> find anything that would support it currently. Did I miss something?
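For illustration, here is a minimal C sketch of the accumulator scheme
described above. The names (struct accu, accu_reset, accu_record, accu_mask)
are invented for this example and are not the identifiers used by the patch:

    #include <stdint.h>

    /* Per-decision accumulator pair: bit k of truev (falsev) records that
       condition k has been observed evaluating to true (false).  */
    struct accu { uint64_t truev, falsev; };

    /* The local pair is zeroed on every entry to the decision.  */
    static void accu_reset (struct accu *a) { a->truev = a->falsev = 0; }

    /* On the edge leaving condition k, OR in the outcome that was taken.  */
    static void accu_record (struct accu *a, unsigned k, int outcome)
    {
      if (outcome)
        a->truev |= UINT64_C (1) << k;
      else
        a->falsev |= UINT64_C (1) << k;
    }

    /* Short-circuit masking: clear the bits of conditions whose outcomes
       were made irrelevant, so they do not count as covered.  */
    static void accu_mask (struct accu *a, uint64_t mask)
    {
      a->truev &= ~mask;
      a->falsev &= ~mask;
    }

At the end of the decision the local pair is OR-ed into the global
per-condition counters, which is presumably why the merge function for this
counter kind is __gcov_merge_ior rather than __gcov_merge_add (see the
libgcov-merge.c hunk further down).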
> >
> >>>
> >>> Some expressions, mostly those without else-blocks, are effectively
> >>> "rewritten" in the CFG construction, making the algorithm unable to
> >>> distinguish them:
> >>>
> >>> and.c:
> >>>
> >>>     if (a && b && c)
> >>>         x = 1;
> >>>
> >>> ifs.c:
> >>>
> >>>     if (a)
> >>>         if (b)
> >>>             if (c)
> >>>                 x = 1;
> >>>
> >>> gcc will build the same graph for both these programs, and gcov will
> >>> report both as 3-term expressions. It is vital that it is not
> >>> interpreted the other way around (which is consistent with the shape of
> >>> the graph) because otherwise the masking would be wrong for the and.c
> >>> program, which is a more severe error. While surprising, users would
> >>> probably expect some minor rewriting of semantically-identical
> >>> expressions.
> >>>
> >>> and.c.gcov:
> >>>     #####:    2:    if (a && b && c)
> >>> conditions covered 6/6
> >>>     #####:    3:        x = 1;
> >>>
> >>> ifs.c.gcov:
> >>>     #####:    2:    if (a)
> >>>     #####:    3:        if (b)
> >>>     #####:    4:            if (c)
> >>>     #####:    5:                x = 1;
> >>> conditions covered 6/6
> >>
> >> Maybe one can use location information to distinguish those cases?
> >> Don't we store discriminator info about individual statements that is
> >> also used for auto-FDO?
> >
> > That is one possibility, which I tried for a bit, but abandoned to focus
> > on getting the rest of the algorithm right. I am sure it can be revisited
> > (possibly as a future improvement) and weighted against always emitting
> > an else block (see
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631254.html)
>
> This has not been changed in v6. I think the always-else approach is nicer.
>
> >
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>>     * builtins.cc (expand_builtin_fork_or_exec): Check
> >>>     profile_condition_flag.
> >>>     * collect2.cc (main): Add -fno-profile-conditions to OBSTACK.
> >>>     * common.opt: Add new options -fprofile-conditions and
> >>>     * doc/gcov.texi: Add --conditions documentation.
> >>>     * doc/invoke.texi: Add -fprofile-conditions documentation.
> >>>     * gcc.cc: Link gcov on -fprofile-conditions.
> >>>     * gcov-counter.def (GCOV_COUNTER_CONDS): New.
> >>>     * gcov-dump.cc (tag_conditions): New.
> >>>     * gcov-io.h (GCOV_TAG_CONDS): New.
> >>>     (GCOV_TAG_CONDS_LENGTH): Likewise.
> >>>     (GCOV_TAG_CONDS_NUM): Likewise.
> >>>     * gcov.cc (class condition_info): New.
> >>>     (condition_info::condition_info): New.
> >>>     (condition_info::popcount): New.
> >>>     (struct coverage_info): New.
> >>>     (add_condition_counts): New.
> >>>     (output_conditions): New.
> >>>     (print_usage): Add -g, --conditions.
> >>>     (process_args): Likewise.
> >>>     (output_intermediate_json_line): Output conditions.
> >>>     (read_graph_file): Read conditions counters.
> >>>     (read_count_file): Read conditions counters.
> >>>     (file_summary): Print conditions.
> >>>     (accumulate_line_info): Accumulate conditions.
> >>>     (output_line_details): Print conditions.
> >>>     * ipa-inline.cc (can_early_inline_edge_p): Check
> >>>     profile_condition_flag.
> >>>     * ipa-split.cc (pass_split_functions::gate): Likewise.
> >>>     * passes.cc (finish_optimization_passes): Likewise.
> >>>     * profile.cc (find_conditions): New declaration.
> >>>     (cov_length): Likewise.
> >>>     (cov_blocks): Likewise.
> >>>     (cov_masks): Likewise.
> >>>     (cov_free): Likewise.
> >>>     (instrument_decisions): New.
> >>>     (read_thunk_profile): Control output to file.
> >>>     (branch_prob): Call find_conditions, instrument_decisions.
> >>>     (init_branch_prob): Add total_num_conds.
> >>>     (end_branch_prob): Likewise.
> >>>     * tree-profile.cc (struct conds_ctx): New.
> >>>     (CONDITIONS_MAX_TERMS): New.
> >>>     (EDGE_CONDITION): New.
> >>>     (cmp_index_map): New.
> >>>     (index_of): New.
> >>>     (block_conditional_p): New.
> >>>     (edge_conditional_p): New.
> >>>     (single): New.
> >>>     (single_edge): New.
> >>>     (contract_edge): New.
> >>>     (contract_edge_up): New.
> >>>     (ancestors_of): New.
> >>>     (struct outcomes): New.
> >>>     (conditional_succs): New.
> >>>     (condition_index): New.
> >>>     (masking_vectors): New.
> >>>     (cond_reachable_from): New.
> >>>     (neighborhood): New.
> >>>     (isolate_expression): New.
> >>>     (emit_bitwise_op): New.
> >>>     (make_index_map_visit): New.
> >>>     (make_index_map): New.
> >>>     (collect_conditions): New.
> >>>     (yes): New.
> >>>     (struct condcov): New.
> >>>     (cov_length): New.
> >>>     (cov_blocks): New.
> >>>     (cov_masks): New.
> >>>     (cov_free): New.
> >>>     (find_conditions): New.
> >>>     (instrument_decisions): New.
> >>>     (tree_profiling): Check profile_condition_flag.
> >>>     (pass_ipa_tree_profile::gate): Likewise.
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>>     * lib/gcov.exp: Add condition coverage test function.
> >>>     * g++.dg/gcov/gcov-18.C: New test.
> >>>     * gcc.misc-tests/gcov-19.c: New test.
> >>>     * gcc.misc-tests/gcov-20.c: New test.
> >>>     * gcc.misc-tests/gcov-21.c: New test.
> >>> ---
> >>>   gcc/builtins.cc                        |    2 +-
> >>>   gcc/collect2.cc                        |    7 +-
> >>>   gcc/common.opt                         |    8 +
> >>>   gcc/doc/gcov.texi                      |   37 +
> >>>   gcc/doc/invoke.texi                    |   19 +
> >>>   gcc/gcc.cc                             |    4 +-
> >>>   gcc/gcov-counter.def                   |    3 +
> >>>   gcc/gcov-dump.cc                       |   24 +
> >>>   gcc/gcov-io.h                          |    3 +
> >>>   gcc/gcov.cc                            |  197 +++-
> >>>   gcc/ipa-inline.cc                      |    2 +-
> >>>   gcc/ipa-split.cc                       |    3 +-
> >>>   gcc/passes.cc                          |    3 +-
> >>>   gcc/profile.cc                         |   84 +-
> >>>   gcc/testsuite/g++.dg/gcov/gcov-18.C    |  234 +++++
> >>>   gcc/testsuite/gcc.misc-tests/gcov-19.c | 1249 ++++++++++++++++++++++++
> >>>   gcc/testsuite/gcc.misc-tests/gcov-20.c |   22 +
> >>>   gcc/testsuite/gcc.misc-tests/gcov-21.c |   16 +
> >>>   gcc/testsuite/lib/gcov.exp             |  191 +++-
> >>>   gcc/tree-profile.cc                    | 1048 +++++++++++++++++++-
> >>>   libgcc/libgcov-merge.c                 |    5 +
> >>>   21 files changed, 3135 insertions(+), 26 deletions(-)
> >>>   create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-18.C
> >>>   create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-19.c
> >>>   create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-20.c
> >>>   create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-21.c
> >>>
> >>> @@ -392,6 +394,28 @@ tag_arcs (const char *filename ATTRIBUTE_UNUSED,
> >>>       }
> >>>   }
> >>> +static void
> >>> +tag_conditions (const char *filename ATTRIBUTE_UNUSED,
> >>> +               unsigned tag ATTRIBUTE_UNUSED, int length ATTRIBUTE_UNUSED,
> >>> +               unsigned depth)
> >>
> >> There should be block comments before functions
>
> Done.
>
> >>> @@ -134,6 +135,28 @@ public:
> >>>     vector<unsigned> lines;
> >>>   };
> >>> +class condition_info
> >>
> >> ... and data structures.
>
> Done.
>
> >>> +/* Output conditions (modified condition/decision coverage) */
> >>> +
> >>> +static int flag_conditions = 0;
> >>
> >> This is really a bool. Gcov was not updated for a while, I think
> >> you can simply convert other flag_* to bools as well.
>
> Done, but only for my flag. I can change the remaining flags in another
> patch.
>
> >>> diff --git a/gcc/ipa-split.cc b/gcc/ipa-split.cc
> >>> index 6730f4f9d0e..276ead617c9 100644
> >>> --- a/gcc/ipa-split.cc
> >>> +++ b/gcc/ipa-split.cc
> >>> @@ -1930,7 +1930,8 @@ pass_split_functions::gate (function *)
> >>>     /* When doing profile feedback, we want to execute the pass after
> >>>        profiling is read. So disable one in early optimization.
> >>>        */
> >>>     return (flag_partial_inlining
> >>> -          && !profile_arc_flag && !flag_branch_probabilities);
> >>> +          && !profile_arc_flag && !flag_branch_probabilities
> >>> +          && !profile_condition_flag);
> >>
> >> This should be unnecessary - profile_condition is not used for profile
> >> feedback.
>
> Removed in v6.
>
> >>>   }
> >>>   } // anon namespace
> >>> diff --git a/gcc/passes.cc b/gcc/passes.cc
> >>> index 6f894a41d22..02194fe286f 100644
> >>> --- a/gcc/passes.cc
> >>> +++ b/gcc/passes.cc
> >>> @@ -352,7 +352,8 @@ finish_optimization_passes (void)
> >>>     gcc::dump_manager *dumps = m_ctxt->get_dumps ();
> >>>     timevar_push (TV_DUMP);
> >>> -  if (profile_arc_flag || flag_test_coverage || flag_branch_probabilities)
> >>> +  if (profile_arc_flag || profile_condition_flag || flag_test_coverage
> >>> +      || flag_branch_probabilities)
> >>>       {
> >>>         dumps->dump_start (pass_profile_1->static_pass_number, NULL);
> >>>         end_branch_prob ();
> >>
> >> I think this whole code is unnecessary since branch probability is now
> >> an IPA pass and runs on the whole program at once, so the stats done by
> >> end_branch_prob can be output from the pass itself.
>
> Done, removed.
>
> >>
> >> (originally the pass was done as part of the optimization queue and we
> >> needed a place to output stats at the end of compilation).
> >>
> >>> @@ -1397,10 +1415,18 @@ branch_prob (bool thunk)
> >>>     /* Write the data from which gcov can reconstruct the basic block
> >>>        graph and function line numbers (the gcno file). */
> >>> +  output_to_file = false;
> >>>     if (coverage_begin_function (lineno_checksum, cfg_checksum))
> >>>       {
> >>>         gcov_position_t offset;
> >>> +      /* The condition coverage needs a deeper analysis to identify
> >>> +       * expressions of conditions, which means it is not yet ready to
> >>> +       * write to the gcno file. It will write its entries later, but
> >>> +       * needs to know if it should do it in the first place, which is
> >>> +       * controlled by the return value of coverage_begin_function. */
> >>
> >> No * on every line.
>
> Done, except I missed a few. I have fixed them locally, my apologies.
>
> >>> +      output_to_file = true;
> >>> +
> >>>         /* Basic block flags */
> >>>         offset = gcov_write_tag (GCOV_TAG_BLOCKS);
> >>>         gcov_write_unsigned (n_basic_blocks_for_fn (cfun));
> >>> +  if (coverage_counter_alloc (GCOV_COUNTER_CONDS, 2 * nconds))
> >>> +    {
> >>> +      /* Add two extra variables to the function for the local
> >>> +         accumulators, which are zero'd on the entry of a new
> >>> +         conditional. The local accumulators are shared between
> >>> +         decisions in order to use less stack space. */
> >>
> >> Shouldn't we be able to allocate the stack space?
> >>
> >>> +      tree accu[2] = {
> >>> +       build_decl (UNKNOWN_LOCATION, VAR_DECL,
> >>> +                   get_identifier ("__accu_t"), get_gcov_type ()),
> >>> +       build_decl (UNKNOWN_LOCATION, VAR_DECL,
> >>> +                   get_identifier ("__accu_f"), get_gcov_type ()),
> >>
> >> Can't those be just SSA names?
>
> They can, and it was a lot harder than I thought to get it working (largely
> due to my unfamiliarity with it). v6 uses SSA form and passes the test
> suite.
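As an aside for readers: a rough sketch of what a single accumulator update
looks like when built in SSA form. This is hand-written for illustration
under assumed variable names (e, accu, mask); v6 itself may structure this
differently:

    /* Emit next = accu | mask on edge E.  ACCU is the current SSA name of
       the local accumulator; each update creates a new SSA name, and where
       paths join, PHI nodes merge the versions.  */
    tree type = get_gcov_type ();
    tree next = make_ssa_name (type);
    gimple *update = gimple_build_assign (next, BIT_IOR_EXPR, accu,
                                          build_int_cst (type, mask));
    gsi_insert_on_edge (e, update);
    /* ... and once every edge is instrumented, commit all insertions: */
    gsi_commit_edge_inserts ();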
>
> >>> +      };
> >>> +
> >>> +      gcov_position_t offset {};
> >>> +      if (output_to_file)
> >>> +       offset = gcov_write_tag (GCOV_TAG_CONDS);
> >>> +
> >>> +      for (unsigned i = 0; i < nconds; ++i)
> >>> +       {
> >>> +         array_slice<basic_block> expr = cov_blocks (cov, i);
> >>> +         array_slice<gcov_type_unsigned> masks = cov_masks (cov, i);
> >>> +         gcc_assert (expr.is_valid ());
> >>> +         gcc_assert (masks.is_valid ());
> >>> +
> >>> +         int terms = instrument_decisions (expr, i, accu, masks.begin ());
> >>> +         if (output_to_file)
> >>> +           {
> >>> +             gcov_write_unsigned (expr.front ()->index);
> >>> +             gcov_write_unsigned (terms);
> >>> +           }
> >>> +       }
> >>> +      if (output_to_file)
> >>> +       gcov_write_length (offset);
> >>> +    }
> >>> +  cov_free (cov);
> >>> +  }
> >>> +
> >>>     /* For each edge not on the spanning tree, add counting code. */
> >>>     if (profile_arc_flag
> >>>         && coverage_counter_alloc (GCOV_COUNTER_ARCS, num_instrumented))
> >>>       {
> >>>         unsigned n_instrumented;
> >>> -      gimple_init_gcov_profiler ();
> >>> -
> >>>         n_instrumented = instrument_edges (el);
> >>>         gcc_assert (n_instrumented == num_instrumented);
> >>>         if (flag_profile_values)
> >>>           instrument_values (values);
> >>> -
> >>> -      /* Commit changes done by instrumentation. */
> >>> -      gsi_commit_edge_inserts ();
> >>>       }
> >>>     free_aux_for_edges ();
> >>>     values.release ();
> >>>     free_edge_list (el);
> >>> +  /* Commit changes done by instrumentation. */
> >>> +  gsi_commit_edge_inserts ();
> >>> +
> >>>     coverage_end_function (lineno_checksum, cfg_checksum);
> >>>     if (flag_branch_probabilities
> >>>         && (profile_status_for_fn (cfun) == PROFILE_READ))
> >>> @@ -1669,6 +1740,7 @@ init_branch_prob (void)
> >>>     total_num_passes = 0;
> >>>     total_num_times_called = 0;
> >>>     total_num_branches = 0;
> >>> +  total_num_conds = 0;
> >>>     for (i = 0; i < 20; i++)
> >>>       total_hist_br_prob[i] = 0;
> >>>   }
> >>> @@ -1708,5 +1780,7 @@ end_branch_prob (void)
> >>>              (total_hist_br_prob[i] + total_hist_br_prob[19-i]) * 100
> >>>              / total_num_branches, 5*i, 5*i+5);
> >>>         }
> >>> +      fprintf (dump_file, "Total number of conditions: %d\n",
> >>> +              total_num_conds);
> >>>     }
> >>>   }
> >>
> >>> diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> >>> index da300d5f9e8..c8b917afb9a 100644
> >>> --- a/gcc/tree-profile.cc
> >>> +++ b/gcc/tree-profile.cc
> >>> @@ -58,6 +58,8 @@ along with GCC; see the file COPYING3. If not see
> >>>   #include "alloc-pool.h"
> >>>   #include "symbol-summary.h"
> >>>   #include "symtab-thunks.h"
> >>> +#include "cfganal.h"
> >>> +#include "cfgloop.h"
> >>>   static GTY(()) tree gcov_type_node;
> >>>   static GTY(()) tree tree_interval_profiler_fn;
> >>> @@ -73,6 +75,1046 @@ static GTY(()) tree ic_tuple_var;
> >>>   static GTY(()) tree ic_tuple_counters_field;
> >>>   static GTY(()) tree ic_tuple_callee_field;
> >>> +namespace
> >>> +{
> >>
> >> Maybe add some overall comment on what this code is doing (which you
> >> describe in the email). Also it may make sense to place it in a new
> >> source file: it is long enough.
>
> I added a comment, but did not move it out of tree-profile.cc. As you say,
> it is large enough, and I will do it if you want me to, but I still find it
> manageable.
>
> >>> +/* Some context and reused instances between function calls. Large
> >>> +   embedded buffers are used to up-front request enough memory for most
> >>> +   programs and merge them into a single allocation, at the cost of
> >>> +   using more memory in the average case. Some numbers from linux
> >>> +   v5.13, which is assumed to be a reasonably diverse code base: 75% of
> >>> +   the functions in linux have less than 16 nodes in the CFG and approx
> >>> +   2.5% have more than 64 nodes. The functions that go beyond a few
> >>> +   dozen nodes tend to be very large (>100) and so 64 seems like a good
> >>> +   balance.
> >>> +
> >>> +   This is really just a performance balance of the cost of allocation
> >>> +   and wasted memory. */
> >>> +struct conds_ctx
> >>> +{
> >>> +  /* Bitmap of the processed blocks. Bit n set means
> >>> +     basic_block->index has been processed either explicitly or as a
> >>> +     part of an expression. */
> >>> +  auto_sbitmap marks;
> >>> +
> >>> +  /* This is a reusable shared allocation which is also used to return
> >>> +     single expressions, which means that for most code it should only
> >>> +     hold a couple of elements. */
> >>> +  auto_vec<basic_block, 32> blocks;
> >>> +
> >>> +  /* Map from basic_block->index to an ordering so that for a single
> >>> +     expression (a || b && c) => index_map[a] < index_map[b] <
> >>> +     index_map[c]. The values do not have to be consecutive and can be
> >>> +     interleaved by values from other expressions, so comparisons only
> >>> +     make sense for blocks that belong to the same expression. */
> >>> +  auto_vec<int, 64> index_map;
> >>> +
> >>> +  /* Pre-allocated bitmaps and vectors for per-function bookkeeping.
> >>> +     This is pure instance reuse and the bitmaps carry no data between
> >>> +     function calls. */
> >>> +  auto_vec<basic_block, 64> B1;
> >>> +  auto_vec<basic_block, 64> B2;
> >>> +  auto_sbitmap G1;
> >>> +  auto_sbitmap G2;
> >>> +  auto_sbitmap G3;
> >>> +
> >>> +  explicit conds_ctx (unsigned size) noexcept (true) : marks (size),
> >>> +  G1 (size), G2 (size), G3 (size)
> >>> +  {
> >>> +    bitmap_clear (marks);
> >>> +  }
> >>> +
> >>> +  /* Mark a node as processed so nodes are not processed twice, for
> >>> +     example in loops or gotos. */
> >>> +  void mark (const basic_block b) noexcept (true)
> >>> +  {
> >>> +    gcc_assert (!bitmap_bit_p (marks, b->index));
> >>> +    bitmap_set_bit (marks, b->index);
> >>> +  }
> >>> +
> >>> +  /* Mark nodes as processed so they are not processed twice. */
> >>> +  void mark (const vec<basic_block>& bs) noexcept (true)
> >>> +  {
> >>> +    for (const basic_block b : bs)
> >>> +      mark (b);
> >>> +  }
> >>> +
> >>> +  /* Check if all nodes are marked. A successful run should visit &
> >>> +     mark every reachable node exactly once. */
> >>> +  bool all_marked (const vec<basic_block>& reachable) const noexcept (true)
> >>> +  {
> >>> +    for (const basic_block b : reachable)
> >>> +      if (!bitmap_bit_p (marks, b->index))
> >>> +       return false;
> >>> +    return true;
> >>> +  }
> >>> +};
> >>> +
> >>> +/* Only instrument expressions with fewer terms than the number of bits
> >>> +   in a (wide) gcov integer, which is probably 64. The algorithm itself
> >>> +   does not impose this limitation, but it makes for a simpler
> >>> +   implementation.
> >>> +
> >>> +   * Allocating the output data structure (coverage_counter_alloc ())
> >>> +     can assume pairs of gcov_type_unsigned and not use a separate
> >>> +     length field.
> >>> +   * A pair of gcov_type_unsigned can be used as accumulators.
> >>> +   * Updating accumulators can use the bitwise operations |=, &= and
> >>> +     not custom operators that work for arbitrary-sized bit-sets.
> >>> +
> >>> +   Most real-world code should be unaffected by this, but it is
> >>> +   possible (especially for generated code) to exceed this limit. */
> >>> +#define CONDITIONS_MAX_TERMS (sizeof (gcov_type_unsigned) * BITS_PER_UNIT)
> >>
> >> You also need to consider the target number of bits in the gcov type:
> >> targets can change the 64-bit default to something else.
> >> There is gcov_type_node from which you can get the number of bits.
>
> Ah, yes, that was bad. So, what I have done is change the compiler storage
> type for bitsets to uint64_t - it seems reasonable for a while yet to
> assume the target gcov_type_unsigned is <= 64 bits - and moved the target
> check to TYPE_PRECISION.
>
> >>> +
> >>> +/* Special cases of the single_*_p and single_*_edge functions in
> >>> +   basic-block.h that don't consider exception handling or other
> >>> +   complex edges. This helps create a view of the CFG with only normal
> >>> +   edges - if a basic block has both an outgoing fallthrough and
> >>> +   exceptional edge [1], it should be considered single-successor.
> >>> +
> >>> +   [1] if this is not possible, these functions can be removed and
> >>> +   replaced by their basic-block.h cousins. */
> >>
> >> This should be achievable with -fnon-call-exceptions. For example,
> >> division can throw an exception then.
> >>
> >>> +bool
> >>> +single (const vec<edge, va_gc> *edges)
> >>
> >> Maybe single_p to keep the naming similar to basic-block?
>
> Done.
>
> >>> +{
> >>> +  int n = EDGE_COUNT (edges);
> >>> +  if (n == 0)
> >>> +    return false;
> >>> +
> >>> +  for (edge e : edges)
> >>> +    if (e->flags & EDGE_COMPLEX)
> >>> +      n -= 1;
> >>> +
> >>> +  return n == 1;
> >>> +}
> >>> +
> >>> +/* Get the single, non-complex edge. Behavior is undefined if there is
> >>> +   more than one non-complex edge. */
> >>> +edge
> >>> +single_edge (const vec<edge, va_gc> *edges)
> >>> +{
> >>
> >> Perhaps to keep behaviour consistent with basic-block.h you want to add
> >> gcc_checking_assert (single_p (edges));
>
> Done.
>
> >>> +/* Find the set {ancestors (p) intersect G} where ancestors is the
> >>> +   recursive set of predecessors for p. Limiting to the ancestors that
> >>> +   are also in G (see cond_reachable_from) and bounding by q is an
> >>> +   optimization, as ancestors outside G have no effect when isolating
> >>> +   expressions.
> >>> +
> >>> +   dfs_enumerate_from () does not work as the filter function needs
> >>> +   edge information and dfs_enumerate_from () only considers blocks. */
> >>> +void
> >>> +ancestors_of (basic_block p, basic_block q, const sbitmap G, sbitmap ancestors)
> >>
> >> Coding style does not really like using an uppercase letter for
> >> variables. I see it was a set in the paper, but one can give it a bit
> >> more descriptive name in the source code.
>
> Done.
>
> >>> +{
> >>> +  if (!bitmap_bit_p (G, p->index))
> >>> +    return;
> >>> +
> >>> +  bitmap_set_bit (ancestors, p->index);
> >>> +  bitmap_set_bit (ancestors, q->index);
> >>> +  if (p == q)
> >>> +    return;
> >>> +
> >>> +  auto_vec<basic_block, 16> stack;
> >>> +  stack.safe_push (p);
> >>> +
> >>> +  while (!stack.is_empty ())
> >>> +    {
> >>> +      basic_block b = stack.pop ();
> >>> +      if (single (b->preds))
> >>> +       {
> >>> +         edge e = single_edge (b->preds);
> >>> +         e = contract_edge_up (e);
> >>> +         b = e->dest;
> >>> +       }
> >>
> >> So this walks the chain of single-pred edges until something non-single
> >> is found, and only those are added to the ancestor bitmap?
> >> Why are single_edge basic blocks not inserted into the ancestor bitmap?
>
> This patch also adds intermediate nodes to the bitmap, which simplified a
> few things in isolate_expression, and made the new instrument_decisions
> quite nice.
>
> >>> +
> >>> +      for (edge e : b->preds)
> >>> +       {
> >>> +         basic_block src = e->src;
> >>> +         if (bitmap_bit_p (ancestors, e->src->index))
> >>> +           continue;
> >>> +         if (!bitmap_bit_p (G, e->src->index))
> >>> +           continue;
> >>> +         bitmap_set_bit (ancestors, src->index);
> >>
> >> bitmap_set_bit returns a boolean value saying whether the bit was newly
> >> set or previously 1. So you can avoid the bitmap_bit_p above.
>
> Done.
>
> >>> diff --git a/libgcc/libgcov-merge.c b/libgcc/libgcov-merge.c
> >>> index 5d6e17d1483..eed3556373b 100644
> >>> --- a/libgcc/libgcov-merge.c
> >>> +++ b/libgcc/libgcov-merge.c
> >>> @@ -33,6 +33,11 @@ void __gcov_merge_add (gcov_type *counters __attribute__ ((unused)),
> >>>                        unsigned n_counters __attribute__ ((unused))) {}
> >>>   #endif
> >>> +#ifdef L_gcov_merge_ior
> >>> +void __gcov_merge_ior (gcov_type *counters __attribute__ ((unused)),
> >>> +                      unsigned n_counters __attribute__ ((unused))) {}
> >>> +#endif
> >>
> >> Why is this necessary?
> >
> > To build in freestanding environments, proposed here:
> > https://inbox.sourceware.org/gcc-patches/ad3f098f-9caa-b656-47e8-6cd1b216f...@embedded-brains.de/
>
> >>
> >> The profiling part looks good to me. I am worried about the pattern
> >> matching of gimple to recognize conditionals, especially after the
> >> early optimizations have run. Does it work reasonably with -O2?
> >> Perhaps it would make sense to schedule this kind of profiling early,
> >> shortly after build_ssa, which would make it incompatible with profile
> >> feedback, but I think that would be OK.
>
> I would assume scheduling earlier would improve the accuracy also under
> optimization (which is nice, but usually these measurements require -O0).
>
> >>
> >> Richi, can you please look at the gimple matching part?
> >> Honza
> >
>
> When doing the SSA form instrumentation I found a peculiar problem with
> this program:
>
> #include <setjmp.h>
>
> jmp_buf buf;
>
> int
> setjmp003 (int a)
> {
>     if (a || setjmp (buf))
>         a += 12;
>
>     return a;
> }
>
> void
> jump ()
> {
>     longjmp (buf, 1);
> }
>
> This gives me coverages like 35/4, 36/4. The setjmp creates a central node
> with complex incoming edges, and an outgoing complex edge to the start of
> the right operand (the longjmp resume).
>
> Now, the behaviour of this program is undefined, which makes me curious
> about some of the mechanics. I tried to poison the counters on the complex
> edge into (_ || setjmp) so that whenever the longjmp happens, the mask
> would be all-zero and the counter update a no-op. That failed with complex
> edges not being splittable. I tried selecting the poisoned value in a phi
> node and picking it on the incoming complex edge, but the complex edge
> never seemed to be picked in the phi. Is it not possible to use complex
> edges in phi nodes, or is there some way I can detect that the longjmp
> happened so I can poison the counters? Or is this behaviour ok considering
> it is already undefined behaviour?
If it's undefined behavior, doing anything should be OK. It is possible to
have PHI nodes with incoming abnormal edges, but we need to be able to
coalesce arguments and result to the same register (out-of-SSA will ICE if
that's not possible).

Also the usual language applies with respect to what state is saved across
setjmp/longjmp; POSIX talks about volatile. Basically the values need to be
in memory, since what part of the register set is preserved across
setjmp/longjmp is not very well defined - you probably have to look at the
generated code to see if there's something obviously wrong.

Richard.
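As an illustration of the POSIX point about volatile (a hypothetical rewrite
of the setjmp003 test above, not code from the thread): keeping the mutated
variable volatile forces it into memory, so its value is well defined after
the longjmp. Note this only illustrates the volatile rule; the setjmp call
still appears in a context the C standard does not bless, so the program as
a whole remains undefined:

    #include <setjmp.h>

    static jmp_buf buf;

    int
    setjmp003 (int a)
    {
      /* Automatic objects modified between setjmp and longjmp must be
         volatile, otherwise their values are indeterminate after the
         longjmp returns.  */
      volatile int v = a;
      if (v || setjmp (buf))
        v += 12;
      return v;
    }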