On 1/5/25 20:25, Jan Hubicka wrote:
ALGORITHM

Since the number of paths grows so fast, we need a good
algorithm. The naive approach of generating all paths and discarding
redundancies (see reference_prime_paths in the diff) simply doesn't
complete even for pretty simple functions with a few tens of thousands
of paths (granted, the implementation is also poor, but it only serves
as a reference). Fazli & Afsharchi, in their paper "Time and
Space-Efficient Compositional Method for Prime and Test Paths
Generation", describe a neat algorithm which drastically improves on
this for most programs and brings the complexity down to something
manageable. This patch implements that algorithm with a few minor
tweaks.

The algorithm first finds the strongly connected components (SCCs) of the
graph and creates a new graph where the vertices are the SCCs of the CFG.
Within these vertices different paths are found - regular prime paths, paths
that start in the SCC's entries, and paths that end in the SCC's exits. These
per-SCC paths are combined with paths through the CFG, which greatly reduces
the number of paths that need to be evaluated only to be thrown away.

Using this algorithm we can find the prime paths for somewhat
complicated functions in a reasonable time. Please note that some
programs don't benefit from this at all: we need to find the prime
paths within an SCC, so if a single SCC is very large the function
degenerates to the naive implementation. This can probably be improved
on considerably, but that is an exercise for later.

Interesting. I was only aware of the old paper by Ball and Larus:
https://ieeexplore.ieee.org/abstract/document/566449

--

OVERALL ARCHITECTURE

Like the other coverages in gcc, this operates on the CFG in the profiling
phase, just after branch and condition coverage, in phases:

1. All prime paths are generated, counted, and enumerated from the CFG
2. The paths are evaluated, and counter instructions and accumulators are
    emitted
3. gcov reads the CFG and computes the prime paths (same as step 1)
4. gcov prints a report

Simply writing out all the paths in the .gcno file is not really viable;
the files would be too big. Additionally, there are limits to the
practicality of measuring (and reporting on) millions of paths, so for
most programs where coverage is feasible, computing the paths should be
plenty fast. As a result, path coverage only adds 1 bit per path to the
counters, rounded up to the nearest 64 ("bucket"), so 64 paths take up 8
bytes, 65 paths take up 16 bytes.

Path coverage can also be used to determine correlated branches (where
the outcome of the first if predetermines the probability of the second
if), which can be used for optimization: if such paths are detected,
tail duplication will likely help propagate some extra invariants.

For this we also need to know the actual frequencies of the paths, not
only a bit saying whether each was taken.

This would be a relatively simple extension, I think, using adds rather than setting bits. I suppose the compiled object file size would increase by a constant factor, too?


Recording paths is really just massaging large bitsets. Per function,
ceil(paths/64 or 32) buckets (gcov_type) are allocated. Paths are
sorted, so the first path maps to the lowest bit, the second path to the
second lowest bit, and so on. On taking an edge and entering a basic
block, a few bitmasks are applied to unset the bits corresponding to the
paths outside the block and set the bits of the paths that start in that
block. Finally, the right buckets are masked and written to the global
accumulators for the paths that end in the block. Full coverage is
achieved when all bits are set.

gcc does not really inform gcov of abnormal paths, so paths with

Adding abnormal edges is probably not hard, but I am not sure how
realistic it is to cover them all.  Even EH introduces edges that cannot
really be taken at runtime.

For coverage, absolutely, but it is a (minor) source of complexity because of special rules and how some abnormal edges become fake edges when recorded in .gcno.


abnormal edges are ignored. This is probably possible to fix, but it
requires some changes to the graph gcc writes to the .gcno file.

If I recall correctly, Ball & Larus simply store counts in a hashtable,
assuming that most paths are not taken dynamically.

+@item -e
+@itemx --prime-paths
+Write path coverage to the output file, and write path summary info to
+the standard output.  This option allows you to see how many prime paths
+were taken at least once.  For the regular output this option only
+includes the number of paths covered.  For more fine grained information
+on paths you can use @option{--prime-paths-lines} or
+@option{--prime-paths-source}.  With @option{--json-format} all path
+details are included in the output.  This requires you to compile the
+source with @option{-fpath-coverage}.
+
+@item --prime-paths-lines [=@var{type}]
+Write path coverage to the output file, and write path summary info to
+the standard output.  This option allows you to see how many prime paths
+were taken at least once, and a dense report on the covered or uncovered
+paths and how to cover them.  This mode is useful for automated
+reporting and progress tracking. @var{type} may be omitted, or one of:

I wonder if some short explanation of "prime path" could fit here (or
somewhere in the doc files?). I think most users would have to google
what it is.

I can write one, sure.


+-fpath-coverage

Maybe we should stick to either path or prime path and use it in both
options?

Certainly - my main reasons for the flag were that 1. it's short and pointed and 2. it opens up grouping other kinds of paths, should they ever be interesting, like -fsanitize (-fpath-coverage=prime,simple for example). Plus symmetry with -farc-coverage, -fcondition-coverage. I'm happy to rename it.


+/* Build a struct graph equivalent to the CFG for the function FN.  The caller
+   must free the returned graph with free_graph.  The data field of every
+   vertex and edge point to the basic blocks and edges in the CFG.
+
+   The CFG recording and gcov is not aware of abnormal edges, so they are
+   ignored here, too.  This means some paths are lost, e.g. those that involve
+   setjmp/longjmp.  They are still paths but would need more support from
+   profile.cc and gcov.cc to work.  */
+struct graph*
+cfg_as_graph (struct function* fn)
+{
+  struct graph *g = new_graph (n_basic_blocks_for_fn (fn));
+  basic_block entry = ENTRY_BLOCK_PTR_FOR_FN (fn);
+  basic_block exit = EXIT_BLOCK_PTR_FOR_FN (fn);
+
+  g->vertices[entry->index].data = entry;
+  g->vertices[exit->index].data = exit;
+  basic_block top = single_succ (entry);
+  add_edge (g, entry->index, top->index)->data = single_succ_edge (entry);
+
+  const unsigned ignore = EDGE_FAKE | EDGE_ABNORMAL | EDGE_ABNORMAL_CALL;
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fn)
+    {
+      g->vertices[bb->index].data = bb;
+      for (edge e : bb->succs)
+       if (!(e->flags & ignore))
+         add_edge (g, e->src->index, e->dest->index)->data = e;
+    }
+  return g;
+}

Overall it would be nice to have base class with API for (multi)graph
that can be used by CFG, callgraph and gcov's CFG so we can run same
algorthms on them.

We do have quite some duplication between graph algorithms we run on
these. Moreover IPA summaries would benefit from a simplified and
memory-efecient CFG, too.

That would be useful, indeed. I found this one because I needed the component graph and found graphds_scc. As a sidenote, I think I counted four or five topsorts in gcc (and two are mine!), but topsort is somewhat natural to custom fit to the type/class so it doesn't seem so bad.

+/* Return the edge between SRC and DST.  */
+edge
+edge_between (struct function *fn, int src, int dst)
+{
+  basic_block bbsrc = BASIC_BLOCK_FOR_FN (fn, src);
+  basic_block bbdst = BASIC_BLOCK_FOR_FN (fn, dst);
+  for (edge e : bbsrc->succs)
+    if (e->dest == bbdst)
+      return e;
+  gcc_checking_assert (false);
gcc_unreachable would perhaps be more readable (and it stays in the code
even with checking disabled).
+  return NULL;
+}
+
+/* Visitor for topsort.  */
+void
+topsort1 (basic_block b, vec<basic_block>& L, sbitmap visited)

Uppercase for a variable name is not really GNU coding style friendly.
Also a more descriptive name would be useful :)

Will fix.

+{
+  if (bitmap_bit_p (visited, b->index))
+    return;
+
+  for (edge e : b->succs)
+    if (!(e->flags & EDGE_DFS_BACK))
+      topsort1 (e->dest, L, visited);

Please avoid recursion here. Perhaps you can use post_order_compute
instead?
+
+/* Find the union of possible preserved by bitwise-ANDs for the bucket BUCKET
+   of BB.  If this returns zero no paths go through BB at BUCKET.  */

I am not sure what this means. Perhaps it can be rephrased?

+  gimple *def = SSA_NAME_DEF_STMT (local);
+  gphi *phi = nullptr;
+  if (is_a <gphi *> (def))
+    phi = as_a <gphi *> (def);
gphi *phi = dyn_cast <gphi *> (def)
+
+/* Insert instructions update the global counter at BUCKET with the contents of
                           updating?
+   (LOCAL & MASK).  LOCAL is the SSA counter for this bucket, MASK the bits
+   that should be updated (that is, the paths that end in this basic block).
+   If ATOMIC_IOR is non-null then it uses atomics.  GCOV_TYPE_NODE should be a
+   tree of the gcov type node for creating SSA names.
+
+   global[BUCKET] |= (LOCAL & MASK)
+
+   If MASK is null, no mask is applied and it becomes:
+
+   global[BUCKET] |= LOCAL
+
+   This function should be used on every block except returns_twice blocks (see
+   flush_on_edge) as all incoming edges can use the same flushing code.  */
+void
+flush_on_gsi (gimple_stmt_iterator *gsi, size_t bucket, tree local, tree mask,
+             tree atomic_ior, tree gcov_type_node)
+{
+  tree relaxed = nullptr;

I wonder if NULL_TREE or nullptr is preferred these days :)

!

I generally try to mimic the local style as much as possible, but it is easy to slip up. I see both are used interchangeably and frequently, do you have a preference?


Otherwise the profiling code reads well, especially given that it does a
relatively complex job.
diff --git a/gcc/prime-paths.cc b/gcc/prime-paths.cc
new file mode 100644
index 00000000000..fe6bff74dd4
--- /dev/null
+++ b/gcc/prime-paths.cc
@@ -0,0 +1,2031 @@

Missing the mandatory comment about the file being part of GCC and covered
by the GPL. Please check the other new files too.

Will fix.


+/* Merge the tries T1, T2, T3, and the set of paths VECS into the largest
+   trie.  Returns a reference to the trie merged into.  Merging tries and
+   resolving redundant paths is the slowest step (at least in the sense that
+   it works on the largest input), and merging into a partial result reduces
+   the work accordingly.  For large problems this is a massive improvement,
+   which in the worst cases (where all tries but one are empty or almost
+   empty) speeds things up 30-40%.  */
+trie&
+merge (trie &t1, trie &t2, trie &t3, vec<vec<vec<int>>> &vecs)

Seems you are first to use three nested vectors, congratulations :)

I'm not going to lie, some of these bits and pieces are not at all pretty. I had to do a few tricks to make it reasonably fast in most cases, and this is one of them. I tried using a custom class for the inner set (vec<vec<int>>) but it didn't really improve much.

+{
+  trie *dst = nullptr;
+  const size_t s1 = t1.size ();
+  const size_t s2 = t2.size ();
+  const size_t s3 = t3.size ();
+
+  if (s1 >= s2 && s1 >= s3)
+    {
+      dst = &t1;
+      t1.merge (t2);
+      t1.merge (t3);
+    }
+  else if (s2 >= s1 && s2 >= s3)
+    {
+      dst = &t2;
+      t2.merge (t1);
+      t2.merge (t3);
+    }
+  else
+    {
+      dst = &t3;
+      t3.merge (t1);
+      t3.merge (t2);
+    }
+
+  gcc_checking_assert (dst);
+  for (const vec<vec<int>> &v2 : vecs)
+    for (const vec<int> &v1 : v2)
+      dst->insert_with_suffix (v1);
+  return *dst;
+}
+
+/* Create a CFG that roughly corresponds to this program:
+
+int binary_search(int a[], int len, int from, int to, int key)

Maybe std::binary_search would work too?

I'm not sure I follow - the code is mostly for reference, as it causes gcc to generate more or less the graph from the Fazli paper, which then serves as a self test to verify and document the different phases. The test paths and steps are the same as those used in the paper, for easier reference.

Thanks,
Jørgen


Honza
