Re: [PATCH] analyzer: reimplement supergraph to eliminate function_point and stmt_finder

Dimitar Dimitrov Fri, 12 Dec 2025 23:49:43 -0800

On Fri, Dec 12, 2025 at 11:48:28AM -0500, David Malcolm wrote:
> Jakub/Richi: is it OK if I push this large analyzer patch at this stage
> of gcc 16?  See the various notes below.
> 
> GCC's static analyzer code has become hard to debug and extend.
> 
> I've realized that the core data structures within it for tracking
> positions in the user's code are clunky and make things more difficult
> than they need to be.
> 
> The analyzer has a data structure called the "supergraph" which unifies
> all CFGs and the callgraph into a single directed graph expressing
> control flow and function calls in the user's code.  The core job of the
> analyzer is to walk paths in the supergraph to build a directed graph
> called the exploded graph, which combines control flow and data flow,
> and uncovers problems as it does so (e.g. double-free bugs).
> 
> Previously, the nodes in the supergraph closely matched basic blocks in
> the gimple CFG representation in the hope that this would help the
> analyzer scale better, using a class function_point to refer to places
> in the code, such as *within* a basic block/supernode.  This approach
> needed lots of awkward special cases and workarounds to deal with state
> changes that happen mid-node, which complicated the implementation and
> made debugging it hard.
> 
> This patch reimplements the analyzer's supergraph:
> 
> * eliminate class function_point in favor of a very fine-grained
> supergraph, where each node in the graph represents a location in the
> user's program, and each edge in the graph represents an operation (with
> no-op edges for showing changing locations).  The debug option
> "-fanalyzer-fine-grained" becomes redundant.
> 
> * eliminate the class hierarchy inheriting from class superedge in
> favor of having each superedge optionally own an "operation", to better
> express the state transitions along edges (composition rather than
> inheritance), and splitting up the more complicated cases into
> multiple operations/edges (making debugging easier and reasoning about
> state transitions clearer).
> 
> * perform various post-processing "passes" to the supergraph after it's
> initially constructed but before performing the analysis, such as
> simplifying the graph, improving source location information, etc.
> 
> * eliminate class stmt_finder (which was always something of a hack) in
> favor of improving user source locations in the supergraph, using
> class event_loc_info more consistently, and a new class
> pending_location::fixup_for_epath for the most awkward cases (leaks)
> 
> * precompute and cache various properties in operations, such as for
> switch edges and for phi edges, rather than performing work each time we
> visit an edge.
> 
> Advantages:
> 
> * The implementation is much simpler, easier to understand and debug,
> and has much clearer separation of responsibilities.
> 
> * Locations for diagnostics are somewhat improved (due to being more
> consistent about using the goto_locus field of CFG edges when
> constructing the supergraph, and fixing up missing location data from
> gimple stmts).
> 
> * The analyzer now detects a missing "return" from a non-void-returning
> function (albeit as a read of uninitialized "<return-value>"), which
> found many lurking true +ves in the test suite.  I can fix the wording
> of this case as a follow-up.
> 
> Disadvantages:
> 
> * The supergraph is much larger than before (one node per gimple stmt,
> rather than per basic block) - but the optimizer that runs after the
> supergraph is built simplifies it somewhat (and I have various ideas for
> future simplifications which I hope will help the analyzer scale).
> 
> * all edges in the supergraph are intraprocedural, making "supergraph"
> a misnomer.
> 
> Other notes:
> 
> * I tried to maintain the behavior of -fanalyzer as closely as possible,
> but there are changes to the testsuite output.  These mostly are places
> where the exploration of the exploded graph leads to nodes not being
> merged as well as the previous implementation on a particular test case,
> leading to the analysis hitting a termination limit and bailing out.
> So I expect the analyzer's behavior to change somewhat.  I had to add
> xfails in various places - but was able to remove xfails in others.
> 
> * the testsuite was running with -fanalyzer-call-summaries enabled,
> which is not the default for users.  The new implementation uncovered
> numerous pre-existing bugs in -fanalyzer-call-summaries, so the patch
> disables this within the testsuite, matching the default for users.
> Fixing those bugs can be done separately from the patch.
> 
> * the only performance data I have so far is with a debug rather than
> release build.  "wall" time spent in the analyzer shows a slight
> improvement overall, but with one new outlier in the integration
> testsuite that now takes over an hour (specifically,
> qemu-7.2.0/build/target_hexagon_decode.c) but I'd like to go ahead with
> pushing this, and treat that specific slowdown as a bug.
> 
> I posted an incomplete version of this before the close of stage 1 here:
>   https://gcc.gnu.org/pipermail/gcc-patches/2025-November/700883.html
> 
> Although the patch is a very large change to -fanalyzer, the changes are
> confined to that component (apart from a trivial addition of
> INCLUDE_DEQUE/#include <deque> to system.h), so I want to apply this
> patch now in stage 3: it's a big quality-of-life improvement when
> debugging -fanalyzer.
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> Successful run of analyzer integration tests on x86_64-pc-linux-gnu,
> with the above caveat about a performance regression on one file.
> 
> OK for trunk?
> 
> Thanks
> Dave
> 
>
...
> gcc/testsuite/ChangeLog:
>       PR analyzer/122003
>       * c-c++-common/analyzer/allocation-size-multiline-1.c: Update for split
>       of region creation events.
>       * c-c++-common/analyzer/bzip2-arg-parse-1.c: Drop test for enode
>       merging.  Add -Wno-analyzer-too-complex.
>       * c-c++-common/analyzer/coreutils-cksum-pr108664.c: Add
>       -Wno-analyzer-symbol-too-complex.  Add dg-bogus for false +ve seen
>       during patch development.
>       * c-c++-common/analyzer/coreutils-group_number.c: New test.

Hi David,

This test case fails for pru-unknown-elf, and probably other 32-bit targets:

  FAIL: c-c++-common/analyzer/coreutils-group_number.c (test for excess errors)
  Excess errors:
  
  /home/dinux/projects/pru/testbot-workspace/gcc/gcc/testsuite/c-c++-common/analyzer/coreutils-group_number.c:13:21: warning: conversion from 'long long unsigned int' to 'size_t' {aka 'unsigned int'} changes value from '18446744073709551615' to '4294967295' [-Woverflow]

I can reproduce the warning on an x86 host by passing -m32.

Perhaps this test should require an effective target lp64?
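
To illustrate why the warning only shows up where size_t is 32 bits wide,
here is a minimal sketch in C.  It is not taken from the test case:
size32_t, narrow_to_size32 and fits_in_size32 are hypothetical names, with
uint32_t standing in for size_t on a target like pru-unknown-elf.  The
leading comment shows the form the directive suggested above would take;
where it would go in the real test file is an assumption.

```c
/* Form of the suggested directive (placement in the real test is assumed):
   { dg-require-effective-target lp64 }  */

#include <stdint.h>

/* Hypothetical stand-in for size_t on a target with a 32-bit size_t.  */
typedef uint32_t size32_t;

/* Convert V the way a conversion to a 32-bit unsigned type does:
   the result is V reduced modulo 2^32.  */
static size32_t
narrow_to_size32 (unsigned long long v)
{
  return (size32_t) v;
}

/* Return 1 if V is unchanged by the narrowing conversion, 0 otherwise.  */
static int
fits_in_size32 (unsigned long long v)
{
  return (unsigned long long) narrow_to_size32 (v) == v;
}
```

Calling narrow_to_size32 (18446744073709551615ULL) yields 4294967295, the
same value change the warning reports.  With a 64-bit size_t the constant
fits unchanged, which is why restricting the test to lp64 would be expected
to avoid the excess error.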

Regards,
Dimitar
