Re: GCC static analysis branch now available on Compiler Explorer

David Malcolm Wed, 11 Dec 2019 12:39:56 -0800

On Tue, 2019-12-10 at 13:42 -0700, Jeff Law wrote:
> On Tue, 2019-12-10 at 12:51 -0700, Martin Sebor wrote:
> > On 12/10/19 10:56 AM, David Malcolm wrote:
> > > On Tue, 2019-12-10 at 09:36 -0700, Martin Sebor wrote:
> > > > On 12/10/19 8:46 AM, David Malcolm wrote:
> > > > > For the adventurous/curious, my static analyzer branch of GCC
> > > > > [1]
> > > > > is
> > > > > now available on Compiler Explorer (aka godbolt.org) so you
> > > > > can
> > > > > try
> > > > > it
> > > > > out without building it yourself.  [Thanks to Matt Godbolt,
> > > > > Patrick
> > > > > Quist and others at the Compiler Explorer project]
> > > > > 
> > > > > On https://godbolt.org/ within the C and C++ languages,
> > > > > select
> > > > >     "x86-64 gcc (static analysis)"
> > > > > as the compiler (though strictly speaking only C is in-scope
> > > > > right
> > > > > now).  It's configured to automatically inject -fanalyzer
> > > > > (just
> > > > > on
> > > > > this
> > > > > site; it isn't the default in the branch).
> > > > > 
> > > > > Some precanned examples:
> > > > >     * various malloc issues: https://godbolt.org/z/tnx65n
> > > > >     * stdio issues: https://godbolt.org/z/4BP-Tj
> > > > >     * fprintf in signal handler: https://godbolt.org/z/ew7mW6
> > > > >     * tainted data affecting control flow:
> > > > > https://godbolt.org/z/3v8vSj
> > > > >     * password-leakage: https://godbolt.org/z/pRPYiv
> > > > > (the non-malloc examples are much more in "proof-of-concept"
> > > > > territory)
> > > > > 
> > > > > Would it make sense to add an "analyzer" component to our
> > > > > bugzilla,
> > > > > even though this is still on a branch? (with me as default
> > > > > assignee)
> > > > 
> > > > Yes, that would make sense to me for bugs/enhancements specific
> > > > to
> > > > just the analyzer.
> > > > 
> > > > I think it's important to have a shared understanding how
> > > > requests
> > > > for new warnings (either completely new features or
> > > > enhancements
> > > > to
> > > > existing ones) should be triaged in Bugzilla.  For instance,
> > > > pr82520
> > > > is a request to enhance -Wreturn-local-addr to detect other
> > > > forms
> > > > of
> > > > escaping dangling pointers besides by returning them from
> > > > functions.
> > > > I think the request (and a number of others) would be a good
> > > > fit
> > > > for
> > > > the analyzer as well, but assigning it to the analyzer
> > > > component
> > > > would suggest that it's suitable only for it.  How do we track
> > > > requests that we'd like handled in both?
> > > 
> > > It feels to me that the question of BZ component is a proxy for a
> > > deeper scope question about what we want to handle in each
> > > warning.  If
> > > it's clear that we want an issue handled by both the core code
> > > *and* by
> > > the analyzer, then we should have two PRs.  Perhaps discuss in
> > > the
> > > PRs
> > > which things we expect to be handled by which warning (e.g. some
> > > cases
> > > might be capable of being caught by -Wreturn-local-addr, others
> > > might
> > > require the deeper analysis of -fanalyzer).
> > 
> > You're quite right that the two questions are related, and it is
> > the latter that prompted my comment above.
> > 
> > My concern is that I'm not sure we have a good rule of thumb to
> > decide whether a given warning is better suited to just the
> > analyzer
> > or just the middle-end, or that it should be implemented in both.
> > Or even a good basis for making such decisions.  We have not
> > discussed
> > yet if we should even worry about overlap (duplicate warnings
> > issued
> > by both the analyzer and the middle-end).  I asked a few questions
> > to
> > help get more clarity here in my comments on the initial patch but
> > only
> > heard from Jeff.
> > 
> > So here are some of my thoughts.  I'd be very interested in yours
> > (or
> > anyone else's who cares about this subject).
> > 
> > I can think of no warning that's currently implemented in the
> > middle-
> > end (or that could be implemented there) that would not also be
> > appropriate for the analyzer.  IIUC, it has the potential for
> > finding
> > more problems but may be more noisy than a middle-end
> > implementation,
> > and take longer to analyze code.  Are there any kinds of problems
> > that
> > you think are not appropriate for it?  Any inherent limitations?
> > 
> > At the same time I also think there's room for and value in
> > providing
> > at least some support in the middle-end for pretty much every
> > warning
> > that the analyzer already exposes options for.  They may not find
> > as
> > many problems as the analyzer eventually will but they don't have
> > as
> > much overhead (either to implement, or in terms of compile time --
> > the analysis often benefits optimizations as well).  In addition,
> > by
> > being tied to the optimizers the detection provides an opportunity
> > to also prevent the bugs (by folding invalid code away, or
> > inserting
> > traps or branches around it, perhaps under the control of users as
> > we
> > have discussed).
> > 
> > In my mind the only reason to have to choose between one and other
> > is the duplication of effort and resource constraints: we don't
> > have
> > enough contributors into either area.  There isn't a lot we can
> > about
> > the latter but we should make an effort to minimize the duplication
> > (starting with bug triage).
> > 
> > But maybe I'm overanalyzing it and playing it by ear will work just
> > fine.
> I think one of the key components will be whether or not one needs
> intra-procedural state tracking to do a reasonable job.  THe double-
> free checking is a great example.  You can build one with what's in
> the
> tree today, but the result isn't really useful except for toy
> examples.


Did you mean "interprocedural" here rather than "intraprocedural"?  If
so, I think I agree.

[I dislike this terminology and wish we had something better; our
jargon earns +1 point for showing off that we know a little Latin, but
-100 points for making communication more difficult (end-rant)]


> To do a reasonable job you've got to explode the graphs and track
> state
> intra-procedurally.  Thus it's a good fit for David's analyzer.

Again: s/intra/inter/ ?

> THere will almost certainly be cases where some additional bit of
> intra-procedural data tracking will help the existing middle end
> analysis/warnings.

Not sure whether you meant intra or inter here; sorry.

>     But what's in there today works and works
> reasonably well.  So yes, some additional data might make
> -Wuninitialized better, but it does a pretty good job with the
> existing
> analysis without having to track data across function calls

Hope I'm following your meaning here.  I think I agree with you, modulo
terminological snafu.

Dave

Re: GCC static analysis branch now available on Compiler Explorer

Reply via email to