On 12/10/19 1:42 PM, Jeff Law wrote:
> On Tue, 2019-12-10 at 12:51 -0700, Martin Sebor wrote:
>> On 12/10/19 10:56 AM, David Malcolm wrote:
>>> On Tue, 2019-12-10 at 09:36 -0700, Martin Sebor wrote:
>>>> On 12/10/19 8:46 AM, David Malcolm wrote:
>>>>> For the adventurous/curious, my static analyzer branch of GCC [1]
>>>>> is now available on Compiler Explorer (aka godbolt.org) so you can
>>>>> try it out without building it yourself. [Thanks to Matt Godbolt,
>>>>> Patrick Quist and others at the Compiler Explorer project]
>>>>>
>>>>> On https://godbolt.org/ within the C and C++ languages, select
>>>>> "x86-64 gcc (static analysis)" as the compiler (though strictly
>>>>> speaking only C is in-scope right now). It's configured to
>>>>> automatically inject -fanalyzer (just on this site; it isn't the
>>>>> default in the branch).
>>>>>
>>>>> Some precanned examples:
>>>>> * various malloc issues: https://godbolt.org/z/tnx65n
>>>>> * stdio issues: https://godbolt.org/z/4BP-Tj
>>>>> * fprintf in signal handler: https://godbolt.org/z/ew7mW6
>>>>> * tainted data affecting control flow: https://godbolt.org/z/3v8vSj
>>>>> * password-leakage: https://godbolt.org/z/pRPYiv
>>>>> (the non-malloc examples are much more in "proof-of-concept"
>>>>> territory)
>>>>>
>>>>> Would it make sense to add an "analyzer" component to our
>>>>> bugzilla, even though this is still on a branch? (with me as
>>>>> default assignee)
>>>> Yes, that would make sense to me for bugs/enhancements specific
>>>> to just the analyzer.
>>>>
>>>> I think it's important to have a shared understanding of how
>>>> requests for new warnings (either completely new features or
>>>> enhancements to existing ones) should be triaged in Bugzilla.
>>>> For instance, pr82520 is a request to enhance -Wreturn-local-addr
>>>> to detect other forms of escaping dangling pointers besides
>>>> returning them from functions. I think the request (and a number
>>>> of others) would be a good fit for the analyzer as well, but
>>>> assigning it to the analyzer component would suggest that it's
>>>> suitable only for it. How do we track requests that we'd like
>>>> handled in both?
>>> It feels to me that the question of BZ component is a proxy for a
>>> deeper scope question about what we want to handle in each warning.
>>> If it's clear that we want an issue handled by both the core code
>>> *and* by the analyzer, then we should have two PRs. Perhaps discuss
>>> in the PRs which things we expect to be handled by which warning
>>> (e.g. some cases might be caught by -Wreturn-local-addr, others
>>> might require the deeper analysis of -fanalyzer).
>> You're quite right that the two questions are related, and it is
>> the latter that prompted my comment above.
>>
>> My concern is that I'm not sure we have a good rule of thumb to
>> decide whether a given warning is better suited to just the
>> analyzer or just the middle-end, or should be implemented in both.
>> Or even a good basis for making such decisions. We have not yet
>> discussed whether we should even worry about overlap (duplicate
>> warnings issued by both the analyzer and the middle-end). I asked
>> a few questions to help get more clarity here in my comments on
>> the initial patch but only heard from Jeff.
>>
>> So here are some of my thoughts. I'd be very interested in yours
>> (or anyone else's who cares about this subject).
>>
>> I can think of no warning that's currently implemented in the
>> middle-end (or that could be implemented there) that would not
>> also be appropriate for the analyzer. IIUC, it has the potential
>> to find more problems, but it may be noisier than a middle-end
>> implementation and take longer to analyze code. Are there any
>> kinds of problems that you think are not appropriate for it? Any
>> inherent limitations?
>>
>> At the same time, I also think there's room for, and value in,
>> providing at least some support in the middle-end for pretty much
>> every warning that the analyzer already exposes options for. The
>> middle-end checkers may not find as many problems as the analyzer
>> eventually will, but they don't have as much overhead (either to
>> implement, or in terms of compile time -- the analysis often
>> benefits optimizations as well). In addition, by being tied to
>> the optimizers, the detection provides an opportunity to also
>> prevent the bugs (by folding invalid code away, or by inserting
>> traps or branches around it, perhaps under the control of users
>> as we have discussed).
>>
>> In my mind the only reason to have to choose between one and the
>> other is the duplication of effort and resource constraints: we
>> don't have enough contributors in either area. There isn't a lot
>> we can do about the latter, but we should make an effort to
>> minimize the duplication (starting with bug triage).
>>
>> But maybe I'm overanalyzing it and playing it by ear will work
>> just fine.
> I think one of the key components will be whether or not one needs
> interprocedural state tracking to do a reasonable job. The
> double-free checking is a great example. You can build one with
> what's in the tree today, but the result isn't really useful except
> for toy examples. To do a reasonable job you've got to explode the
> graphs and track state interprocedurally. Thus it's a good fit for
> David's analyzer.
To do a thorough job, sure. But detecting bugs without inlining
everything is also valuable and (as you note below) the middle-end
does a reasonable job of it despite the limited interprocedural
analysis. Code that allocates and deallocates memory in the same
function (or in functions inlined into it) isn't uncommon; it's
even the basis of some optimizations. So I think having a simple
double-free checker in the middle-end would be beneficial as well.
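For illustration, here is a minimal sketch of the kind of
self-contained case I mean (hypothetical code, not from any real
submission), where simple flow analysis within a single function
body sees both calls to free:

void f (int err)
{
  void *p = __builtin_malloc (32);
  if (err)
    __builtin_free (p);   /* frees p on the error path but doesn't return */
  /* ... */
  __builtin_free (p);     /* double-free whenever err is nonzero */
}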
> There will almost certainly be cases where some additional bit of
> interprocedural data tracking will help the existing middle-end
> analysis/warnings. But what's in there today works and works
> reasonably well. So yes, some additional data might make
> -Wuninitialized better, but it does a pretty good job with the
> existing analysis without having to track data across function
> calls.
Double-free bugs are a small subset of the more general class of
using objects outside their lifetime, either because it hasn't yet
begun (i.e., they're uninitialized) or because it has ended (e.g.,
after a call to free). Below is a double-free example caused by
an uninitialized read.

As another example, replace the call to free below with strcpy
(a sketch of that variant follows the example). It's far more
readily exploitable than the double-free, but a checker that looks
just for double-free won't find it.

Regrettably, -Wmaybe-uninitialized doesn't diagnose either of these
bugs, but that can be fixed. (The analyzer preview on Godbolt
doesn't diagnose them either, and that too I'm sure can be fixed.)

My point here is that subjecting the uninitialized detection logic
to the same deep analysis the analyzer employs for double-free
would help find not just all the double-free bugs but many others
as well.
__attribute__ ((noipa)) void f (void *p) { }   /* noipa keeps the callee opaque */

__attribute__ ((noipa)) void g (int i, int j)
{
  struct S { void *p; } s;
  if (i)
    s.p = __builtin_malloc (32);   /* s.p is left uninitialized when i is 0 */
  f (&s);
  if (j)
    __builtin_free (s.p);          /* frees an indeterminate pointer when i is 0 */
}

int main (void)
{
  g (1, 1);   /* allocates and frees one block */
  g (0, 1);   /* s.p may pick up the stale pointer from the first call: double-free */
}
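For reference, here is a sketch of the strcpy variant described
above, with the call to free replaced as suggested (hypothetical
code; the string copied is arbitrary):

__attribute__ ((noipa)) void f (void *p) { }

__attribute__ ((noipa)) void g (int i, int j)
{
  struct S { void *p; } s;
  if (i)
    s.p = __builtin_malloc (32);
  f (&s);
  if (j)
    __builtin_strcpy (s.p, "123");   /* writes through an indeterminate pointer when i is 0 */
}

int main (void)
{
  g (1, 1);
  g (0, 1);   /* the store can clobber whatever memory the stale pointer refers to */
}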
I view the capabilities of the middle-end as complementary to
the analyzer, but I don't see either as ideally suited to dealing
with any particular kind of bug. They each have different goals,
strengths, and limitations. By not being subject to the same
constraints as the inliner, the analyzer can do a better job of
detecting bugs in complex code across function boundaries.
The middle-end, on the other hand, can use its more localized
analysis not only to detect potential bugs but also to prevent
them from taking effect at runtime, or to emit more efficient
object code.
Martin