On 11/16/19, Thomas Schwinge <tho...@codesourcery.com> wrote: > Hi David! > > On 2019-11-15T20:22:47-0500, David Malcolm <dmalc...@redhat.com> wrote: >> This patch kit > > (I have not looked at the patches.) ;-) > >> introduces a static analysis pass for GCC that can diagnose >> various kinds of problems in C code at compile-time (e.g. double-free, >> use-after-free, etc). > > Sounds great from the description! > > > Would it make sense to add to the wiki page > <https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer> a (high-level) > comparison to other static analyzers (Coverity, cppcheck, > clang-static-analyzer, others?), in terms of how they work, what their > respective benefits are, what their design goals are, etc. (Of course > understanding that yours is much less mature at this point; talking about > high-level design rather than current implementation status.) > > For example, why do we want that in GCC instead of an external tool -- in > part covered in your Rationale.
There are a lot of bug reports open for requests for warnings that this analyzer could solve. Users clearly want this in GCC, or else they wouldn't keep making these requests. > Can a compiler-side implementation > benefit from having more information available than an external tool? > GCC-side implementation is readily available (modulo GCC plugin > installation?) vs. external ones need to be installed/set up first. > GCC-side one only works with GCC-supported languages. GCC-side one > analyzes actual code being compiled -- thinking about preprocessor-level > '#if' etc., which surely are problematic for external tools that are not > actually replicating a real build. And so on. (If you don't want to > spell out Coverity, cppcheck, clang-static-analyzer, etc., maybe just > compare yours to external tools.) > > Just an idea, because I wondered about these things. > > >> The analyzer runs as an IPA pass on the gimple SSA representation. >> It associates state machines with data, with transitions at certain >> statements and edges. It finds "interesting" interprocedural paths >> through the user's code, in which bogus state transitions happen. >> >> For example, given: >> >> free (ptr); >> free (ptr); >> >> at the first call, "ptr" transitions to the "freed" state, and >> at the second call the analyzer complains, since "ptr" is already in >> the "freed" state (unless "ptr" is NULL, in which case it stays in >> the NULL state for both calls). >> >> Specific state machines include: >> - a checker for malloc/free, for detecting double-free, resource leaks, >> use-after-free, etc (sm-malloc.cc), and > > I can immediately see how this can be useful for a bunch of > 'malloc'/'free'-like etc. OpenACC Runtime Library calls as well as source > code directives. ..., and this would've flagged existing code in the > libgomp OpenACC tests, which recently has given me some grief. Short > summary/examples: > > In addition to host-side 'malloc'/'free', there is device-side (separate > memory space) 'acc_malloc'/'acc_free'. Static checking example: don't > mix up host-side and device-side pointers. (Both are normal C/C++ > pointers. Hmm, maybe such checking could easily be implemented even > outside of your checker by annotating the respective function > declarations with an attribute describing which in/out arguments are > host-side vs. device-side pointers.) > > Then, there are functions to "map" host-side and device-side memory: > 'acc_map_data'/'acc_unmap_data'. Static checking example: you must not > 'acc_free' memory spaces that are still mapped. > > Then, there are functions like 'acc_create' (or equivalent directives > like '#pragma acc create') doing both 'acc_malloc', 'acc_map_data' > (plus/depending on internal reference counting). Static checking > example: for a pointer returned by 'acc_create" etc., you must use > 'acc_delete' etc. instead of 'acc_free', which first does > 'acc_unmap_data' before interal 'acc_free' (..., and again all that > depending on reference counting). (Might be "interesting" to teach your > checker about the reference counting -- if that is actually necessary; > needs further thought.) > > >> The checker is implemented as a GCC plugin. >> >> The patch kit adds support for "in-tree" plugins i.e. GCC plugins that >> would live in the GCC source tree and be shipped as part of the GCC >> tarball, >> with a new: >> --enable-plugins=[LIST OF PLUGIN NAMES] >> configure option, analogous to --enable-languages (the Makefile/configure >> machinery for handling in-tree GCC plugins is adapted from how we support >> frontends). > > I like that. Implementing this as a plugin surely must help to either > document the GCC plugin interface as powerful/mature for such a task. Or > make it so, if it isn't yet. ;-) Nick Clifton was bringing this up as a point in his talk on his annobin plugin at Cauldron; this should make him happy. > >> The default is for no such plugins to be enabled, so the default would >> be that the checker isn't built - you'd have to opt-in to building it, >> with --enable-plugins=analyzer > > I'd favor a default of '--enable-plugins=default' which enables the > "usable" plugins. Agreed. > > >> It's not clear to me whether I should focus on: >> >> (a) pruning the scope of the checker so that it works well on >> *intra*procedural C examples (and bail on anything more complex), perhaps >> targetting GCC 10 as an optional extra hidden behind >> --enable-plugins=analyzer, or >> >> (b) work on deeper interprocedural analysis (and fixing efficiency issues >> with this). > > I personally would be happy to see (a) happen now (without "optional > extra hidden behind" configure-time flag but rather hidden behind GCC > command-line flag, '-fanalyze'?), and then (b) later on. As > always, doing the incremental thing, (a) first, then (b) later, would > give it more exposure in the wild, which should help to identify design > issues etc. now instead of much later, for example. > > >> Thoughts? > > One very high-level item: you're using the very generic name "analyzer" > ('-Wanalyzer', 'gcc/analyzer/' filenames, for example, or the '-fanalyze' > I just proposed). Might there be potential for confusion (now, or in the > future) what kind of analyzer that is, or is it safe to assume that in > context of a compiler, an analyzer is always a compile-time, static code > analyzer? After all, the existing run-time ones are known as "checkers": > '-fstack-check', or "sanitizers": '--fsanitize, or "verifiers": > '-fvtable-verify'. > > > Grüße > Thomas >