RE: Question about static code analysis features in GCC
Hi

Richard, I've implemented a simple nop-pass as you described and am now investigating a path forward for static code analysis.
I'm trying to modify e.g. the cp-pass so that these workers can be called from my analysis pass.

I found some other work, though, done by Alexander Ivanov Sotirov, called "Vulncheck".
The patch is available at http://gcc.vulncheck.org/.
It seems to contain some work that might be useful to continue on?
Why was this patch not applied to GCC trunk?

A question from Sotirov about additional features went unanswered, or was it handled off-list?
http://gcc.gnu.org/ml/gcc/2007-09/msg00549.html

I guess constant propagation etc. is done by other workers/passes in GCC today, so it's better to use the available workers.
But having started to read his paper, it seems to me that some parts could be usable?

Also, Sotirov has an "ssa-tree" approach to analysis, unlike Volanchi (http://mygcc.free.fr), who uses a pretty-printer and pattern-matching approach.
(Which, as I understand it, stopped this patch from being applied to official GCC.)

Or is it even better just to do it as a plugin pass using MELT or something similar?

Thanks and Best Regards
/Fredrik

From: Richard Guenther [richard.guent...@gmail.com]
Sent: Wednesday, February 16, 2011 11:17
To: sa...@hederstierna.com
Cc: gcc@gcc.gnu.org
Subject: Re: Question about static code analysis features in GCC

On Wed, Feb 16, 2011 at 8:54 AM, sa...@hederstierna.com wrote:
> Hi
>
> Thanks for your answer. I just discovered, though, that the array-bounds error
> could be caught by the "-Warray-bounds" warning.
> I guess this analysis is done in Value Range Propagation, "tree-vrp.c".
> The testcases I tried (plus my example code) did not warn, though. Is it a bug?

The array-bounds warning only works when VRP is enabled, which by default is only the case at -O2 and above. In simple testcases the accesses are usually optimized away.

> testsuite/gcc.dg/Warray-bounds.c
> testsuite/gcc.dg/Warray-bounds-2.c
> testsuite/gcc.dg/Warray-bounds-3.c
> testsuite/gcc.dg/Warray-bounds-4.c FAILED??
> testsuite/gcc.dg/Warray-bounds-5.c
> testsuite/gcc.dg/Warray-bounds-6.c
> testsuite/gcc.dg/Warray-bounds-7.c FAILED??
> testsuite/gcc.dg/Warray-bounds-8.c
>
> Couldn't NULL dereferences also be checked in tree-VRP to some extent?

Yes, but VRP assumes that once you dereference a pointer it is not NULL. Its optimistic analysis therefore defeats the intent to warn about NULL accesses ;)

> And about adding an opt-pass, do you mean about here (in passes.c)?
>
>   p = &all_regular_ipa_passes;
> + NEXT_PASS (pass_ipa_static_analysis);
>   NEXT_PASS (pass_ipa_whole_program_visibility);

No, I was thinking about

Index: passes.c
===================================================================
--- passes.c    (revision 170176)
+++ passes.c    (working copy)
@@ -796,6 +796,7 @@ init_optimization_passes (void)
   *p = NULL;
   p = &all_regular_ipa_passes;
+  NEXT_PASS (pass_ipa_static_analysis);
   NEXT_PASS (pass_ipa_whole_program_visibility);
   NEXT_PASS (pass_ipa_profile);
   NEXT_PASS (pass_ipa_cp);

At the point you show we are not yet in SSA form. The above will only work reliably at -O0, as otherwise early optimizations will already have taken place.

> Which passes do you think have an additional mode for non-code generation:
> value-numbering (tree-nrv? tree-ssa-sccvn, tree-ssa-pre?) or
> constant-propagation (tree-cp)?

There are none at the moment, but at least the SSA propagators (tree-ssa-ccp.c, tree-ssa-copy.c) and the value-numberer (tree-ssa-sccvn.c/tree-ssa-pre.c) would be easy to modify.

> Could these opt-stages be called earlier in the pass pipeline?

I would rather arrange for the workers to be callable from the static analysis pass directly, instead of trying to make them "passes without code-gen".

Richard.

> Thanks and Best Regards
> /Fredrik
>
> From: Richard Guenther [richard.guent...@gmail.com]
> Sent: Sunday, February 13, 2011 10:54
> To: sa...@hederstierna.com
> Cc: gcc@gcc.gnu.org
> Subject: Re: Question about static code analysis features in GCC
>
> On Sun, Feb 13, 2011 at 2:34 AM, sa...@hederstierna.com wrote:
>> Hi
>>
>> I would like to have some advice regarding static code analysis and GCC.
>> I've just reviewed several tools like Klocwork, Coverity, CodeSonar and
>> PolySpace.
>> These tools offer a lot of features, and all the tools seem to find different
>> types of defects.
>> The tool that found the most bugs in our code was Coverity, but it is also the
>> most expensive tool.
>>
>> But basically I would most like just to find very "simple" basic errors like
>> NULL dereferences and buffer overruns.
>> I attach a small example file with some very obvious errors like
>> NULL dereferences and buffer overruns.
>>
>> This buggy file nevertheless compiles fine with GCC, without any warnings
>> at all, as expected:
>>
>>    gcc -o example example.c -W -Wall -Wextra
>>
>> I tried to add checking with mu
packaging MELT plugin documentation (GFDL for melt.texi, GPL for generated meltgendoc.texi) ?
Hello All,

Since I am releasing MELT as a plugin (GPLv3+ licensed, FSF copyrighted), I would like to package the documentation by changing the contrib/make-melt-source-tar.sh shell script of the MELT branch so that it packages the appropriate *.texi files.

As you probably know, the MELT documentation (in the MELT branch) is made of:

* the file melt.texi, which is a chapter of the GCC internals documentation. This file is hand-written, so it is FSF copyrighted and GFDL licensed.

* the generated file meltgendoc.texi (generated by MELT from *.melt code in the build directory of the MELT branch). Since this file is generated from the *.melt source code of the MELT branch, I understand it is GPL licensed and FSF copyrighted (since it is generated from GPL code).

It seems that because of the GPL vs GFDL licensing difference (I don't want to open a debate now), I should wrap those two things separately.

So my feeling is that I should wrap these documents in two different top-level texinfo documents (for the MELT plugin), probably:

* a file meltplugin.texi which would @include melt.texi as its only chapter; this file should be GFDL licensed.

* a file meltapi.texi which would @include meltgendoc.texi, and so has to be GPL licensed.

The point is that, in an ideal world, I would like these documents to be not too big and still compatible with GCC.

What should the wrapping text around melt.texi be? Can I copy http://www.gnu.org/software/texinfo/manual/texinfo/html_node/GNU-Sample-Texts.html#GNU-Sample-Texts ? What should the front-cover and back-cover texts be when wrapping a single chapter of the GCC MELT branch?

What should the wrapping text around the generated meltgendoc.texi be? How should I state that a piece of documentation has a GPLv3 (not GFDL!) license?

Regards.

PS. I am not asking for any "exception" to the GFDL or GPL. I just want to release two different documents under the licenses each of them needs. In my understanding, melt.texi is GFDL, so it needs a GFDL documentation wrapper, and meltgendoc.texi is GPL, so it needs a GPL documentation wrapper. And I don't know how to write these wrappers.

PPS. If you know of some GPL-licensed documentation text (I have heard there are some), please give a pointer.

--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***
Re: Question about static code analysis features in GCC
On Tue, Apr 12, 2011 at 10:00 AM, sa...@hederstierna.com wrote:
> Hi
>
> Richard, I've implemented a simple nop-pass as you described and am now
> investigating a path forward for static code analysis.
> I'm trying to modify e.g. the cp-pass so that these workers can be called from my
> analysis pass.
>
> I found some other work, though, done by Alexander Ivanov Sotirov, called
> "Vulncheck".
> The patch is available at http://gcc.vulncheck.org/.
> It seems to contain some work that might be useful to continue on?
> Why was this patch not applied to GCC trunk?

I don't remember that it was even proposed. It also seems to plug in late, after optimizations have been applied.

> A question from Sotirov about additional features went unanswered, or was it
> handled off-list?
> http://gcc.gnu.org/ml/gcc/2007-09/msg00549.html

Looks like a simple IL question, and I guess he is referring to the virtual SSA operands, which I would suggest ignoring completely for static analysis purposes (and in fact their implementation and use is much simplified in today's GCC).

> I guess constant propagation etc. is done by other workers/passes in GCC
> today, so it's better to use the available workers.
> But having started to read his paper, it seems to me that some parts could be
> usable?

I didn't read his paper, but I thought one important aspect of good static analysis is to perform it on (nearly) the original program. Re-using existing SSA value-numberings (without doing code modifications) might be a good idea, but is of course not necessary.

> Also, Sotirov has an "ssa-tree" approach to analysis, unlike Volanchi
> (http://mygcc.free.fr), who uses a pretty-printer and pattern-matching
> approach.
> (Which, as I understand it, stopped this patch from being applied to official GCC.)

I don't know.

> Or is it even better just to do it as a plugin pass using MELT or something
> similar?

It really depends on what ultimate goal you have.

As for improving the GCC codebase itself, I think we all mostly agree that it would be nice to get rid of most "optimization based" warnings - but we would also like to retain their precision in some form. Spending some extra compile time on a limited set of static checks by doing some data-flow analysis sounds like a way to get that. And if we do it at some point early after going into SSA, then that data-flow is moderately easy and we'd retain the advantage of having common code that works with all frontends.

So, anything that moves us in that direction, even if it is just infrastructure, would be nice to integrate into GCC itself.

Of course working with GCC can be a pain - you need to obtain a copyright assignment, get attention for your patches, etc. - so it might be a more pleasant working experience for you to work on a plugin instead (for example if you do this merely for research purposes).

Richard.
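To make the idea of a small data-flow based check a little more concrete, here is a minimal sketch in C of how a value-range style bounds check could classify an array access. It is not GCC code and does not use any GCC internal API; the names `struct range`, `range_join` and `check_access` are made up for illustration, and the index is assumed to be described by a simple closed interval.

```c
#include <assert.h>

/* Hypothetical closed interval [lo, hi] of values an index may take. */
struct range { long lo, hi; };

enum verdict { ACCESS_SAFE, ACCESS_MAYBE_OOB, ACCESS_ALWAYS_OOB };

/* Join of two ranges at a control-flow merge: the smallest interval
   that covers both inputs. */
struct range range_join(struct range a, struct range b)
{
    struct range r;
    r.lo = a.lo < b.lo ? a.lo : b.lo;
    r.hi = a.hi > b.hi ? a.hi : b.hi;
    return r;
}

/* Classify an access array[idx] for an array with `len` elements. */
enum verdict check_access(struct range idx, long len)
{
    assert(idx.lo <= idx.hi && len > 0);
    if (idx.hi < 0 || idx.lo >= len)
        return ACCESS_ALWAYS_OOB;   /* every possible value is outside */
    if (idx.lo < 0 || idx.hi >= len)
        return ACCESS_MAYBE_OOB;    /* some possible value is outside */
    return ACCESS_SAFE;
}
```

A checker built this way could warn unconditionally on `ACCESS_ALWAYS_OOB` and treat `ACCESS_MAYBE_OOB` as a lower-confidence diagnostic; how to trade precision against noise there is a design choice this sketch leaves open.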
Announce: MELT plugin (0.7) release candidate 1 for GCC 4.6
Hello All

I am announcing release candidate #1 of the MELT plugin, replacing the rc0 of http://gcc.gnu.org/ml/gcc/2011-04/msg00166.html

You can download a gzipped tarball of MELT 0.7 as a plugin for GCC 4.6 from http://gcc-melt.org/melt-0.7rc1-plugin-for-gcc-4.6.tgz
It is a gzipped tar archive of 3189167 bytes with md5sum 9eb14a820816a32cac18e60c7b8b54f6 (April 12th 2011).

Improvements over the previous rc0 include a simple -fplugin-arg-melt-extra argument to ease the loading of an extra (user-provided) MELT module, and some other bugfixes and improvements.

Comments are welcome. If you have been able to build the MELT plugin, please tell me (by private email, to avoid bothering the lists) and give details (which gcc 4.6, which distribution). If you have not been able to build it, please also tell me.

Regards.

--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
RE: Question about static code analysis features in GCC
Hey Sarah,

Many array bounds and format string problems can already be found, especially with LTO, CLooG, loop-unrolling, and -O3 enabled. Seeing across object-file boundaries, understanding loop boundaries, and aggressive inlining allow GCC to warn about a lot of real-world vulnerabilities. When multiple IPA passes land in trunk, it should be even better.

What I think is missing is:

1) Detection of double-free. There is already a function attribute called 'malloc', which is used to express a specific kind of allocation function whose return value will never be aliased. You could use that attribute, in addition to a new one ('free'), to track potential double-frees of values via VRP/IPA.

2) The ability to annotate functions with the taint and filtering side-effects on their parameters, like the format() attribute. (I've asked the PC-Lint people for this feature for some time.) You could make this even more generic and just add a new attribute that allows tagging and checking of arbitrary tags:

    ssize_t recv(int sockfd, void *buf, size_t len, int flags)
        __attribute__ ((add_parameter_tag ("taint", 2)))
        __attribute__ ((add_return_value_tag ("taint")));

    int count_sql_rows_for(const char* name)
        __attribute__ ((disallow_parameter_tag ("taint", 1)));

    void filter_sql_characters_from(const char* name)
        __attribute__ ((removes_parameter_tag ("taint", 1)));

Then a program like this:

    int main(void)
    {
        char name[20] = {0};
        recv(GLOBAL_SOCKET, &name, sizeof(name), 0);
        filter_sql_characters_from(name); // comment this line to get warning
        count_sql_rows_for(name);
    }

When I wrote my binary static analysis product, BugScan, we assumed that if a pointer was tainted, so were its contents. (This was especially necessary for collections like lists and vectors in Java and C++ binaries.) You may want to be more explicit about that, by having a recursively_add_parameter_tag() or some such that only applies to pointer parameters.

3) Lack of explicit NULL-termination of strings. This one gets really complicated, especially in situations where strings are terminated properly and then become un-terminated.

4) A loop that writes to a pointer, and increments that pointer, while being bounded by a tainted value. You'd have to add an extension to the loop unroller for that, and just check for the 'taint' tag on the bounds check.

Of course, you still run into temporal ordering issues, especially with globals, where the CFG ordering won't help. But don't let that discourage you -- it would be great work to see done and commoditized, and would probably be better than most commercial analyzers as well ;)

Let me know if you need any more of my expertise in this area. I can't speak for GCC internals, though.
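The tagging proposal in point 2 can be modelled with a small per-value tag set. The sketch below is plain C and only illustrates the bookkeeping a checker would do. The attributes named in the message are a proposal, not existing GCC attributes, and `struct tagset`, `add_tag`, `remove_tag` and `violates_disallow` are hypothetical helper names invented here.

```c
#include <assert.h>
#include <string.h>

#define MAX_TAGS 4

/* Hypothetical per-value tag set, as an analysis might track it. */
struct tagset { const char *tags[MAX_TAGS]; int count; };

int has_tag(const struct tagset *s, const char *tag)
{
    int i;
    for (i = 0; i < s->count; i++)
        if (strcmp(s->tags[i], tag) == 0)
            return 1;
    return 0;
}

/* Models the effect of add_parameter_tag / add_return_value_tag. */
void add_tag(struct tagset *s, const char *tag)
{
    if (has_tag(s, tag))
        return;
    assert(s->count < MAX_TAGS);
    s->tags[s->count++] = tag;
}

/* Models the effect of removes_parameter_tag. */
void remove_tag(struct tagset *s, const char *tag)
{
    int i;
    for (i = 0; i < s->count; i++) {
        if (strcmp(s->tags[i], tag) == 0) {
            s->tags[i] = s->tags[--s->count];  /* move last entry into the gap */
            return;
        }
    }
}

/* Models disallow_parameter_tag: returns 1 if a warning is due. */
int violates_disallow(const struct tagset *s, const char *tag)
{
    return has_tag(s, tag);
}
```

Mapped onto the example program: the recv() call would correspond to add_tag on the set for `name`, filter_sql_characters_from() to remove_tag, and count_sql_rows_for() to a violates_disallow query that decides whether to warn.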
Re: Question about static code analysis features in GCC
On 04/12/11 13:33, Hargett, Matt wrote:
> Hey Sarah,
>
> Many array bounds and format string problems can already be found, especially
> with LTO, CLooG, loop-unrolling, and -O3 enabled. Seeing across object-file
> boundaries, understanding loop boundaries, and aggressive inlining allow GCC
> to warn about a lot of real-world vulnerabilities. When multiple IPA passes
> land in trunk, it should be even better.
>
> What I think is missing is:
>
> 1) Detection of double-free. There is already a function attribute called
> 'malloc', which is used to express a specific kind of allocation function
> whose return value will never be aliased. You could use that attribute, in
> addition to a new one ('free'), to track potential double-frees of values via
> VRP/IPA.

To do a good job at this, I think we need to be able to annotate functions which must/may free one of their parameters. We then need to be able to propagate that information through the call graph.

Once you've got that annotation propagated through the call graph, a use-after-free check (use-after-free being a superset of double-free) is a lot more powerful.

This may be a subset of what you want for #2 (taint & filtering side effects for parameters).

Jeff
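A rough sketch of the propagation step described above, in plain C. It assumes a deliberately simplified model in which each function has one pointer parameter of interest, and it is not tied to GCC's actual call-graph data structures; `struct callgraph` and `propagate_may_free` are names invented for this illustration.

```c
#include <assert.h>

#define MAX_FUNCS 8

/* Hypothetical call-graph summary: calls[f][g] != 0 means f passes its
   pointer parameter on to g; frees[f] != 0 means f may free it. */
struct callgraph {
    int nfuncs;
    int calls[MAX_FUNCS][MAX_FUNCS];
    int frees[MAX_FUNCS];
};

/* Propagate "may free its parameter" from callees to callers until a
   fixed point is reached.  Returns how many functions were newly marked. */
int propagate_may_free(struct callgraph *cg)
{
    int changed = 1, marked = 0, f, g;

    assert(cg->nfuncs <= MAX_FUNCS);
    while (changed) {
        changed = 0;
        for (f = 0; f < cg->nfuncs; f++) {
            if (cg->frees[f])
                continue;               /* already known to free */
            for (g = 0; g < cg->nfuncs; g++) {
                if (cg->calls[f][g] && cg->frees[g]) {
                    cg->frees[f] = 1;   /* f frees indirectly via g */
                    changed = 1;
                    marked++;
                    break;
                }
            }
        }
    }
    return marked;
}
```

With the summaries in place, a use-after-free check would only need to look at the `frees` flag of each callee a pointer is passed to, rather than re-analysing the callee's body at every call site.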
gcc4.6.0:combining operate+test
Hi All,

I have been looking at a case on the x86 architecture where gcc could generate better code for:

    if(a+=25) d=c;

The insns for the operation and the test are:

    (insn 5 2 6 2 (set (reg:SI 62 [ a ])
            (mem/c/i:SI (symbol_ref:DI ("a") ) [2 a+0 S4 A32])) test_and.c:9 64 {*movsi_internal}
         (nil))

    (insn 6 5 7 2 (parallel [
                (set (reg:SI 60 [ a.1 ])
                    (plus:SI (reg:SI 62 [ a ])
                        (const_int 25 [0x19])))
                (clobber (reg:CC 17 flags))
            ]) test_and.c:9 252 {*addsi_1}
         (expr_list:REG_DEAD (reg:SI 62 [ a ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (expr_list:REG_EQUAL (plus:SI (mem/c/i:SI (symbol_ref:DI ("a") ) [2 a+0 S4 A32])
                        (const_int 25 [0x19]))
                    (nil)))))

    (insn 7 6 8 2 (set (mem/c/i:SI (symbol_ref:DI ("a") ) [2 a+0 S4 A32])
            (reg:SI 60 [ a.1 ])) test_and.c:9 64 {*movsi_internal}
         (nil))

    (insn 8 7 9 2 (set (reg:CCZ 17 flags)
            (compare:CCZ (reg:SI 60 [ a.1 ])
                (const_int 0 [0]))) test_and.c:9 2 {*cmpsi_ccno_1}
         (nil))

I noticed that combine.c is not able to combine insns 6 and 8. This is because the create_log_links function only creates (as far as I could understand) links between the reg setter and the first reg user, but not the other reg users. Thus, combine.c does try to combine 6 and 7, but without success.

Why doesn't create_log_links create links between the reg setter and all the reg users?

I compiled it on powerpc and got the same results (3 instructions: operate, store, test), so this behavior affects more than just x86. It seems like something worth optimizing.

best regards,
Alex Rocha Prado
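For anyone who wants to reproduce this, a minimal self-contained test case might look like the following. The declarations are assumptions: the RTL dump only shows that `a` is a 32-bit object in memory (mode SI), so `a`, `c` and `d` are guessed here to be plain global ints, and the function name `add_and_test` is made up.

```c
/* Assumed declarations: only `a` is visible in the RTL dump (as an
   SImode memory reference); `c` and `d` are guessed to be ints too. */
int a, c, d;

/* The statement from the report, wrapped in a function. */
void add_and_test(void)
{
    if (a += 25)   /* add, store back to `a`, then test the result */
        d = c;
}
```

On x86 the add instruction already sets the zero flag, so if insns 6 and 8 were combined the separate compare against zero could in principle be dropped. Comparing the assembly from `gcc -O2 -S` on this file across targets is one way to check whether a given GCC version does that.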
gcc-4.4-20110412 is now available
Snapshot gcc-4.4-20110412 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20110412/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch revision 172346

You'll find:

gcc-4.4-20110412.tar.bz2            Complete GCC (includes all of below)
  MD5=e566e660c9c253c8e9bd0f4005e55a9b
  SHA1=5e35127200f0cd2b4c5c6966fd62614063885eaa

gcc-core-4.4-20110412.tar.bz2       C front end and core compiler
  MD5=7668472ee3aecd101b90dca406e3ff0c
  SHA1=056d8fe354779a5b8ab07e89095026e7271288ed

gcc-ada-4.4-20110412.tar.bz2        Ada front end and runtime
  MD5=117269f5755d1912d8895e3e1a63bb7f
  SHA1=0a9eeb477538a14abbcbe2e22bba26456c924e33

gcc-fortran-4.4-20110412.tar.bz2    Fortran front end and runtime
  MD5=b4a8a784bd0aa2cb531e36fefd0231d8
  SHA1=64be532daea1eac1385a3c189c742c0ae1085270

gcc-g++-4.4-20110412.tar.bz2        C++ front end and runtime
  MD5=9b004fcb4c2f131f7f9b4da15a45304d
  SHA1=35313008e80c4d5e71961688e86ca799b9007347

gcc-go-4.4-20110412.tar.bz2         Go front end and runtime
  MD5=07fe7735b71115c1c341591f98816756
  SHA1=cb3879dfcb05d054c2dca59bf0c98a1b386cebf4

gcc-java-4.4-20110412.tar.bz2       Java front end and runtime
  MD5=6eac868f0e123471cab8ff974cc36241
  SHA1=de68ca728f89c94299e0f131c15296103402d9fd

gcc-objc-4.4-20110412.tar.bz2       Objective-C front end and runtime
  MD5=363243bfcecda844d09fd3e76e6775d9
  SHA1=93eb549aa3418534c976afed79398c5eea575445

gcc-testsuite-4.4-20110412.tar.bz2  The GCC testsuite
  MD5=fecb0489ce98991af8141c944edc4404
  SHA1=d7f12047fc174d59294f1a285306acb13a7e972f

Diffs from 4.4-20110405 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.