RE: Question about static code analysis features in GCC

2011-04-12 Thread sa...@hederstierna.com
Hi

Richard, I've implemented a simple nop-pass as you described and am now
investigating a path forward for static code analysis.
I'm trying to modify e.g. the cp-pass to be able to call these workers from my
analysis pass.

I found some other work, though, done by Alexander Ivanov Sotirov, called
"Vulncheck".
A patch is available at "http://gcc.vulncheck.org/".
It seems to contain some work that might be useful to continue on?
Why was this patch not applied to GCC trunk?

Was a question from Sotirov about additional features left unanswered, or
answered off-list?
http://gcc.gnu.org/ml/gcc/2007-09/msg00549.html

I guess constant propagation etc. is done by other workers/passes in GCC
today, so it's better to use the available workers.
But having started to read his paper, it seems to me that some parts could be
usable?
Also, Sotirov has an "ssa-tree" approach to analysis, unlike Volanchi
(http://mygcc.free.fr), who uses a pretty-printer and pattern-matching approach.
(Which, as I understand it, stopped that patch from being applied to official GCC.)

Or is it even better just to do it as a plugin-pass using MELT or something 
similar?

Thanks and Best Regards
/Fredrik

From: Richard Guenther [richard.guent...@gmail.com]
Sent: Wednesday, February 16, 2011 11:17
To: sa...@hederstierna.com
Cc: gcc@gcc.gnu.org
Subject: Re: Question about static code analysis features in GCC

On Wed, Feb 16, 2011 at 8:54 AM, sa...@hederstierna.com
 wrote:
> Hi
>
> Thanks for your answer. I just discovered, though, that the array-bounds error
> could be caught by the "-Warray-bounds" warning.
> I guess this analysis is done in Value Range Propagation, "tree-vrp.c".
> The testcases I tried (plus my example code) did not warn, though. Is it a bug?

The array-bounds warning only works when VRP is enabled, which by default
happens only at -O2 and above. In simple testcases the accesses are usually
optimized away.
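
A minimal sketch of the kind of testcase under discussion; the file and
function names are made up for illustration and are not taken from the GCC
testsuite. With a GCC of this era one would expect the out-of-bounds read to
be diagnosed only when VRP runs, e.g. with "gcc -O2 -Wall -c bounds.c", and
not at -O0; other releases may behave differently.

```c
/* bounds.c: hypothetical testcase, not part of the GCC testsuite. */

static int table[4] = { 1, 2, 3, 4 };

/* Reads one element past the end of `table`; this is the kind of access
   -Warray-bounds is meant to flag.  Never call this at run time. */
int read_past_end(void)
{
    return table[4];
}

/* In-bounds counterpart for comparison: no diagnostic is expected. */
int read_last(void)
{
    return table[3];
}
```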

> testsuite/gcc.dg/Warray-bounds.c
> testsuite/gcc.dg/Warray-bounds-2.c
> testsuite/gcc.dg/Warray-bounds-3.c
> testsuite/gcc.dg/Warray-bounds-4.c   FAILED??
> testsuite/gcc.dg/Warray-bounds-5.c
> testsuite/gcc.dg/Warray-bounds-6.c
> testsuite/gcc.dg/Warray-bounds-7.c   FAILED??
> testsuite/gcc.dg/Warray-bounds-8.c
>
> Couldn't NULL dereferences also be checked in tree-VRP to some extent?

Yes, but VRP assumes that once you dereference a pointer it will not be
NULL - thus its optimistic analysis defeats the intent to warn about
NULL accesses ;)
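
To illustrate the assumption described above, here is a small sketch (the
function name is hypothetical). Because `p` is dereferenced before it is
compared against NULL, an optimizer that reasons this way may treat the
comparison as always false and drop the branch, so an analysis running after
that point may no longer see the suspicious check at all.

```c
#include <stddef.h>

/* Hypothetical example: the dereference comes before the NULL test. */
int first_or_default(const int *p, int fallback)
{
    int value = *p;        /* from here on, p is assumed to be non-NULL */

    if (p == NULL)         /* may be folded to "false" and removed */
        return fallback;

    return value;
}
```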

> And about adding an opt-pass, do you mean about here (in passes.c)?
>
>  p = &all_regular_ipa_passes;
> +NEXT_PASS (pass_ipa_static_analysis);
>  NEXT_PASS (pass_ipa_whole_program_visibility);

No, I was thinking about

Index: passes.c
===================================================================
--- passes.c	(revision 170176)
+++ passes.c	(working copy)
@@ -796,6 +796,7 @@ init_optimization_passes (void)
   *p = NULL;

   p = &all_regular_ipa_passes;
+  NEXT_PASS (pass_ipa_static_analysis);
   NEXT_PASS (pass_ipa_whole_program_visibility);
   NEXT_PASS (pass_ipa_profile);
   NEXT_PASS (pass_ipa_cp);

at the point you show we are not yet in SSA form.  The above will
only reliably work at -O0 as otherwise early optimizations will have
taken place.

> What passes do you think have an additional mode for non-code generation, 
> value-numbering (tree-nrv? tree-ssa-sccvn, tree-ssa-pre?) or 
> constant-propagation (tree-cp)?

There are none at the moment, but at least the SSA propagators
(tree-ssa-ccp.c, tree-ssa-copy.c) and the value-numberer
(tree-ssa-sccvn.c/tree-ssa-pre.c) would be easy to modify.

> Could these opt-stages be called earlier in the passes pipeline?

I would rather arrange for the workers to be able to be called from
the static analysis pass directly instead of trying to make them
"passes without code-gen".

Richard.


>
> Thanks and Best Regards
> /Fredrik
> 
> From: Richard Guenther [richard.guent...@gmail.com]
> Sent: Sunday, February 13, 2011 10:54
> To: sa...@hederstierna.com
> Cc: gcc@gcc.gnu.org
> Subject: Re: Question about static code analysis features in GCC
>
> On Sun, Feb 13, 2011 at 2:34 AM, sa...@hederstierna.com
>  wrote:
>> Hi
>>
>> I would like to have some advice regarding static code analysis and GCC.
>> I've just reviewed several tools like Klocwork, Coverity, CodeSonar and 
>> PolySpace.
>> These tools offer a lot of features and all tools seem to find different
>> types of defects.
>> The tool that found most bugs on our code was Coverity, but it is also the 
>> most expensive tool.
>>
>> But basically I would most like just to find very "simple" basic errors like 
>> NULL-dereferences and buffer overruns.
>> I attach a small example file with some very obvious errors like 
>> NULL-dereferences and buffer overruns.
>>
>> This buggy file compiles fine though without any warnings at all with GCC as 
>> expected
>>
>>     gcc -o example example.c -W -Wall -Wextra
>>
>> I tried to add checking with mu

packaging MELT plugin documentation (GFDL for melt.texi, GPL for generated meltgendoc.texi) ?

2011-04-12 Thread Basile Starynkevitch
Hello All,


Since I am releasing MELT as a plugin (GPLv3+ licensed, FSF copyrighted), I
would like to package the documentation, by changing the
contrib/make-melt-source-tar.sh shell script of the MELT branch so that it
packages appropriate *.texi files.

As you probably know, MELT documentation is made (in the MELT branch) of

* file melt.texi which is a chapter of the GCC internal documentation. This
  file is hand written so is FSF copyrighted and GFDL licensed.

* generated file meltgendoc.texi (generated by MELT from *.melt code in the
build directory of the MELT branch). Since this file is generated from the
*.melt source code of the MELT branch, I understand it is GPL licensed & FSF
copyrighted (since generated from GPL code).

It seems that because of the GPL vs GFDL licensing difference (I don't want
to open a debate now), I should wrap those two things separately.

So my feeling is that I should wrap these documentations in two different
toplevel texinfo documents (for the MELT plugin), probably

* a file meltplugin.texi which would @include melt.texi which would be its
  only chapter, and this file should be GFDL licensed.

* a file meltapi.texi which would @include meltgendoc.texi so has to be GPL
  licensed.

The point is that in an ideal world, I would like these documents to be not
too big and still compatible with GCC.

What should be the wrapping text around melt.texi? Can I copy the sample
texts from the page below?
http://www.gnu.org/software/texinfo/manual/texinfo/html_node/GNU-Sample-Texts.html#GNU-Sample-Texts
What should be the front-cover text & back-cover text wrapping a single
chapter of the GCC MELT branch?

What should be the wrapping text around generated meltgendoc.texi? How
should I write that a documentation has a GPLv3 (not GFDL!) license?

Regards.

PS. I am not asking for any "exception" to the GFDL or GPL. I just want to
release two different documents under the licenses they need. In my
understanding, melt.texi is GFDL so needs a GFDL documentation wrapper and
meltgendoc.texi is GPL so needs a GPL documentation wrapper. And I don't
know how to write these wrappers.

PPS. If you know about some GPL documentation text (I heard there are some),
please give a pointer.

-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***


Re: Question about static code analysis features in GCC

2011-04-12 Thread Richard Guenther
On Tue, Apr 12, 2011 at 10:00 AM, sa...@hederstierna.com
 wrote:
> Hi
>
> Richard, I've implemented a simple nop-pass as you described and am now
> investigating a path forward for static code analysis.
> I'm trying to modify e.g. the cp-pass to be able to call these workers from my
> analysis pass.
>
> I found some other work, though, done by Alexander Ivanov Sotirov, called
> "Vulncheck".
> A patch is available at "http://gcc.vulncheck.org/".
> It seems to contain some work that might be useful to continue on?
> Why was this patch not applied to GCC trunk?

I don't remember that it was even proposed.  It also seems to plug in
late, after optimizations have been applied.

> Was a question from Sotirov about additional features left unanswered, or
> answered off-list?
> http://gcc.gnu.org/ml/gcc/2007-09/msg00549.html

Looks like a simple IL question, and I guess he is referring to the virtual
SSA operands, which I would suggest ignoring completely for static
analysis purposes (in fact, their implementation and use are much
simplified in today's GCC).

> I guess constant propagation etc. is done by other workers/passes in GCC
> today, so it's better to use the available workers.
> But having started to read his paper, it seems to me that some parts could be
> usable?

I didn't read his paper, but I thought one important aspect of good
static analysis is to perform it on (nearly) the original program.  Re-using
existing SSA value-numberings (without doing code modifications) might
be a good idea, but is of course not necessary.

> Also, Sotirov has an "ssa-tree" approach to analysis, unlike Volanchi
> (http://mygcc.free.fr), who uses a pretty-printer and pattern-matching
> approach.
> (Which, as I understand it, stopped that patch from being applied to official GCC.)

I don't know.

> Or is it even better just to do it as a plugin-pass using MELT or something 
> similar?

It really depends on what ultimate goal you have.  As for improving the
GCC codebase itself I think we all mostly agree that it would be nice to
get rid of most "optimization based" warnings - but we also like to
retain their preciseness in some form.  Spending some extra compile-time
for a limited set of static checks by doing some data-flow analysis
sounds like a way to get that.  And if we do it at some point early after
going into SSA then that data-flow is moderately easy and we'd retain
the advantage of having common code that works with all frontends.

So, anything that moves us in that direction, even if it is just
infrastructure, would be nice to integrate into GCC itself.

Of course working with GCC can be a pain - you need to obtain a
copyright assignment, get attention to your patches, etc. - so it
might be a more pleasant working experience for you to work
on a plugin instead (for example if you do this merely for research
purposes).

Richard.


Announce: MELT plugin (0.7) release candidate 1 for GCC 4.6

2011-04-12 Thread Basile Starynkevitch

Hello All

I am announcing the release candidate #1 of the MELT plugin, replacing the rc0 
of http://gcc.gnu.org/ml/gcc/2011-04/msg00166.html

You can download MELT 0.7 as a plugin for GCC 4.6 from
http://gcc-melt.org/melt-0.7rc1-plugin-for-gcc-4.6.tgz
(a gzipped tar archive of 3189167 bytes, md5sum
9eb14a820816a32cac18e60c7b8b54f6, dated April 12th 2011).

Improvements over the previous rc0 include a simple -fplugin-arg-melt-extra
argument to ease the loading of an extra (MELT user provided) MELT module
and some other bugfixes/improvements.

Comments are welcome. If you have been able to build the MELT plugin, please
tell me (by private email to avoid bothering lists) and give details (which
gcc 4.6, which distribution). If you have not been able to build it, please
also tell me.

Regards.

-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***


RE: Question about static code analysis features in GCC

2011-04-12 Thread Hargett, Matt
Hey Sarah,

Many array bounds and format string problems can already be found, especially
with LTO, CLooG, loop-unrolling, and -O3 enabled. Seeing across object-file
boundaries, understanding loop boundaries, and aggressive inlining allow GCC
to warn about a lot of real-world vulnerabilities. When multiple IPA passes
land in trunk, it should be even better.

What I think is missing is:

1) detection of double-free. There is already a function attribute called
'malloc', which is used to express a specific kind of allocation function whose
return value will never be aliased. You could use that attribute, in addition
to a new one ('free'), to track potential double-frees of values via VRP/IPA.

2) the ability to annotate functions as to the taint and filtering side-effects 
to their parameters, like the format() attribute. (I've asked for this feature 
from the PC-Lint people for some time.) You could make this even more generic 
and just add a new attribute that allows for tagging and checking of arbitrary 
tags:
ssize_t recv(int sockfd, void *buf, size_t len, int flags)
    __attribute__ ((add_parameter_tag ("taint", 2)))
    __attribute__ ((add_return_value_tag ("taint")));

int count_sql_rows_for(const char* name)
    __attribute__ ((disallow_parameter_tag ("taint", 1)));
void filter_sql_characters_from(const char* name)
    __attribute__ ((removes_parameter_tag ("taint", 1)));

then a program like this:
int main(void) {
  char name[20] = {0};
  recv(GLOBAL_SOCKET, &name, sizeof(name), 0);
  filter_sql_characters_from(name); // comment this line to get warning
  count_sql_rows_for(name);
}

When I wrote my binary static analysis product, BugScan, we assumed that if a 
pointer was tainted, so was its contents. (This was especially a necessity for 
collections like lists and vectors in Java and C++ binaries.) You may want to
get more explicit with that, by having a recursively_add_parameter_tag() or
somesuch that only applies to pointer parameters.

3) lack of explicit NULL-termination of strings. This one gets really 
complicated, especially for situations where they are terminated properly and 
then become un-terminated.

4) if a loop that writes to a pointer, and increments that pointer, is bounded
by a tainted value. You'd have to add an extension to the loop unroller for
that, and just check for the 'taint' tag on the bounds check.
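
The tag semantics proposed in item 2 can be modelled outside the compiler.
The sketch below is only an illustration under simplifying assumptions: it
handles a straight-line sequence of calls, tracks a single "taint" tag per
variable name, and every type and function name in it is invented here and
is not an existing GCC interface.

```c
#include <stddef.h>
#include <string.h>

#define MAX_VARS 16   /* the tainted set is capped at this many names */

/* Effect a call has on the tag of the variable passed to it. */
enum tag_effect { ADD_TAG, REMOVE_TAG, DISALLOW_TAG };

struct call {
    enum tag_effect effect;
    const char *var;           /* name of the argument variable */
};

/* Returns the index of the first call that receives a tainted variable
   although it disallows the tag, or -1 if there is no such call. */
int first_taint_violation(const struct call *calls, size_t ncalls)
{
    const char *tainted[MAX_VARS];
    size_t ntainted = 0;

    for (size_t i = 0; i < ncalls; i++) {
        /* Look the variable up in the current tainted set;
           pos == ntainted means "not found". */
        size_t pos = ntainted;
        for (size_t j = 0; j < ntainted; j++) {
            if (strcmp(tainted[j], calls[i].var) == 0) {
                pos = j;
                break;
            }
        }

        switch (calls[i].effect) {
        case ADD_TAG:
            if (pos == ntainted && ntainted < MAX_VARS)
                tainted[ntainted++] = calls[i].var;
            break;
        case REMOVE_TAG:
            if (pos < ntainted)
                tainted[pos] = tainted[--ntainted];
            break;
        case DISALLOW_TAG:
            if (pos < ntainted)
                return (int)i;
            break;
        }
    }
    return -1;
}
```

Modelling the example program as recv (ADD_TAG), filter_sql_characters_from
(REMOVE_TAG) and count_sql_rows_for (DISALLOW_TAG), all on "name", the
function returns -1; dropping the filter call makes it return the index of
the count_sql_rows_for call.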


Of course, you still run into temporal ordering issues, especially with 
globals, where the CFG ordering won't help.

But don't let that discourage you -- it would be great work to see done and 
commoditized, and would probably be better than most commercial analyzers as 
well ;)

Let me know if you need any more of my expertise in this area. I can't speak 
for GCC internals, though.




Re: Question about static code analysis features in GCC

2011-04-12 Thread Jeff Law

On 04/12/11 13:33, Hargett, Matt wrote:
> Hey Sarah,
> 
> Many array bounds and format string problems can already be found, especially
> with LTO, CLooG, loop-unrolling, and -O3 enabled. Seeing across object-file
> boundaries, understanding loop boundaries, and aggressive inlining allow GCC
> to warn about a lot of real-world vulnerabilities. When multiple IPA passes
> land in trunk, it should be even better.
> 
> What I think is missing is:
> 
> 1) detection of double-free. There is already a function attribute called
> 'malloc', which is used to express a specific kind of allocation function
> whose return value will never be aliased. You could use that attribute, in
> addition to a new one ('free'), to track potential double-frees of values via
> VRP/IPA.
To do a good job at this, I think we need to be able to annotate
functions which must/may free one of their parameters.  We then need to
be able to propagate that information through the call graph.

Once you've got that annotation propagated through the call graph, a
use-after-free check (which covers a superset of double-free) is a lot more
powerful.

This may be a subset of what you want for #2 (taint & filtering side
effects for parameters).
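
As a rough sketch of what propagating such an annotation could mean, the
code below runs a simple fixed-point iteration over a toy call graph. All
names are hypothetical and the model is deliberately crude: each function
has one parameter, and "frees_param" is treated as a may-free property that
a caller inherits when it forwards its parameter to a callee that has it.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CALLEES 8

struct func {
    bool frees_param;                  /* seeded by an annotation */
    int passes_param_to[MAX_CALLEES];  /* indices of callees that receive
                                          this function's parameter */
    size_t ncallees;
};

/* Iterate until no function changes: a function may free its parameter
   if any callee it forwards the parameter to may free it. */
void propagate_frees(struct func *funcs, size_t nfuncs)
{
    bool changed = true;

    while (changed) {
        changed = false;
        for (size_t i = 0; i < nfuncs; i++) {
            if (funcs[i].frees_param)
                continue;
            for (size_t j = 0; j < funcs[i].ncallees; j++) {
                if (funcs[funcs[i].passes_param_to[j]].frees_param) {
                    funcs[i].frees_param = true;
                    changed = true;
                    break;
                }
            }
        }
    }
}
```

With the property known for every function, a checker could flag any use of
a pointer after it has been passed to a function marked this way.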

Jeff


gcc4.6.0:combining operate+test

2011-04-12 Thread cirrus75

 Hi All,

 I have been looking at a case in x86 architecture where gcc could generate 
better code for:

if(a+=25)
 d=c;

 
 Insns for operation and test are:


(insn 5 2 6 2 (set (reg:SI 62 [ a ])
(mem/c/i:SI (symbol_ref:DI ("a")  ) [2 a+0 
S4 A32])) test_and.c:9 64 {*movsi_internal}
 (nil))

(insn 6 5 7 2 (parallel [
(set (reg:SI 60 [ a.1 ])
(plus:SI (reg:SI 62 [ a ])
(const_int 25 [0x19])))
(clobber (reg:CC 17 flags))
]) test_and.c:9 252 {*addsi_1}
 (expr_list:REG_DEAD (reg:SI 62 [ a ])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(expr_list:REG_EQUAL (plus:SI (mem/c/i:SI (symbol_ref:DI ("a")  
) [2 a+0 S4 A32])
(const_int 25 [0x19]))
(nil)

(insn 7 6 8 2 (set (mem/c/i:SI (symbol_ref:DI ("a")  ) [2 a+0 S4 A32])
(reg:SI 60 [ a.1 ])) test_and.c:9 64 {*movsi_internal}
 (nil))

(insn 8 7 9 2 (set (reg:CCZ 17 flags)
(compare:CCZ (reg:SI 60 [ a.1 ])
(const_int 0 [0]))) test_and.c:9 2 {*cmpsi_ccno_1}
 (nil))


  I noticed combine.c is not able to combine insns 6 and 8. This is because the
create_log_links function only creates (as far as I could understand) links
between the reg setter and the first reg user, but not the other reg users.
Thus, combine.c does try to combine 6 and 7, but without success.

  Why does create_log_links not create links between the reg setter and all the
reg users?

  I compiled it on powerpc and got the same results (3 instructions: operate,
store, test), so this behavior affects not only x86 architectures. It seems
like something worth optimizing.
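
For anyone wanting to reproduce this, a self-contained version of the
fragment might look as follows. The declarations are an assumption: the RTL
shows `a` as a 32-bit global, but the types of `c` and `d` and the function
name are guesses. The RTL dump of the combine pass can be requested with
-fdump-rtl-combine.

```c
/* test_and.c: assumed reconstruction of the testcase. */

int a, c, d;

/* Add to the global, store it back, then branch on the result:
   the operate + store + test sequence discussed above. */
void add_and_test(void)
{
    if (a += 25)
        d = c;
}
```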
  
best regards,
Alex Rocha Prado

  

  


gcc-4.4-20110412 is now available

2011-04-12 Thread gccadmin
Snapshot gcc-4.4-20110412 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20110412/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch 
revision 172346

You'll find:

 gcc-4.4-20110412.tar.bz2             Complete GCC (includes all of below)

  MD5=e566e660c9c253c8e9bd0f4005e55a9b
  SHA1=5e35127200f0cd2b4c5c6966fd62614063885eaa

 gcc-core-4.4-20110412.tar.bz2        C front end and core compiler

  MD5=7668472ee3aecd101b90dca406e3ff0c
  SHA1=056d8fe354779a5b8ab07e89095026e7271288ed

 gcc-ada-4.4-20110412.tar.bz2         Ada front end and runtime

  MD5=117269f5755d1912d8895e3e1a63bb7f
  SHA1=0a9eeb477538a14abbcbe2e22bba26456c924e33

 gcc-fortran-4.4-20110412.tar.bz2     Fortran front end and runtime

  MD5=b4a8a784bd0aa2cb531e36fefd0231d8
  SHA1=64be532daea1eac1385a3c189c742c0ae1085270

 gcc-g++-4.4-20110412.tar.bz2         C++ front end and runtime

  MD5=9b004fcb4c2f131f7f9b4da15a45304d
  SHA1=35313008e80c4d5e71961688e86ca799b9007347

 gcc-go-4.4-20110412.tar.bz2          Go front end and runtime

  MD5=07fe7735b71115c1c341591f98816756
  SHA1=cb3879dfcb05d054c2dca59bf0c98a1b386cebf4

 gcc-java-4.4-20110412.tar.bz2        Java front end and runtime

  MD5=6eac868f0e123471cab8ff974cc36241
  SHA1=de68ca728f89c94299e0f131c15296103402d9fd

 gcc-objc-4.4-20110412.tar.bz2        Objective-C front end and runtime

  MD5=363243bfcecda844d09fd3e76e6775d9
  SHA1=93eb549aa3418534c976afed79398c5eea575445

 gcc-testsuite-4.4-20110412.tar.bz2   The GCC testsuite

  MD5=fecb0489ce98991af8141c944edc4404
  SHA1=d7f12047fc174d59294f1a285306acb13a7e972f

Diffs from 4.4-20110405 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.