Re: [GSoC] Interest and initial proposal for project on reimplementing cpychecker as -fanalyzer plugin

2023-03-26 Thread David Malcolm via Gcc
On Sat, 2023-03-25 at 15:38 -0400, Eric Feng via Gcc wrote:
> Hi GCC community,
> 
> For GSoC, I am extremely interested in working on the selected
> project
> idea with respect to extending the static analysis pass. In
> particular, porting gcc-python-plugin's cpychecker to a plugin for
> GCC
> -fanalyzer as described in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107646.

Hi Eric, welcome to the GCC community.

I'm the author/maintainer of GCC's static analysis pass.  I'm also the
author of gcc-python-plugin and its erstwhile "cpychecker" code, so I'm
pleased that you're interested in the project.

I wrote gcc-python-plugin and cpychecker over a decade ago when I was
focused on CPython development (before I switched to GCC development),
but it's heavily bitrotted over the years, as I didn't have enough
cycles to keep it compatible with changes in both GCC and CPython
whilst working on GCC itself.  In particular, the cpychecker code
stopped working a number of GCC releases ago.  However, the cpychecker
code inspired much of my work on GCC's static analysis pass and on its
diagnostics subsystem, so much of it now lives on in C++ form as core
GCC functionality.  Also, the Python community would continue to find
static analysis of CPython extension modules useful, so it would be
good to have the idea live on as a GCC plugin on top of -fanalyzer.

>  Please find an
> initial draft of my proposal below and let me know if it is a
> reasonable starting point. Please also correct me if I am
> misunderstanding any particular tasks and let me know what areas I
> should add more information for or what else I may do in preparation.

Some ideas for familiarizing yourself with the problem space:

You should try building GCC from source, and hack in a trivial warning
that emits "hello world, I'm compiling function 'foo'".  I wrote a
guide to GCC for new contributors here that should get you started:
  https://gcc-newbies-guide.readthedocs.io/en/latest/
This will help you get familiar with GCC's internals, and although the
plan is to write a plugin, I expect that you'll run into places where a
patch to GCC itself is more appropriate (bugs and missing
functionality), so having your own debug build of GCC is a good idea.

You should become familiar with CPython's extension and embedding API.
See the excellent documentation here:
  https://docs.python.org/3/extending/extending.html
It's probably a good exercise to write your own trivial CPython
extension module.

You can read the old cpychecker code inside the gcc-python-plugin
repository, and I gave a couple of talks on it at PyCon a decade ago:

PyCon2012: "Static analysis of Python extension modules using GCC"
https://pyvideo.org/pycon-us-2012/static-analysis-of-python-extension-modules-using.html

PyCon2013: "Death by a thousand leaks: what statically-analysing 370
Python extensions looks like"
https://pyvideo.org/pycon-us-2013/death-by-a-thousand-leaks-what-statically-analys.html
https://www.youtube.com/watch?v=bblvGKzZfFI

(sorry about all the "ums" and "errs"; it's fascinating and
embarrassing to watch myself from 11 years ago on this, and see how
much I've both forgotten and learned in the meantime.  Revisiting this
work, I'm ashamed to see that I was referring to the implementation as
based on "abstract interpretation" (and e.g. absinterp.py), when I now
realize it's actually based on symbolic execution (as is GCC's
-fanalyzer).)

Also, this was during the transition era between Python 2 and Python 3,
whereas now we only have to care about Python 3.

There may be other caveats; I haven't fully rewatched those talks yet
:-/

Various comments inline below, throughout...

> 
> ___
> 
> Describe the project and clearly define its goals:
> One pertinent use case of the gcc-python plugin is as a static
> analysis tool for CPython extension modules.

It might be more accurate to use the past tense when referring to the
gcc-python plugin, alas.

>  The main goal is to help
> programmers writing extensions identify common coding errors.
> Broadly,
> the goal of this project is to port the functionalities of cpychecker
> to a -fanalyzer plugin.

(nods)

> 
> Below is a brief description of the functionalities of the static
> analysis tool for which I will work on porting over to a -fanalyzer
> plugin. The structure of the objectives is taken from the
> gcc-python-plugin documentation:
> 
> Reference count checking: Manipulation of PyObjects is done via the
> CPython API and in particular with respect to the objects' reference
> count. When the reference count belonging to an object drops to zero,
> we should free all resources associated with it. This check helps
> ensure programmers identify problems with the reference count
> associated with an object. For example, memory leaks with respect to
> forgetting to decrement the reference count of an object (analogous
> to
> malloc() without corresponding free()) or perhaps object access after
> the object's reference co

Re: [GSoC][Static Analyzer] First proposal draft and a few more questions/requests

2023-03-26 Thread Shengyu Huang via Gcc
Hi Dave,

(I forgot to cc the list in the last email and it was too late to unsend. Sorry 
for sending you the same email again.)

> On 20 Mar 2023, at 23:50, David Malcolm wrote:
> 
> I think if you try the patch to sm.cc above, then you'll see
> various existing DejaGnu tests below gcc.dg/analyzer will fail with
> state explosions.

After patching on the latest trunk, the DejaGnu tests report two cases with 
state explosion:

pr93032-mztools-{signed, unsigned}-char.c

I didn’t see any cases with ICE though.

In addition, although I did see “warning: terminating analysis for this program 
point…” in the test log, nothing was reported when I ran the individual test 
(with or without gdb)…Did I miss anything?

Just by looking at these test files, it seems that it may have to do with how 
the analyzer does path selection, because there are many nested conditionals in 
these two files. As I mentioned in the proposal, it would be curious if this 
state explosion only happens for taint analysis, because I don’t think there is 
anything special about taint analysis that would cause state explosion (unless 
there is some buggy implementation?).

I will look at your latest patch. It seems that there are many useful tips that 
can help me further investigate the internals of analyzer. Thanks a lot!

Best,
Shengyu

Re: [GSoC][Static Analyzer] First proposal draft and a few more questions/requests

2023-03-26 Thread David Malcolm via Gcc
On Sun, 2023-03-26 at 18:03 +0200, Shengyu Huang wrote:
> Hi Dave,
> 
> (I forgot to cc the list in the last email and it was too late to
> unsend. Sorry for sending you the same email again.)
> 
> > On 20 Mar 2023, at 23:50, David Malcolm <dmalc...@redhat.com> wrote:
> > 
> > I think if you try the patch to sm.cc above, then you'll see
> > various existing DejaGnu tests below gcc.dg/analyzer will fail with
> > state explosions.
> 
> After patching on the latest trunk, the DejaGnu tests report two
> cases with state explosion:
> 
> pr93032-mztools-{signed, unsigned}-char.c
> 
> I didn’t see any cases with ICE though.
> 
> In addition, although I did see “warning: terminating analysis for
> this program point…” in the test log, nothing was reported when I ran
> the individual test (with or without gdb)…Did I miss anything?

The warning is coming from -Wanalyzer-too-complex.  This is disabled by
default with -fanalyzer, so you won't see it if you try to compile the
.c file "by hand", but the testsuite enables it by default (in
analyzer.exp).

> 
> Just by looking at these test files, it seems that it may have to do
> with how the analyzer does path selection, because there are many
> nested conditionals in these two files. As I mentioned in the
> proposal, it would be curious if this state explosion only happens
> for taint analysis, because I don’t think there is anything special
> about taint analysis that would cause state explosion (unless there
> is some buggy implementation?).

I had looked into compiling those files with the patch some time ago;
looking at my notes, one issue was with this on-stack buffer:
   char extra[1024];
declared outside the loop.  Inside the loop, it gets modified in
various ways:
   extra[0] = '\0';
and
   if (fread(extra, 1, extsize, fpZip) == extsize) {
where the latter means "extra" becomes tainted.

However "extra" is barely used, and is effectively reset each time
through the loop - but the analyzer doesn't figure that out.  So the
loop analysis explodes, as it tries to keep track of the possibility
that "extra" is still tainted from previous iteration(s), despite the
fact that it's going to be clobbered before it ever gets used.

So one fix might be to extend the state-purging code so that it somehow
"sees" that "extra" gets clobbered before it gets used, and thus we can
purge the tainted state from it.
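
For illustration, here is a minimal sketch of the loop shape described
above.  It is not the mztools source: the record format (one length byte
followed by a payload) and the names count_records() and
count_records_in_bytes() are hypothetical, chosen only to show a buffer
that is declared outside the loop, reset at the top of each iteration,
tainted by fread(), and never read before the next reset.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record format: one length byte, then that many payload bytes. */
static int count_records(FILE *fp)
{
    char extra[1024];          /* declared outside the loop */
    int records = 0;
    int len;

    assert(fp != NULL);
    while ((len = fgetc(fp)) != EOF) {
        size_t extsize = (size_t)len;   /* comes from the file; at most 255 */

        extra[0] = '\0';                /* reset on every iteration */
        if (fread(extra, 1, extsize, fp) != extsize)
            break;                      /* truncated record: stop */
        /* "extra" now holds file data, i.e. it is tainted, but nothing
           reads it before the next iteration overwrites it. */
        records++;
    }
    return records;
}

/* Hypothetical convenience wrapper: run count_records() over a byte array
   by staging it in a temporary file.  Returns -1 if staging fails. */
static int count_records_in_bytes(const unsigned char *data, size_t n)
{
    FILE *fp = tmpfile();
    int records;

    if (fp == NULL)
        return -1;
    if (fwrite(data, 1, n, fp) != n) {
        fclose(fp);
        return -1;
    }
    rewind(fp);
    records = count_records(fp);
    fclose(fp);
    return records;
}
```

One way to experiment would be to compile a file like this with
-fanalyzer -fanalyzer-checker=taint -Wanalyzer-too-complex; whether such
a reduced case is enough to reproduce the explosion is an open question,
not something this sketch demonstrates.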

Hope that makes sense
Dave


> 
> I will look at your latest patch. It seems that there are many useful
> tips that can help me further investigate the internals of analyzer.
> Thanks a lot!
> 
> Best,
> Shengyu



GCC ASAN breaks glob()?

2023-03-26 Thread Paul Smith
OK here's something super-strange I discovered:

Enabling -fsanitize=address in GCC causes the glob(3) function to
misbehave.

I'm using GCC 11.3 / glibc 2.35 (x86_64 native).  I have this simple
program:

$ cat /tmp/tstglob.c
#include <glob.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    glob_t gl = {0};
    int res = glob(argv[1], 0, NULL, &gl);

    switch (res)
    {
    case 0: printf("success\n"); break;
    case GLOB_NOMATCH: printf("no match\n"); break;
    default: printf("unknown: %d\n", res); break;
    }

    return 0;
}

Now I create a symlink that doesn't point to anything:

  $ ln -s nosuchfile /tmp/badlink
  $ ls -al /tmp/badlink
  lrwxrwxrwx 1 pds pds 10 Mar 26 14:52 /tmp/badlink -> nosuchfile

Now I compile the above program normally and run it:

  $ gcc -o /tmp/tstglob /tmp/tstglob.c
  $ /tmp/tstglob /tmp/badlink
  success

This is what I expect: the symlink does exist even though it doesn't
point to anything so glob() should return it.

But now if I compile with ASAN:

  $ gcc -fsanitize=address -o /tmp/tstglob /tmp/tstglob.c
  $ /tmp/tstglob /tmp/badlink
  no match

...?!?!?!

Is there something in the ASAN library that takes over glob(3) and
installs a different version (there have been plenty of versions of
glob(3) over the years in glibc which behave incorrectly when faced
with broken symlinks, heaven knows...) that overrides the glibc
version?

Or...??
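
A self-contained way to check the expectation stated above (that the
link itself exists even though its target does not) is to compare
lstat(2), stat(2) and glob(3) on a freshly created dangling symlink.
The following is only a sketch: struct link_probe and
probe_dangling_symlink() are hypothetical names, and the temporary
directory under /tmp is an assumption.  It records the three results
without asserting what glob() ought to return, since that is exactly
what is in question here.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <glob.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical result record for the probe below. */
struct link_probe {
    int lstat_rc;    /* 0 if the link itself exists */
    int stat_rc;     /* -1 if the link's target cannot be resolved */
    int stat_errno;  /* errno left by stat(); meaningful if stat_rc == -1 */
    int glob_rc;     /* 0, GLOB_NOMATCH, or another glob(3) error code */
};

/* Create a dangling symlink in a fresh temporary directory, query it three
   ways, then clean up.  Returns 0 on success, -1 if setup failed. */
static int probe_dangling_symlink(struct link_probe *out)
{
    char dir[] = "/tmp/globprobeXXXXXX";
    char path[sizeof dir + sizeof "/badlink"];
    struct stat st;
    glob_t gl = {0};

    assert(out != NULL);
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/badlink", dir);
    if (symlink("nosuchfile", path) != 0) {
        rmdir(dir);
        return -1;
    }

    out->lstat_rc = lstat(path, &st);   /* looks at the link itself */
    errno = 0;
    out->stat_rc = stat(path, &st);     /* follows the link */
    out->stat_errno = errno;
    out->glob_rc = glob(path, 0, NULL, &gl);
    if (out->glob_rc == 0)
        globfree(&gl);

    unlink(path);
    rmdir(dir);
    return 0;
}
```

Building this once with and once without -fsanitize=address and
comparing glob_rc between the two runs would show whether the
difference follows the sanitizer runtime rather than the filesystem
state, since lstat_rc and stat_rc should be the same in both builds.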


Re: GCC ASAN breaks glob()?

2023-03-26 Thread Andrew Pinski via Gcc
On Sun, Mar 26, 2023 at 12:01 PM Paul Smith wrote:
>
> OK here's something super-strange I discovered:
>
> Enabling -fsanitize=address in GCC causes the glob(3) function to
> misbehave.
>
> I'm using GCC 11.3 / glibc 2.35 (x86_64 native).  I have this simple
> program:

Maybe https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88054 .

Thanks,
Andrew

>
> $ cat /tmp/tstglob.c
> #include <glob.h>
> #include <stdio.h>
>
> int main(int argc, char *argv[])
> {
>     glob_t gl = {0};
>     int res = glob(argv[1], 0, NULL, &gl);
>
>     switch (res)
>     {
>     case 0: printf("success\n"); break;
>     case GLOB_NOMATCH: printf("no match\n"); break;
>     default: printf("unknown: %d\n", res); break;
>     }
>
>     return 0;
> }
>
> Now I create a symlink that doesn't point to anything:
>
>   $ ln -s nosuchfile /tmp/badlink
>   $ ls -al /tmp/badlink
>   lrwxrwxrwx 1 pds pds 10 Mar 26 14:52 /tmp/badlink -> nosuchfile
>
> Now I compile the above program normally and run it:
>
>   $ gcc -o /tmp/tstglob /tmp/tstglob.c
>   $ /tmp/tstglob /tmp/badlink
>   success
>
> This is what I expect: the symlink does exist even though it doesn't
> point to anything so glob() should return it.
>
> But now if I compile with ASAN:
>
>   $ gcc -fsanitize=address -o /tmp/tstglob /tmp/tstglob.c
>   $ /tmp/tstglob /tmp/badlink
>   no match
>
> ...?!?!?!
>
> Is there something in the ASAN library that takes over glob(3) and
> installs a different version (there have been plenty of versions of
> glob(3) over the years in glibc which behave incorrectly when faced
> with broken symlinks, heaven knows...) that overrides the glibc
> version?
>
> Or...??


Re: [GSoC][Static Analyzer] First proposal draft and a few more questions/requests

2023-03-26 Thread Shengyu Huang via Gcc
Hi Dave,

> On 26 Mar 2023, at 19:14, David Malcolm wrote:
> 
> I had looked into compiling those files with the patch some time ago;
> looking at my notes, one issue was with this on-stack buffer:
>    char extra[1024];
> declared outside the loop.  Inside the loop, it gets modified in
> various ways:
>    extra[0] = '\0';
> and
>    if (fread(extra, 1, extsize, fpZip) == extsize) {
> where the latter means "extra" becomes tainted.
> where the latter means "extra" becomes tainted.
> 
> However "extra" is barely used, and is effectively reset each time
> through the loop - but the analyzer doesn't figure that out.  So the
> loop analysis explodes, as it tries to keep track of the possibility
> that "extra" is still tainted from previous iteration(s), despite the
> fact that it's going to be clobbered before it ever gets used.
> 
> So one fix might be to extend the state-purging code so that it somehow
> "sees" that "extra" gets clobbered before it gets used, and thus we can
> purge the tainted state from it.

Thanks for your notes. I think we may be talking about the same thing? If you 
look at the updated proposal (I have changed it quite a lot since I first sent 
it out), you’ll see there is one relevant paper for state merging (although it 
is slightly different from state purging, I think the goal and general 
methodology is similar): https://dslab.epfl.ch/pubs/stateMerging.pdf 

I was trying to say if some similar situation happened for other types of 
checkers, I expected state explosion would also happen. I tried to construct a 
similar example (with the same kind of reset and nested conditionals + a loop) 
but for double-free, so far no success yet. I’ll pick it up afterwards, at 
latest by next Saturday, because I need to prepare for a coming midterm on 
Friday. I will also put this test case to the proposal because it seems like a 
very good starting point for the project.

Best,
Shengyu

gcc-13-20230326 is now available

2023-03-26 Thread GCC Administrator via Gcc
Snapshot gcc-13-20230326 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/13-20230326/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 13 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch master 
revision 55bc61a75a68d1a8d1e4df170b4beef1020f1e55

You'll find:

 gcc-13-20230326.tar.xz   Complete GCC

  SHA256=418ab317742f90f1f92ea35a983224997ad7c36ba152af1fe0c128a6b52b4939
  SHA1=8618655f38c06683eb33f0b2c0cad4a77d924916

Diffs from 13-20230319 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-13
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.