Re: GSoC: Working on the static analyzer

2022-02-13 Thread Mir Immad via Gcc
Hi,

I wanted some clarification on bifurcating the exploded graph at call to
open().
Should the analyzer warn for code like this "when open fails" (like strchr
does when  'strchr' returns NULL)

int fd = open("NOFILE", O_RDONLY);
write(fd, "a", 1);

because of the bad file descriptor.
unless it is written like this:
if (!errno)
  write(fd, "a", 1);

Thank you.

On Tue, Feb 1, 2022 at 8:28 PM Mir Immad  wrote:

> I worked around the leak detection and also handled the case where the fd
> is not saved. I wonder why sm-file hasn't implemented it yet. I'm attaching
> a text file with analyzer warnings. I'm still on gcc-11.2.0, will move to
> v12 next thing.
>
> > I wonder if it's worth checking for attempts to write to a fd that was
> > opened with O_RDONLY, or the converse?
>
> Yes, it does not make sense to warn for write on read-only closed fd or
> converse. But, it does make sense to warn for write operation on valid
> read-only fd. For this, we might have to analyze calls to the open()
> carefully -- get the literal value of the second argument to the call and
> do bit-mask decomposition on it to find if O_RDONLY or O_WRONLY are in it.
> We might have to maintain separate vectors for these and check for them in
> calls to write and read -- warn if write is being called on read-only fd
> and converse. How do we equate them? Does gimple store unique ids for each
> tree? or something like SAME_TREE? Just thinking out loud.
>
> > how much state does it make sense to track for a fd?
> Sorry, I didnt get it. Do you mean how may states I'm using? unchecked,
> closed, null and leak_by_not_saved (for the call-tree that does not have a
> lhs).
>
> > Also, at some point, we're going to have to handle "errno" - but given
> > that might be somewhat fiddly it's OK to defer that until you're more
> > familiar with the code
>
> I'm looking forward to figure it out.
>
> Thank you.
>
>
> On Sat, Jan 29, 2022 at 11:09 PM David Malcolm 
> wrote:
>
>> On Sat, 2022-01-29 at 20:22 +0530, Mir Immad wrote:
>> > Thank you for the detailed information.
>> >
>> > I've been looking into the integer posix  file descriptor APIs and I
>> > decided to write proof-of-concept  checker for them. (not caring
>> > about
>> > errno). The checker tracks the fd returned by open(), warns if dup()
>> > is
>> > called with closed fd otherwise tracks the fd returned by dup(), it
>> > also
>> > warns if read() and write() functions were called on closed fd. I'm
>> > attaching a text file that lists some c sources and warnings by the
>> > static
>> > analyzer. I've used the diagnostic meta-data from sm-file. Is this
>> > something that could also be added to the analyzer?
>>
>> This looks great, and very promising as both new functionality for GCC
>> 13, and as a GSoC 2022 project.
>>
>> BTW, it looks like you're working with GCC 11, but the analyzer has
>> changed quite a bit on trunk for GCC 12, so it's worth trying to track
>> trunk.
>>
>> I wonder if it's worth checking for attempts to write to a fd that was
>> opened with O_RDONLY, or the converse?  (I'm not sure, just thinking
>> aloud - how much state does it make sense to track for a fd?).
>>
>> Also, at some point, we're going to have to handle "errno" - but given
>> that might be somewhat fiddly it's OK to defer that until you're more
>> familiar with the code.
>>
>> >
>> > About the fd leak, that's the next thing I'll try to get working.
>> > Since
>> > you've mentioned that it could be a GSoC project, this is what I'm
>> > going to
>> > focus on.
>>
>> Excellent.
>>
>> Let me know (via this mailing list) if you have any questions.
>>
>> Thanks
>> Dave
>>
>> >
>> > Regards.
>> >
>> >
>> >
>> > On Wed, Jan 26, 2022 at 7:56 PM David Malcolm 
>> > wrote:
>> >
>> > > On Mon, 2022-01-24 at 01:41 +0530, Mir Immad wrote:
>> > > > Hi, sir.
>> > > >
>> > > > I've been trying to understand the static analyzer's code. I
>> > > > spent most
>> > > > of
>> > > > my time learning the state machine's API. I learned how state
>> > > > machine's
>> > > > on_stmt is supposed to "recognize" specific functions and how
>> > > > on_transition
>> > > > takes a specific tree from one state to another, and how the
>> > > > captured
>> > > > states are used by pending_diagnostics to report the errors.
>> > > > Furthermore, I
>> > > > was able to create a dummy checker that mimicked the behaviour of
>> > > > sm-
>> > > > file's
>> > > > double_fclose and compile GCC with these changes. Is this the
>> > > > right way
>> > > > of
>> > > > learning?
>> > >
>> > > This sounds great.
>> > >
>> > > >
>> > > > As you've mentioned on the projects page that you would like to
>> > > > add
>> > > > more
>> > > > support for some POSIX APIs. Can you please write (or refer me to
>> > > > a) a
>> > > > simple C program that uses such an API (and also what the
>> > > > analyzer
>> > > > should
>> > > > have done) so that I can attempt to add such a checker to the
>> > > > analyzer.
>> > >
>> > > A couple of project ideas:
>> > >
>

gcc-12-20220213 is now available

2022-02-13 Thread GCC Administrator via Gcc
Snapshot gcc-12-20220213 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/12-20220213/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 12 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch master 
revision 58aeb75d4097010ad9bb72b964265b18ab284f93

You'll find:

 gcc-12-20220213.tar.xz   Complete GCC

  SHA256=40b353da0f276e9323c250959284b642989949efdf40022d5f4227f7dd3c733d
  SHA1=4c45ee4d94b2c1b0d9683cfafae4e7118fd75eb2

Diffs from 12-20220206 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-12
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: GSoC: Working on the static analyzer

2022-02-13 Thread David Malcolm via Gcc
On Sun, 2022-02-13 at 21:16 +0530, Mir Immad wrote:
> Hi,
> 
> I wanted some clarification on bifurcating the exploded graph at call
> to
> open().
> Should the analyzer warn for code like this "when open fails" (like
> strchr
> does when  'strchr' returns NULL)
> 
> int fd = open("NOFILE", O_RDONLY);
> write(fd, "a", 1);
> 
> because of the bad file descriptor.
> unless it is written like this:
> if (!errno)
>   write(fd, "a", 1);

"open" returns a non-negative integer on success, and returns -1 and
sets errno on an error.  As I understand it, errno is never cleared
unless an error occurs, so errno could already be set due to a prior
failure.  Hence I believe the correct way to write such code is to
check fd for being -1, rather than checking errno.  

It's probably best to bifurcate the state at the open call, into
"success" and "failure" outcomes, so that we can track that the caller
"owns" the filedescriptor on success, and thus we can complain about fd
leaks on paths that discard the value without closing it.

We can then warn if we see -1 passed as the fd value into "write",
since that implies that the open call failed.  If we bifurcate the
state at the open call, then for:

int fd = open("NOFILE", O_RDONLY);
write(fd, "a", 1);

we'd have the execution paths:

(a) "when 'open' succeeds" (setting fd to a conjured_svalue >=0, and
setting sm-state on that value)

(b) "when 'open' fails" (setting fd to -1, and errno to a
conjured_svalue known to be nonzero)

and at the call to "write" on path (b) we see the -1 passed to it, and
can complain accordingly. 

I'm not sure if we also want to track that the file was O_RDONLY (which
suggests that the "write" should fail), but that would be a nice bonus.

Hope this is helpful
Dave



> 
> Thank you.
> 
> On Tue, Feb 1, 2022 at 8:28 PM Mir Immad  wrote:
> 
> > I worked around the leak detection and also handled the case where
> > the fd
> > is not saved. I wonder why sm-file hasn't implemented it yet. I'm
> > attaching
> > a text file with analyzer warnings. I'm still on gcc-11.2.0, will
> > move to
> > v12 next thing.
> > 
> > > I wonder if it's worth checking for attempts to write to a fd that
> > > was
> > > opened with O_RDONLY, or the converse?
> > 
> > Yes, it does not make sense to warn for write on read-only closed fd
> > or
> > converse. But, it does make sense to warn for write operation on
> > valid
> > read-only fd. For this, we might have to analyze calls to the open()
> > carefully -- get the literal value of the second argument to the call
> > and
> > do bit-mask decomposition on it to find if O_RDONLY or O_WRONLY are
> > in it.
> > We might have to maintain separate vectors for these and check for
> > them in
> > calls to write and read -- warn if write is being called on read-only
> > fd
> > and converse. How do we equate them? Does gimple store unique ids for
> > each
> > tree? or something like SAME_TREE? Just thinking out loud.
> > 
> > > how much state does it make sense to track for a fd?
> > Sorry, I didnt get it. Do you mean how may states I'm using?
> > unchecked,
> > closed, null and leak_by_not_saved (for the call-tree that does not
> > have a
> > lhs).
> > 
> > > Also, at some point, we're going to have to handle "errno" - but
> > > given
> > > that might be somewhat fiddly it's OK to defer that until you're
> > > more
> > > familiar with the code
> > 
> > I'm looking forward to figure it out.
> > 
> > Thank you.
> > 
> > 
> > On Sat, Jan 29, 2022 at 11:09 PM David Malcolm 
> > wrote:
> > 
> > > On Sat, 2022-01-29 at 20:22 +0530, Mir Immad wrote:
> > > > Thank you for the detailed information.
> > > > 
> > > > I've been looking into the integer posix  file descriptor APIs
> > > > and I
> > > > decided to write proof-of-concept  checker for them. (not caring
> > > > about
> > > > errno). The checker tracks the fd returned by open(), warns if
> > > > dup()
> > > > is
> > > > called with closed fd otherwise tracks the fd returned by dup(),
> > > > it
> > > > also
> > > > warns if read() and write() functions were called on closed fd.
> > > > I'm
> > > > attaching a text file that lists some c sources and warnings by
> > > > the
> > > > static
> > > > analyzer. I've used the diagnostic meta-data from sm-file. Is
> > > > this
> > > > something that could also be added to the analyzer?
> > > 
> > > This looks great, and very promising as both new functionality for
> > > GCC
> > > 13, and as a GSoC 2022 project.
> > > 
> > > BTW, it looks like you're working with GCC 11, but the analyzer has
> > > changed quite a bit on trunk for GCC 12, so it's worth trying to
> > > track
> > > trunk.
> > > 
> > > I wonder if it's worth checking for attempts to write to a fd that
> > > was
> > > opened with O_RDONLY, or the converse?  (I'm not sure, just
> > > thinking
> > > aloud - how much state does it make sense to track for a fd?).
> > > 
> > > Also, at some point, we're going to have to handle "errno" - but
> > > given
> > > that might be somewhat fidd

Re: GSoC: Working on the static analyzer

2022-02-13 Thread David Malcolm via Gcc
On Sun, 2022-02-13 at 17:57 -0500, David Malcolm wrote:
> On Sun, 2022-02-13 at 21:16 +0530, Mir Immad wrote:
> > Hi,
> > 
> > I wanted some clarification on bifurcating the exploded graph at
> > call
> > to
> > open().
> > Should the analyzer warn for code like this "when open fails" (like
> > strchr
> > does when  'strchr' returns NULL)
> > 
> > int fd = open("NOFILE", O_RDONLY);
> > write(fd, "a", 1);
> > 
> > because of the bad file descriptor.
> > unless it is written like this:
> > if (!errno)
> >   write(fd, "a", 1);
> 
> "open" returns a non-negative integer on success, and returns -1 and
> sets errno on an error.  As I understand it, errno is never cleared

"modified" would be a better word than "cleared" here, sorry

Dave

> unless an error occurs, so errno could already be set due to a prior
> failure.  Hence I believe the correct way to write such code is to
> check fd for being -1, rather than checking errno.  
> 
> It's probably best to bifurcate the state at the open call, into
> "success" and "failure" outcomes, so that we can track that the
> caller
> "owns" the filedescriptor on success, and thus we can complain about
> fd
> leaks on paths that discard the value without closing it.
> 
> We can then warn if we see -1 passed as the fd value into "write",
> since that implies that the open call failed.  If we bifurcate the
> state at the open call, then for:
> 
> int fd = open("NOFILE", O_RDONLY);
> write(fd, "a", 1);
> 
> we'd have the execution paths:
> 
> (a) "when 'open' succeeds" (setting fd to a conjured_svalue >=0, and
> setting sm-state on that value)
> 
> (b) "when 'open' fails" (setting fd to -1, and errno to a
> conjured_svalue known to be nonzero)
> 
> and at the call to "write" on path (b) we see the -1 passed to it,
> and
> can complain accordingly. 
> 
> I'm not sure if we also want to track that the file was O_RDONLY
> (which
> suggests that the "write" should fail), but that would be a nice
> bonus.
> 
> Hope this is helpful
> Dave
> 
> 
> 
> > 
> > Thank you.
> > 
> > On Tue, Feb 1, 2022 at 8:28 PM Mir Immad 
> > wrote:
> > 
> > > I worked around the leak detection and also handled the case
> > > where
> > > the fd
> > > is not saved. I wonder why sm-file hasn't implemented it yet. I'm
> > > attaching
> > > a text file with analyzer warnings. I'm still on gcc-11.2.0, will
> > > move to
> > > v12 next thing.
> > > 
> > > > I wonder if it's worth checking for attempts to write to a fd
> > > > that
> > > > was
> > > > opened with O_RDONLY, or the converse?
> > > 
> > > Yes, it does not make sense to warn for write on read-only closed
> > > fd
> > > or
> > > converse. But, it does make sense to warn for write operation on
> > > valid
> > > read-only fd. For this, we might have to analyze calls to the
> > > open()
> > > carefully -- get the literal value of the second argument to the
> > > call
> > > and
> > > do bit-mask decomposition on it to find if O_RDONLY or O_WRONLY
> > > are
> > > in it.
> > > We might have to maintain separate vectors for these and check
> > > for
> > > them in
> > > calls to write and read -- warn if write is being called on read-
> > > only
> > > fd
> > > and converse. How do we equate them? Does gimple store unique ids
> > > for
> > > each
> > > tree? or something like SAME_TREE? Just thinking out loud.
> > > 
> > > > how much state does it make sense to track for a fd?
> > > Sorry, I didnt get it. Do you mean how may states I'm using?
> > > unchecked,
> > > closed, null and leak_by_not_saved (for the call-tree that does
> > > not
> > > have a
> > > lhs).
> > > 
> > > > Also, at some point, we're going to have to handle "errno" -
> > > > but
> > > > given
> > > > that might be somewhat fiddly it's OK to defer that until
> > > > you're
> > > > more
> > > > familiar with the code
> > > 
> > > I'm looking forward to figure it out.
> > > 
> > > Thank you.
> > > 
> > > 
> > > On Sat, Jan 29, 2022 at 11:09 PM David Malcolm <
> > > dmalc...@redhat.com>
> > > wrote:
> > > 
> > > > On Sat, 2022-01-29 at 20:22 +0530, Mir Immad wrote:
> > > > > Thank you for the detailed information.
> > > > > 
> > > > > I've been looking into the integer posix  file descriptor
> > > > > APIs
> > > > > and I
> > > > > decided to write proof-of-concept  checker for them. (not
> > > > > caring
> > > > > about
> > > > > errno). The checker tracks the fd returned by open(), warns
> > > > > if
> > > > > dup()
> > > > > is
> > > > > called with closed fd otherwise tracks the fd returned by
> > > > > dup(),
> > > > > it
> > > > > also
> > > > > warns if read() and write() functions were called on closed
> > > > > fd.
> > > > > I'm
> > > > > attaching a text file that lists some c sources and warnings
> > > > > by
> > > > > the
> > > > > static
> > > > > analyzer. I've used the diagnostic meta-data from sm-file. Is
> > > > > this
> > > > > something that could also be added to the analyzer?
> > > > 
> > > > This looks great, and very promising as both new functionality
> > > > for
> >