On Sun, 2022-02-13 at 17:57 -0500, David Malcolm wrote: > On Sun, 2022-02-13 at 21:16 +0530, Mir Immad wrote: > > Hi, > > > > I wanted some clarification on bifurcating the exploded graph at > > call > > to > > open(). > > Should the analyzer warn for code like this "when open fails" (like > > strchr > > does when 'strchr' returns NULL) > > > > int fd = open("NOFILE", O_RDONLY); > > write(fd, "a", 1); > > > > because of the bad file descriptor. > > unless it is written like this: > > if (!errno) > > write(fd, "a", 1); > > "open" returns a non-negative integer on success, and returns -1 and > sets errno on an error. As I understand it, errno is never cleared
"modified" would be a better word than "cleared" here, sorry Dave > unless an error occurs, so errno could already be set due to a prior > failure. Hence I believe the correct way to write such code is to > check fd for being -1, rather than checking errno. > > It's probably best to bifurcate the state at the open call, into > "success" and "failure" outcomes, so that we can track that the > caller > "owns" the filedescriptor on success, and thus we can complain about > fd > leaks on paths that discard the value without closing it. > > We can then warn if we see -1 passed as the fd value into "write", > since that implies that the open call failed. If we bifurcate the > state at the open call, then for: > > int fd = open("NOFILE", O_RDONLY); > write(fd, "a", 1); > > we'd have the execution paths: > > (a) "when 'open' succeeds" (setting fd to a conjured_svalue >=0, and > setting sm-state on that value) > > (b) "when 'open' fails" (setting fd to -1, and errno to a > conjured_svalue known to be nonzero) > > and at the call to "write" on path (b) we see the -1 passed to it, > and > can complain accordingly. > > I'm not sure if we also want to track that the file was O_RDONLY > (which > suggests that the "write" should fail), but that would be a nice > bonus. > > Hope this is helpful > Dave > > > > > > > Thank you. > > > > On Tue, Feb 1, 2022 at 8:28 PM Mir Immad <mirimnan...@gmail.com> > > wrote: > > > > > I worked around the leak detection and also handled the case > > > where > > > the fd > > > is not saved. I wonder why sm-file hasn't implemented it yet. I'm > > > attaching > > > a text file with analyzer warnings. I'm still on gcc-11.2.0, will > > > move to > > > v12 next thing. > > > > > > > I wonder if it's worth checking for attempts to write to a fd > > > > that > > > > was > > > > opened with O_RDONLY, or the converse? > > > > > > Yes, it does not make sense to warn for write on read-only closed > > > fd > > > or > > > converse. But, it does make sense to warn for write operation on > > > valid > > > read-only fd. For this, we might have to analyze calls to the > > > open() > > > carefully -- get the literal value of the second argument to the > > > call > > > and > > > do bit-mask decomposition on it to find if O_RDONLY or O_WRONLY > > > are > > > in it. > > > We might have to maintain separate vectors for these and check > > > for > > > them in > > > calls to write and read -- warn if write is being called on read- > > > only > > > fd > > > and converse. How do we equate them? Does gimple store unique ids > > > for > > > each > > > tree? or something like SAME_TREE? Just thinking out loud. > > > > > > > how much state does it make sense to track for a fd? > > > Sorry, I didnt get it. Do you mean how may states I'm using? > > > unchecked, > > > closed, null and leak_by_not_saved (for the call-tree that does > > > not > > > have a > > > lhs). > > > > > > > Also, at some point, we're going to have to handle "errno" - > > > > but > > > > given > > > > that might be somewhat fiddly it's OK to defer that until > > > > you're > > > > more > > > > familiar with the code > > > > > > I'm looking forward to figure it out. > > > > > > Thank you. > > > > > > > > > On Sat, Jan 29, 2022 at 11:09 PM David Malcolm < > > > dmalc...@redhat.com> > > > wrote: > > > > > > > On Sat, 2022-01-29 at 20:22 +0530, Mir Immad wrote: > > > > > Thank you for the detailed information. > > > > > > > > > > I've been looking into the integer posix file descriptor > > > > > APIs > > > > > and I > > > > > decided to write proof-of-concept checker for them. (not > > > > > caring > > > > > about > > > > > errno). The checker tracks the fd returned by open(), warns > > > > > if > > > > > dup() > > > > > is > > > > > called with closed fd otherwise tracks the fd returned by > > > > > dup(), > > > > > it > > > > > also > > > > > warns if read() and write() functions were called on closed > > > > > fd. > > > > > I'm > > > > > attaching a text file that lists some c sources and warnings > > > > > by > > > > > the > > > > > static > > > > > analyzer. I've used the diagnostic meta-data from sm-file. Is > > > > > this > > > > > something that could also be added to the analyzer? > > > > > > > > This looks great, and very promising as both new functionality > > > > for > > > > GCC > > > > 13, and as a GSoC 2022 project. > > > > > > > > BTW, it looks like you're working with GCC 11, but the analyzer > > > > has > > > > changed quite a bit on trunk for GCC 12, so it's worth trying > > > > to > > > > track > > > > trunk. > > > > > > > > I wonder if it's worth checking for attempts to write to a fd > > > > that > > > > was > > > > opened with O_RDONLY, or the converse? (I'm not sure, just > > > > thinking > > > > aloud - how much state does it make sense to track for a fd?). > > > > > > > > Also, at some point, we're going to have to handle "errno" - > > > > but > > > > given > > > > that might be somewhat fiddly it's OK to defer that until > > > > you're > > > > more > > > > familiar with the code. > > > > > > > > > > > > > > About the fd leak, that's the next thing I'll try to get > > > > > working. > > > > > Since > > > > > you've mentioned that it could be a GSoC project, this is > > > > > what > > > > > I'm > > > > > going to > > > > > focus on. > > > > > > > > Excellent. > > > > > > > > Let me know (via this mailing list) if you have any questions. > > > > > > > > Thanks > > > > Dave > > > > > > > > > > > > > > Regards. > > > > > > > > > > > > > > > > > > > > On Wed, Jan 26, 2022 at 7:56 PM David Malcolm < > > > > > dmalc...@redhat.com> > > > > > wrote: > > > > > > > > > > > On Mon, 2022-01-24 at 01:41 +0530, Mir Immad wrote: > > > > > > > Hi, sir. > > > > > > > > > > > > > > I've been trying to understand the static analyzer's > > > > > > > code. I > > > > > > > spent most > > > > > > > of > > > > > > > my time learning the state machine's API. I learned how > > > > > > > state > > > > > > > machine's > > > > > > > on_stmt is supposed to "recognize" specific functions and > > > > > > > how > > > > > > > on_transition > > > > > > > takes a specific tree from one state to another, and how > > > > > > > the > > > > > > > captured > > > > > > > states are used by pending_diagnostics to report the > > > > > > > errors. > > > > > > > Furthermore, I > > > > > > > was able to create a dummy checker that mimicked the > > > > > > > behaviour of > > > > > > > sm- > > > > > > > file's > > > > > > > double_fclose and compile GCC with these changes. Is this > > > > > > > the > > > > > > > right way > > > > > > > of > > > > > > > learning? > > > > > > > > > > > > This sounds great. > > > > > > > > > > > > > > > > > > > > As you've mentioned on the projects page that you would > > > > > > > like > > > > > > > to > > > > > > > add > > > > > > > more > > > > > > > support for some POSIX APIs. Can you please write (or > > > > > > > refer > > > > > > > me to > > > > > > > a) a > > > > > > > simple C program that uses such an API (and also what the > > > > > > > analyzer > > > > > > > should > > > > > > > have done) so that I can attempt to add such a checker to > > > > > > > the > > > > > > > analyzer. > > > > > > > > > > > > A couple of project ideas: > > > > > > > > > > > > (i) treat data coming from a network connection as tainted, > > > > > > by > > > > > > somehow > > > > > > teaching the analyzer about networking APIs. Ideally: look > > > > > > at > > > > > > some > > > > > > subset of historical CVEs involving network-facing attacks > > > > > > on > > > > > > user- > > > > > > space daemons, and find a way to detect them in the > > > > > > analyzer > > > > > > (need > > > > > > to > > > > > > find a way to mark the incoming data as tainted, so that > > > > > > the > > > > > > analyer > > > > > > "know" about the trust boundary - that the incoming data > > > > > > needs > > > > > > to > > > > > > be > > > > > > sanitized and treated with extra caution; see > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584372.html > > > > > > for my attempts to do this for the Linux kernel). > > > > > > > > > > > > Obviously this is potentially a huge project, so maybe just > > > > > > picking > > > > > > a > > > > > > tiny subset and getting that working as a proof-of-concept > > > > > > would be > > > > > > a > > > > > > good GSoC project. Maybe find an old CVE that someone has > > > > > > written > > > > > > a > > > > > > good write-up for, and think about "how could GCC/- > > > > > > fanalyzer > > > > > > have > > > > > > spotted it?" > > > > > > > > > > > > (ii) add leak-detection for POSIX file descriptors: i.e. > > > > > > the > > > > > > integer > > > > > > values returned by "open", "dup", etc. It would be good to > > > > > > have a > > > > > > check that the user's code doesn't leak these values e.g. > > > > > > on > > > > > > error- > > > > > > handling paths, by failing to close a file-descriptor (and > > > > > > not > > > > > > storing > > > > > > it anywhere). I think that much of this could be done by > > > > > > analogy > > > > > > with > > > > > > the sm-file.cc code. > > > > > > > > > > > > > > > > > > > > > > > > > > Also, I didn't realize the complexity of adding SARIF > > > > > > > when I > > > > > > > mentioned > > > > > > > it. > > > > > > > I'd rather work on adding more checkers. > > > > > > > > > > > > Fair enough. > > > > > > > > > > > > Hope this above is constructive. > > > > > > > > > > > > Dave > > > > > > > > > > > > > > > > > > > > Regards. > > > > > > > > > > > > > > Mir Immad > > > > > > > > > > > > > > On Sun, Jan 23, 2022, 11:04 PM Mir Immad < > > > > > > > mirimnan...@gmail.com> > > > > > > > wrote: > > > > > > > > > > > > > > > Hi Sir, > > > > > > > > > > > > > > > > I've been trying to understand the static analyzer's > > > > > > > > code. > > > > > > > > I > > > > > > > > spent > > > > > > > > most of > > > > > > > > my time learning the state machine's API. I learned how > > > > > > > > state > > > > > > > > machine's > > > > > > > > on_stmt is supposed to "recognize" specific functions > > > > > > > > and > > > > > > > > takes > > > > > > > > a > > > > > > > > specific > > > > > > > > tree from one state to another, and how the captured > > > > > > > > states > > > > > > > > are > > > > > > > > used > > > > > > > > by > > > > > > > > pending_diagnostics to report the errors. Furthermore, > > > > > > > > I > > > > > > > > was > > > > > > > > able to > > > > > > > > create > > > > > > > > a dummy checker that mimicked the behaviour of sm- > > > > > > > > file's > > > > > > > > double_fclose and > > > > > > > > compile GCC with these changes. Is this the right way > > > > > > > > of > > > > > > > > learning? > > > > > > > > > > > > > > > > As you've mentioned on the projects page that you would > > > > > > > > like to > > > > > > > > add > > > > > > > > more > > > > > > > > support for some POSIX APIs. Can you please write (or > > > > > > > > refer > > > > > > > > me > > > > > > > > to a) > > > > > > > > a > > > > > > > > simple C program that uses such an API (and also what > > > > > > > > the > > > > > > > > analyzer > > > > > > > > should > > > > > > > > have done) so that I can attempt to add such a checker > > > > > > > > to > > > > > > > > the > > > > > > > > analyzer. > > > > > > > > > > > > > > > > Also, I didn't realize the complexity of adding SARIF > > > > > > > > when > > > > > > > > I > > > > > > > > mentioned it. > > > > > > > > I'd rather work on adding more checkers. > > > > > > > > > > > > > > > > Regards. > > > > > > > > Mir Immad > > > > > > > > > > > > > > > > On Mon, Jan 17, 2022 at 5:41 AM David Malcolm < > > > > > > > > dmalc...@redhat.com> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > On Fri, 2022-01-14 at 22:15 +0530, Mir Immad wrote: > > > > > > > > > > HI David, > > > > > > > > > > I've been tinkering with the static analyzer for > > > > > > > > > > the > > > > > > > > > > last > > > > > > > > > > few > > > > > > > > > > days. I > > > > > > > > > > find > > > > > > > > > > the project of adding SARIF output to the analyzer > > > > > > > > > > intresting. > > > > > > > > > > I'm > > > > > > > > > > writing > > > > > > > > > > this to let you know that I'm trying to learn the > > > > > > > > > > codebase. > > > > > > > > > > Thank you. > > > > > > > > > > > > > > > > > > Excellent. > > > > > > > > > > > > > > > > > > BTW, I think adding SARIF output would involve > > > > > > > > > working > > > > > > > > > more > > > > > > > > > with > > > > > > > > > GCC's > > > > > > > > > diagnostics subsystem than with the static analyzer, > > > > > > > > > since > > > > > > > > > (in > > > > > > > > > theory) > > > > > > > > > all of the static analyzer's output is passing > > > > > > > > > through > > > > > > > > > the > > > > > > > > > diagnostics > > > > > > > > > subsystem - though the static analyzer is probably > > > > > > > > > the > > > > > > > > > only > > > > > > > > > GCC > > > > > > > > > component generating diagnostic paths. > > > > > > > > > > > > > > > > > > I'm happy to mentor such a project as I maintain both > > > > > > > > > subsystems > > > > > > > > > and > > > > > > > > > SARIF output would benefit both - but it would be > > > > > > > > > rather > > > > > > > > > tangential > > > > > > > > > to > > > > > > > > > the analyzer - so if you had specifically wanted to > > > > > > > > > be > > > > > > > > > working on > > > > > > > > > the > > > > > > > > > guts of the analyzer itself, you may want to pick a > > > > > > > > > different > > > > > > > > > subproject. > > > > > > > > > > > > > > > > > > The SARIF standard is rather long and complicated, > > > > > > > > > and we > > > > > > > > > would > > > > > > > > > want to > > > > > > > > > be compatible with clang's implementation. > > > > > > > > > > > > > > > > > > It would be very cool if gcc could also accept SARIF > > > > > > > > > files as > > > > > > > > > an > > > > > > > > > *input* format, and emit them as diagnostics; that > > > > > > > > > might > > > > > > > > > help > > > > > > > > > with > > > > > > > > > debugging SARIF output. (I have a old patch for > > > > > > > > > adding > > > > > > > > > JSON > > > > > > > > > parsing > > > > > > > > > support to GCC that could be used as a starting point > > > > > > > > > for > > > > > > > > > this). > > > > > > > > > > > > > > > > > > Hope the above makes sense > > > > > > > > > Dave > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Jan 11, 2022, 7:09 PM David Malcolm < > > > > > > > > > > dmalc...@redhat.com> > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > On Tue, 2022-01-11 at 11:03 +0530, Mir Immad via > > > > > > > > > > > Gcc > > > > > > > > > > > wrote: > > > > > > > > > > > > Hi everyone, > > > > > > > > > > > > > > > > > > > > > > Hi, and welcome. > > > > > > > > > > > > > > > > > > > > > > > I intend to work on the static analyzer. Are > > > > > > > > > > > > these > > > > > > > > > > > > documents > > > > > > > > > > > > enough to > > > > > > > > > > > > get > > > > > > > > > > > > started: > > > > > > > > > > > > https://gcc.gnu.org/onlinedocs/gccint and > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals > > > > > > > > > > > > > > > > > > > > > > Yes. > > > > > > > > > > > > > > > > > > > > > > There are also some high-level notes here: > > > > > > > > > > > > > > > > > > > > > > https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer > > > > > > > > > > > > > > > > > > > > > > Also, given that the analyzer is part of GCC, the > > > > > > > > > > > more > > > > > > > > > > > general > > > > > > > > > > > introductions to hacking on GCC will be useful. > > > > > > > > > > > > > > > > > > > > > > I recommend creating a trivial C source file with > > > > > > > > > > > a > > > > > > > > > > > bug > > > > > > > > > > > in it > > > > > > > > > > > (e.g. > > > > > > > > > > > a > > > > > > > > > > > 3-line function with a use-after-free), and > > > > > > > > > > > stepping > > > > > > > > > > > through > > > > > > > > > > > the > > > > > > > > > > > analyzer to get a sense of how it works. > > > > > > > > > > > > > > > > > > > > > > Hope this is helpful; don't hesitate to ask > > > > > > > > > > > questions. > > > > > > > > > > > Dave > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >