On Tue, 2021-04-06 at 17:56 +0530, Ankur Saini wrote:

Hi Ankur.

Various replies inline below throughout.

> > On 30-Mar-2021, at 7:27 PM, David Malcolm <dmalc...@redhat.com>
> > wrote:
> 
> > > This gave rise to some questions
> > > 
> > > 1. why does the analyzer make exceptions with the main() function
> > > ?
> > 
> > The user's attention is important - we don't want to spam the user
> > with
> > unnecessary reports if we can help it.
> 
> make sense. 
> 
> ——
> 
> After fiddling around with a lot of C codes, I switched to C++
> programs  in-order to find how exactly the analyzer doesn’t
> understand exception handling and more interestingly calls to virtual
> functions ( which I am thinking to work on this summer ). 

Sounds like a good focus.

> It was comparatively harder to find such an example where it would
> fail as looks like gcc do amazingly nice job at devirtualising the
> function calls ( even with -O0 option ) but finally after a lot of
> attempts and reading online about devirtualisation, I found this
> particular example where the analyzer generates a false positive
> 
> #include <cstdlib>
> 
> struct A
> {
>     virtual int foo (void) 
>     {
>         return 42;
>     }
> };
> 
> struct B: public A
> {
>     int *ptr;
>     void alloc ()
>     {
>         ptr = (int*)malloc(sizeof(int));
>     }
>         int foo (void) 
>     { 
>         free(ptr);
>         return 0;
>     }
> };
> 
> int test()
> {
>     struct B b, *bptr=&b;
>     b.alloc();
>     bptr->foo();
>     return bptr->foo();
> }
> 
> int main()
> {
>     test();
> }
> 
> working link of the above code (https://godbolt.org/z/n17WK4MxG < 
> https://godbolt.org/z/n17WK4MxG>)
> 
> here as the analyzer doesn’t understand the call to virtual function,
> wasn’t able to locate a double free in the program which was found at
> runtime.

Good work.

> so I went through it’s exploded graph to see how exactly is this
> being processed. And from the looks of things the anayzer doesn’t
> understood the function call which according to me was the following
> part in gimple representation :
> 
> 1 = bptr_8->D.3795._vptr.A;
>  _2 = *_1;
> OBJ_TYPE_REF(_2;(struct B)bptr_8->0) (bptr_8)
> 
> after scanning the source-code a bit i found out that such calls were
> being processed by "region_model::handle_unrecognized_call()” where
> it just keeps track of reachable states from that node.

Again, good detective work.

Right - the analyzer "sees" the jump through a function pointer, and it
doesn't yet have any special knowledge about what that function pointer
might be.

Given that we know we have a B, we ought to be able to assume that B::B
initializes the vtable of the instance, and make assumptions about what
the values in that vtable are.  The analyzer doesn't have any of this
special-case knowledge yet - hence bug 97114.

> ——
> 
> Questions 
> 
> 1. The link to the bug tracker for vfunc() [   
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97114 <  
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97114> ] says that for
> vfuncs() to be understood by anayzer, it ought to be able to
> devirtualize calls, but is it possible to devirtualise all the calls
> ? what if it is random or depends on user input ?

It's not possible for the general case.  Consider that there could be
other subclasses of A that we don't about in this translation unit.

But for this case, -fdump-ipa-analyzer=stderr shows this gimple at the
start of "test":

  B::B (&b);
  bptr_8 = &b;
  B::alloc (&b);
  _1 = bptr_8->D.3290._vptr.A;
  _2 = *_1;
  OBJ_TYPE_REF(_2;(struct B)bptr_8->0) (bptr_8);

As noted above, I think that we can add enough logic to the analyzer to
so that it "knows" that B::B should stores a vtable ptr, and then when
that vtable is accessed, it should know what functions are being
pointed to.  I think it would mean adding a new region subclass in the
analyzer, where an instance represents the vtable for a given class in
the user's code.

For cases where we have a "base *" and don't know which subclass the
instance is, we could potentially have the analyzer speculate about the
subclasses it knows about, adding exploded edges to speculated calls
for the various subclasses.  I don't yet know if this is a good idea,
but it seems worth experimenting with.

> 2. Even though analyzer didn’t understood calls to virtual function,
> it didn’t gave warning about a memory leak either which according to
> it, should exist if the functions were never called to deallocate the
> pointer ( after exploded node 152 and 118, the state of malloc
> changes automatically ) ?

If the analyzer "sees" a call to an unknown function (either through a
unknown function pointer, or to a function it doesn't have the body
of), it acts conservatively by resetting all of the state-machine state
for the values that are reachable by the function call, to avoid false
leak reports.  Hence it resets the state of B's ptr from the
"unchecked" allocation state back to the "start" state.

> sorry if I am asking a lot of questions.

It's fine - part of the point of GSoC is to learn, and by asking these
questions on the mailing list you're getting me to explain things here,
where everyone else can see them too, which helps improve the "bus
factor" of the analyzer.

Also, you've clearly taken the time to dig into the code and create
examples, and you are asking good questions - thanks!

Hope the above makes sense and is helpful
Dave


Reply via email to