Re: GSoC 2021 - Static analyzer project

David Malcolm via Gcc Tue, 30 Mar 2021 06:57:17 -0700

On Tue, 2021-03-30 at 16:36 +0530, Ankur Saini wrote:
> hello sir 
> 
> in my quest of finding a bug ( which ended up being a feature ) along
> with it’s source in the analyzer, I tested the code on these 2 code
> snippets and here’s how I went towards it 
> 
> (1)
> int main()
> {
>     int *ptr = (int *)malloc(sizeof(int));
>     return 0;
> }
> 
> link to running example (https://godbolt.org/z/1jGW1qYez <
> https://godbolt.org/z/1jGW1qYez>)
> 
> (2)
> int definaltly_main()
> {
>     int *ptr = (int *)malloc(sizeof(int));
>     return 0;
> }
> 
> link to running example (https://godbolt.org/z/bzjMYex4M <
> https://godbolt.org/z/bzjMYex4M>)
> 
> 
> where on second snipper analyzer is warning us about the leak as it
> should be, but in the first one it isn’t. 
> 
> and as the gimple representation of both looks exactly the same apart
> from function name, which made me think that either intentionally or
> unintentionally, analyzer handles case of main() differently than any
> other function.


Correct - the analyzer special-cases "main".

Specifically, in impl_region_model_context::on_state_leak, there's this
code:
  
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/analyzer/engine.cc;h=d7866b5598b4fcb791ec6ff511dde9b7615e7794;hb=HEAD#l624

 624   /* Don't complain about leaks when returning from "main".  */
 625   if (m_enode_for_diag->get_supernode ()
 626       && m_enode_for_diag->get_supernode ()->return_p ())
 627     {
 628       tree fndecl = m_enode_for_diag->get_function ()->decl;
 629       if (id_equal (DECL_NAME (fndecl), "main"))
 630         {
 631           if (logger)
 632             logger->log ("not reporting leak from main");
 633           return;
 634         }
 635     }

on the grounds that for the resources that the analyzer currently
tracks, it doesn't matter if they aren't cleaned up as the process
exits, so we don't bother the user with a report about them.

> so while looking at it’s exploded graphs I found out that the only 2
> differences in them 
> 
> 1. There is one less exploded node(after node E-8) created in first
> one ( I earlier thought state merging or state pruning is taking
> place here but it isn’t because the results are not affected even
> after disabling those using  `-fno-analyzer-state-purge` and `-fno-
> analyzer-state-merge` options )

Well spotted.

I see that too.

For (1) EN 8, has 2 stmts, the label and the return, but for (2), it
splits them with EN: 8 having just the label, and the return split into
EN 9.

I was surprised by this and did some digging in gdb, for both (1) and
(2).  The reason seems to be rather arbitrary; specifically in (2),
stmt_requires_new_enode_p is returning true due to hitting this case:

2894      /* If we had a PREV_STMT with an unknown location, and this stmt
2895         has a known location, then if a state change happens here, it
2896         could be consolidated into PREV_STMT, giving us an event with
2897         no location.  Ensure that STMT gets its own exploded_node to
2898         avoid this.  */
2899      if (get_pure_location (prev_stmt->location) == UNKNOWN_LOCATION
2900          && get_pure_location (stmt->location) != UNKNOWN_LOCATION)
2901        return true;

and presumably it isn't hitting this for (1).

Specifically, where "stmt" is the return stmt and "prev_stmt" is the
label statement.

for (1):

(gdb) p /x stmt->location
$1 = 0x0
(gdb) p /x prev_stmt->location
$2 = 0x0

whereas for (2):

(gdb) p /x stmt->location
$5 = 0x800000f2
(gdb) p /x prev_stmt->location
$6 = 0x0

So with (1) the return stmt has UNKNOWN_LOCATION:

(gdb) call inform (stmt->location, "return stmt in (1)")
In function ‘main’:
cc1: note: return stmt in (1)

whereas for (2) the return stmt has a source location:

(gdb) call inform (stmt->location, "return stmt in (2)")
In function ‘definaltly_main’:
/tmp/2.c:6:12: note: return stmt in (2)
    6 |     return 0;
      |            ^

This slight difference in the recorded location of the return stmt for
the "main" vs non-"main" case affects the splitting of the nodes.

Athough a curiosity, I don't think this is significant.  (In theory one
could use a hardware watchpoint on stmt->location to track this
discrepancy down further, but I don't think it's important enough to
bother).


> 2. no diagnosis for malloc leak happening at the end of first one
> even though there exist a pointer in unchecked state at the end (
> according to the malloc state machine )

Correct - the analyzer specialcases "main" and ignores it, as noted
above.


> In quest to find the cause I started navigating through the source
> code of the analyser starting off with the run_checkers() function in
> engine.cc which looks like the entry point of the analyser ( found
> via the commit history of the analyzer ). But finally it ended at
> `impl_region_model_context::on_state_leak()` function where I found
> out that analyzer is intentionally skipping the leak report when it
> is in main. 

Sounds like you successfully tracked this down.

> This gave rise to some questions
> 
> 1. why does the analyzer make exceptions with the main() function ?

The user's attention is important - we don't want to spam the user with
unnecessary reports if we can help it.

> 2. even if it is not complaining about the leak then this still
> doesn’t explain the reason for one less exploded node in this case of
> main() function.

As above, it seems to be a side-effect of differences in source
locations.

> thanks
> 
> - Ankur

Hope this is helpful
Dave

Re: GSoC 2021 - Static analyzer project

Reply via email to