Re: daily report on extending static analyzer project [GSoC]

Ankur Saini via Gcc Wed, 21 Jul 2021 09:15:33 -0700

> On 17-Jul-2021, at 2:57 AM, David Malcolm <[email protected]> wrote:
> 
> On Fri, 2021-07-16 at 21:04 +0530, Ankur Saini wrote:
>> 
>> 
>>> On 15-Jul-2021, at 4:53 AM, David Malcolm <[email protected]>
>>> wrote:
>>> 
>>> On Wed, 2021-07-14 at 22:41 +0530, Ankur Saini wrote:
>>> 
>>> 
> 
> [...snip...]
> 
>>> 
>>>> 
>>>>   2. ( pr100546.c <   
>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546 < 
>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546>>)
>>>>   ```
>>>>   #include <stdio.h>
>>>>   #include <cstdlib.h>
>>>>   
>>>>   static void noReturn(const char *str) __attribute__((noreturn));
>>>>   static void noReturn(const char *str) {
>>>>       printf("%s\n", str);
>>>>       exit(1);
>>>>   }
>>>>   
>>>>   void (*noReturnPtr)(const char *str) = &noReturn;
>>>>   
>>>>   int main(int argc, char **argv) {
>>>>       char *str = 0;
>>>>       if (!str)
>>>>           noReturnPtr(__FILE__);
>>>>       return printf("%c\n", *str);
>>>>   }
>>>>   ```
>>>>   (godbolt link <https://godbolt.org/z/aWfW51se3 < 
>>>> https://godbolt.org/z/aWfW51se3>>)
>>>> 
>>>> - But at the time of testing ( command used 
>>>>   was `make check-gcc RUNTESTFLAGS="-v -v
>>>> analyzer.exp=pr100546.c"`),
>>>> both of 
>>>>   them failed unexpectedly with Segmentation fault at the call
>>>> 
>>>> - From further inspection, I found out that this is due 
>>>>   "-fanalyzer-call-summaries" option, which looks like activats
>>>> call
>>>> summaries
>>>> 
>>>> - I would look into this in more details ( with gdb ) tomorrow,
>>>> right
>>>> now 
>>>>   my guess is that this is either due too the changes I did in
>>>> state-
>>>> purge.cc <http://purge.cc/>
>>>>   or is a call-summary related problem ( I remember it not being 
>>>>   perfetly implemented right now). 
>>> 
>>> I'm not proud of the call summary code, so that may well be the
>>> problem.
>>> 
>>> Are you able to use gdb on the analyzer?  It ought to be fairly
>>> painless to identify where a segfault is happening, so let me know if
>>> you're running into any problems with that.
>> 
>> Yes, I used gdb on the analyzer to go into details and looks like I was
>> correct, the program was crashing in “analysis_plan::use_summary_p ()”
>> on line 114 ( const cgraph_node *callee = edge->callee; ) where program
>> was trying to access callgraph edge which didn’t exist .
>> 
>> I fixed it by simply making analyzer abort using call summaries in
>> absence of callgraph edge.
>> 
>> File: {src-dir}/gcc/analyzer/analysis-plan.cc
>> 
>> 105: bool
>> 106: analysis_plan::use_summary_p (const cgraph_edge *edge) const
>> 107: {
>> 108:   /* Don't use call summaries if -fno-analyzer-call-summaries.  */
>> 109:   if (!flag_analyzer_call_summaries)
>> 110:     return false;
>> 111: 
>> 112:   /* Don't use call summaries if there is no callgraph edge */
>> 113:   if(!edge || !edge->callee)
>> 114:     return false;
>> 
>> and now the tests are passing successfully. ( both manually and via
>> DejaGnu ).
> 
> Great.
> 
>> 
>> I have attached a sample patch of work done till now with this mail for
>> review ( I haven’t sent this one to the patches list as it’s change log
>> was not complete for now ).
>> 
>> P.S. I have also sent another mail (   
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575396.html <  
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575396.html> ) to
>> patches list with the previous call-string patch and this time it
>> popped up in my inbox as it should, did you also received it now ?
> 
> I can see it in the archive URL, but for some reason it's not showing
> up in my inbox.  Bother.  Please can you try resending it directly to
> me?

Ok, I have sent the call-string patch directly to you. I have actually sent 2 
mails ( from different mail ids ) to check if it’s the id which is causing the 
issue or the contents of the mail itself.

> 
> Putting email issues to one side, the patch you linked to above looks
> good.  To what extent has it been tested?  If it bootstraps and passes
> the test suite, it's ready for trunk.

It bootstrapped successfully on a couple of the x86_64 machines ( on gcc farm ) 
And regress testing is underway.

> 
> Note that over the last couple of days I pushed my "use of
> uninitialized values" detection work to trunk (aka master), along with
> various other changes, so it's worth pulling master and rebasing on top
> of that before testing.  I *think* we've been touching different parts
> of the analyzer code, but there's a chance you might have to resolve
> some merge conflicts.

I have been constantly updating my branch as soon as I see a commit on analyzer 
( on patches list ) to avoid any conflicts. 
Till now I haven’t encountered any.

> 
> As for the patch you attached to this email
>  "[PATCH] analyzer: make analyer detect calls via function pointers"
> here's an initial review:

thanks for the review

> 
>> diff --git a/gcc/analyzer/analysis-plan.cc b/gcc/analyzer/analysis-plan.cc
>> index 7dfc48e9c3e..1c7e4d2cc84 100644
>> --- a/gcc/analyzer/analysis-plan.cc
>> +++ b/gcc/analyzer/analysis-plan.cc
>> @@ -109,6 +109,10 @@ analysis_plan::use_summary_p (const cgraph_edge *edge) 
>> const
>>   if (!flag_analyzer_call_summaries)
>>     return false;
>> 
>> +  /* Don't use call summaries if there is no callgraph edge */
>> +  if(!edge || !edge->callee)
>> +    return false;
> 
> Is it possible for a cgraph_edge to have a NULL callee?  (I don't think
> so, but I could be wrong)

Maybe not, I did it because in function supergraph_call_edge (), there was a 
similar case where it was pointed out that this might be the case when call is 
happening via function pointers.

File: {source_dir}/gcc/analyzer/supergraph.cc
72: /* Get the cgraph_edge, but only if there's an underlying function body.  */
73: 
74: cgraph_edge *
75: supergraph_call_edge (function *fun, gimple *stmt)
76: {
77:   gcall *call = dyn_cast<gcall *> (stmt);
78:   if (!call)
79:     return NULL;
80:   cgraph_edge *edge = cgraph_node::get (fun->decl)->get_edge (stmt);
81:   if (!edge)
82:     return NULL;
83:   if (!edge->callee)
84:     return NULL; /* e.g. for a function pointer.  */
85:   if (!get_ultimate_function_for_cgraph_edge (edge))
86:     return NULL;
87:   return edge;
88: }

> 
> Nit: missing space between "if" and open-paren

Fixed

> 
>> diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
>> index 7662a7f7bab..f45a614c0ab 100644
>> --- a/gcc/analyzer/engine.cc
>> +++ b/gcc/analyzer/engine.cc
>> @@ -3170,10 +3170,13 @@ exploded_graph::process_node (exploded_node *node)
>>       break;
>>     case PK_AFTER_SUPERNODE:
>>       {
>> +        bool found_a_superedge = false;
>> +        bool is_an_exit_block = false;
>>      /* If this is an EXIT BB, detect leaks, and potentially
>>         create a function summary.  */
>>      if (point.get_supernode ()->return_p ())
>>        {
>> +        is_an_exit_block = true;
>>          node->detect_leaks (*this);
>>          if (flag_analyzer_call_summaries
>>              && point.get_call_string ().empty_p ())
>> @@ -3201,6 +3204,7 @@ exploded_graph::process_node (exploded_node *node)
>>      superedge *succ;
>>      FOR_EACH_VEC_ELT (point.get_supernode ()->m_succs, i, succ)
>>        {
>> +        found_a_superedge = true;
>>          if (logger)
>>            logger->log ("considering SN: %i -> SN: %i",
>>                         succ->m_src->m_index, succ->m_dest->m_index);
>> @@ -3210,19 +3214,100 @@ exploded_graph::process_node (exploded_node *node)
>>                                               point.get_call_string ());
>>          program_state next_state (state);
>>          uncertainty_t uncertainty;
>> +
>> +        /* Check if now the analyzer know about the call via 
>> +               function pointer or not. */
>> +            if (succ->m_kind == SUPEREDGE_INTRAPROCEDURAL_CALL && 
>> +                !(succ->get_any_callgraph_edge()))
> 
> Some formatting nits:
> - on a multiline conditional, our convention is to put the "&&" at the
> start of a line, rather than the end
> - missing space beween function name and open-paren in call.
> 
> Hence this should be formatted as:
> 
>            if (succ->m_kind == SUPEREDGE_INTRAPROCEDURAL_CALL
>               && !(succ->get_any_callgraph_edge ()))

Fixed

> 
> 
>> +              {    
>> +                const program_point *this_point = &node->get_point();
>> +                const program_state *this_state = &node->get_state ();
>> +                const gcall *call 
>> +                  = this_point->get_supernode ()->get_final_call ();
>> +
>> +                impl_region_model_context ctxt (*this,
>> +                  node, 
>> +                  this_state, 
>> +                  &next_state, 
>> +                  &uncertainty,
>> +                  this_point->get_stmt());
>> +
>> +                region_model *model = this_state->m_region_model;
>> +                tree fn_decl = model->get_fndecl_for_call(call,&ctxt);
>> +                function *fun = DECL_STRUCT_FUNCTION(fn_decl);
>> +                if(fun)
>> +                {
>> +                  const supergraph *sg = &(this->get_supergraph());
> 
> Might as well turn this into
> 
>> +                  const supergraph &sg = get_supergraph ();
> 
> and update various "sg->" into "sg." below.

Done

> 
>> +                  supernode * sn_entry = sg->get_node_for_function_entry 
>> (fun);
>> +                  supernode * sn_exit = sg->get_node_for_function_exit 
>> (fun);
>> +
>> +                  program_point new_point 
>> +                    = program_point::before_supernode (sn_entry,
>> +                                                           NULL,
>> +                                                           
>> point.get_call_string ());
>> +
>> +                  new_point.push_to_call_stack (sn_exit,
>> +                                            next_point.get_supernode());
>> +
>> +                  next_state.push_call(*this, node, call, &uncertainty);
> 
> Some logging here could be helpful.

Will do

> 
> 
> [...]
> 
>> +
>> +    /* Return from the calls which doesn't have a return superedge.
>> +            Such case occurs when GCC's middle end didn't knew which 
>> function to
>> +            call but analyzer did */
>> +    if((is_an_exit_block && !found_a_superedge) && 
>> +       (!point.get_call_string().empty_p()))
> 
> Similar formatting nits as above; this should look something like:
> 
>    if ((is_an_exit_block && !found_a_superedge)
>         && !point.get_call_string ().empty_p ())

Fixed

> 
> 
> [...snip...]
> 
>> 
>> +/* do exatly what region_model::update_for_return_superedge() do
>> +   but get the call info from CALL_STMT instead from a suerpedge and 
>> +   is availabe publicically   */
>> +void
>> +region_model::update_for_return_gcall (const gcall *call_stmt,
>> +                                           region_model_context *ctxt)
>> +{
>> +  /* Get the region for the result of the call, within the caller frame.  */
>> +  const region *result_dst_reg = NULL;
>> +  tree lhs = gimple_call_lhs (call_stmt);
>> +  if (lhs)
>> +    {
>> +      /* Normally we access the top-level frame, which is:
>> +         path_var (expr, get_stack_depth () - 1)
>> +         whereas here we need the caller frame, hence "- 2" here.  */
>> +      gcc_assert (get_stack_depth () >= 2);
>> +      result_dst_reg = get_lvalue (path_var (lhs, get_stack_depth () - 2),
>> +                               ctxt);
>> +    }
>> +
>> +  pop_frame (result_dst_reg, NULL, ctxt);
>> +}
> 
> I *think* this can be consolidated with update_for_return_superedge. 
> Maybe instead, just rename that to update_for_return, and update the
> callsite of update_for_return_superedge to get the call_stmt there.
> 
> That way we avoid copy&paste repetition of code.
> 
> I think something similar can be done for
> region_model::update_for_gcall and update_for_call_superedge.

Done

> 
> [...snip...]
> 
>> diff --git a/gcc/analyzer/supergraph.cc b/gcc/analyzer/supergraph.cc
>> index 8611d0f8689..ace9e7b128a 100644
>> --- a/gcc/analyzer/supergraph.cc
>> +++ b/gcc/analyzer/supergraph.cc
>> @@ -183,11 +183,34 @@ supergraph::supergraph (logger *logger)
>>            m_stmt_to_node_t.put (stmt, node_for_stmts);
>>            m_stmt_uids.make_uid_unique (stmt);
>>            if (cgraph_edge *edge = supergraph_call_edge (fun, stmt))
>> -            {
>> -              m_cgraph_edge_to_caller_prev_node.put(edge, node_for_stmts);
>> -              node_for_stmts = add_node (fun, bb, as_a <gcall *> (stmt), 
>> NULL);
>> -              m_cgraph_edge_to_caller_next_node.put (edge, node_for_stmts);
>> -            }
>> +                    {
>> +                      m_cgraph_edge_to_caller_prev_node.put(edge, 
>> node_for_stmts);
>> +                      node_for_stmts = add_node (fun, bb, as_a <gcall *> 
>> (stmt),
>> +                                                 NULL);
>> +                      m_cgraph_edge_to_caller_next_node.put (edge, 
>> node_for_stmts);
>> +                    }
>> +            else
>> +            {
>> +              // maybe call is via a function pointer
>> +              gcall *call = dyn_cast<gcall *> (stmt);
>> +              if (call)
> 
> You can combine the if and the decl/dyn_cast into one line:
> 
>                 if (gcall *call = dyn_cast<gcall *> (stmt))
> 
> which saves a little vertical space and (slightly) limits the scope of
> "call”.

Done

> 
>> +              {
>> +                cgraph_edge *edge 
>> +                  = cgraph_node::get (fun->decl)->get_edge (stmt);
>> +                if (!edge || !edge->callee)
>> +                {
>> +                  supernode *old_node_for_stmts = node_for_stmts;
>> +                  node_for_stmts = add_node (fun, bb, call, NULL);
>> +
>> +                  superedge *sedge 
>> +                    = new callgraph_superedge (old_node_for_stmts,
>> +                                               node_for_stmts,
>> +                                               
>> SUPEREDGE_INTRAPROCEDURAL_CALL,
>> +                                               NULL);
>> +                  add_edge (sedge);
>> +                }
>> +              }
>> +            }
>>          }
> 
> [...snip...]
> 
>> diff --git a/gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c 
>> b/gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c
>> new file mode 100644
>> index 00000000000..1054ab4f240
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c
>> @@ -0,0 +1,53 @@
>> +#include <stdio.h>
>> +#include <stdlib.h>
>> +
>> +void fun(int *int_ptr)
>> +{
>> +    free(int_ptr);  /* { dg-warning "double-'free' of 'int_ptr'" } */
>> +}
>> +
>> +void single_call()
>> +{
>> +    int *int_ptr = (int*)malloc(sizeof(int));
>> +    void (*fun_ptr)(int *) = &fun;
>> +    (*fun_ptr)(int_ptr);
>> +}
>> +
>> +void double_call()
>> +{
>> +    int *int_ptr = (int*)malloc(sizeof(int));
>> +    void (*fun_ptr)(int *) = &fun;
>> +    (*fun_ptr)(int_ptr);
>> +    (*fun_ptr)(int_ptr);
>> +}
>> +
>> +/*{ dg-begin-multiline-output "" }
> 
> Normally the test suite injects "-fdiagnostics-plain-output" into the
> options, which turns off the source code printing and ASCII art for
> interprocedural paths.  If you want to test them via dg-begin-
> multiline-output you'll need to have a look at the directives at the
> top of one of the existing tests that uses dg-begin-multiline-output.
> 
>> +    6 |         free(int_ptr);
>> +      |         ^~~~~~~~~~~~~
>> +  'double_call': events 1-2
>> +    |
>> +    |   16 | void double_call()
>> +    |      |      ^~~~~~~~~~~
>> +    |      |      |
>> +    |      |      (1) entry to 'double_call'
>> +    |   17 | {
>> +    |   18 |         int *int_ptr = (int*)malloc(sizeof(int));
>> +    |      |                              ~~~~~~~~~~~~~~~~~~~
>> +    |      |                              |
>> +    |      |                              (2) allocated here
>> +    |
>> +    +--> 'fun': events 3-6
>> +           |
>> +           |    4 | void fun(int *int_ptr)
>> +           |      |      ^~~
>> +           |      |      |
>> +           |      |      (3) entry to ‘fun’
>> +           |      |      (5) entry to ‘fun’
>> +           |    5 | {
>> +           |    6 |         free(int_ptr);
>> +           |      |         ~~~~~~~~~~~~~
>> +           |      |         |
>> +           |      |         (4) first 'free' here
>> +           |      |         (6) second 'free' here; first 'free' was at (4)
>> +           |
>> +*/
> 
> and normally there would be a terminating:
> 
>  { dg-end-multiline-output "" } */
> 
> In any case, the output above is missing some events: I think ideally
> it would show the calls from *fun_ptr to fun and the returns, giving
> something like the following (which I mocked up by hand):
> 
>  'double_call': events 1-3
>    |
>    |   16 | void double_call()
>    |      |      ^~~~~~~~~~~
>    |      |      |
>    |      |      (1) entry to 'double_call'
>    |   17 | {
>    |   18 |         int *int_ptr = (int*)malloc(sizeof(int));
>    |      |                              ~~~~~~~~~~~~~~~~~~~
>    |      |                              |
>    |      |                              (2) allocated here
>    | ...  |
>    |   19 |         (*fun_ptr)(int_ptr);
>    |      |         ^~~~~~~~~~~~~~~~~~~
>    |      |         |
>    |      |         (3) calling 'fun' from 'double-call'
>    |
>    +--> 'fun': events 3-6
>           |
>           |    4 | void fun(int *int_ptr)
>           |      |      ^~~
>           |      |      |
>           |      |      (4) entry to ‘fun’
>           |    5 | {
>           |    6 |         free(int_ptr);
>           |      |         ~~~~~~~~~~~~~
>           |      |         |
>           |      |         (5) first 'free' here
>           |
>    <------+
>    |
>  'double_call': events 6-7
>    |
>    |   19 |         (*fun_ptr)(int_ptr);
>    |      |         ^~~~~~~~~~~~~~~~~~~
>    |      |         |
>    |      |         (6) returning to 'double-call' from 'fun'
>    |   20 |         (*fun_ptr)(int_ptr);
>    |      |         ^~~~~~~~~~~~~~~~~~~
>    |      |         |
>    |      |         (7) calling 'fun' from 'double-call'
>    |
>    +--> 'fun': events 8-9
>           |
>           |    4 | void fun(int *int_ptr)
>           |      |      ^~~
>           |      |      |
>           |      |      (8) entry to ‘fun’
>           |    5 | {
>           |    6 |         free(int_ptr);
>           |      |         ~~~~~~~~~~~~~
>           |      |         |
>           |      |         (9) second 'free' here; first 'free' was at (5)
> 
> The events are created in diagnostic-manager.cc
> 
> Am I right in thinking that there's a interprocedural superedge for the
> dynamically-discovered calls?

No there isn’t, such calls will only have an exploded edge and no 
interprocedural superedge

> 
> diagnostic_manager::add_events_for_superedge creates events for calls
> and returns, so maybe you just need to update the case
> SUPEREDGE_INTRAPROCEDURAL_CALL there, to do something for the
> "dynamically discovered edge" cases (compare it with the other cases in
> that function).   You might need to update the existing call_event and
> return_event subclasses slightly (see checker-path.cc/h)

As we already have exploded edges representing the call, my idea was to add 
event for such cases via custom edge info ( similar to what we have for longjmp 
case ) instead of creating a special case in 
diagnostic_manager::add_events_for_superedge (). 

> 
> Ideally, event (7) above would read
>  "passing freed pointer 'int_ptr' in call to 'fun from 'double_call"
> but that's advanced material :)
> 
> [...snip...]
> 
> Hope this makes sense and is constructive
> Dave

Thanks 
- Ankur
Re: daily report on extending static analyzer project [GSoC]

Reply via email to