On Thu, 2021-08-05 at 20:27 +0530, Ankur Saini wrote:
> 
> 
> > On 05-Aug-2021, at 4:56 AM, David Malcolm <dmalc...@redhat.com>
> > wrote:
> > 
> > On Wed, 2021-08-04 at 21:32 +0530, Ankur Saini wrote:
> > 
> > [...snip...]
> > > 
> > > - From observation, a typical vfunc call that isn't devirtualised
> > > by
> > > the compiler's front end looks something like this 
> > > "OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5(D))"
> > > where "a_ptr_5(D)" is pointer that is being used to call the
> > > virtual
> > > function.
> > > 
> > > - We can access it's region to see what is the type of the object
> > > the
> > > pointer is actually pointing to.
> > > 
> > > - This is then used to find a call with DECL_CONTEXT of the object
> > > from the all the possible targets of that polymorphic call.
> > 
> > [...]
> > 
> > > 
> > > Patch file ( prototype ) : 
> > > 
> > 
> > > +  /* Call is possibly a polymorphic call.
> > > +  
> > > +     In such case, use devirtisation tools to find 
> > > +     possible callees of this function call.  */
> > > +  
> > > +  function *fun = get_current_function ();
> > > +  gcall *stmt  = const_cast<gcall *> (call);
> > > +  cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (stmt);
> > > +  if (e->indirect_info->polymorphic)
> > > +  {
> > > +    void *cache_token;
> > > +    bool final;
> > > +    vec <cgraph_node *> targets
> > > +      = possible_polymorphic_call_targets (e, &final,
> > > &cache_token, true);
> > > +    if (!targets.is_empty ())
> > > +      {
> > > +        tree most_propbable_taget = NULL_TREE;
> > > +        if(targets.length () == 1)
> > > +           return targets[0]->decl;
> > > +    
> > > +        /* From the current state, check which subclass the
> > > pointer that 
> > > +           is being used to this polymorphic call points to, and
> > > use to
> > > +           filter out correct function call.  */
> > > +        tree t_val = gimple_call_arg (call, 0);
> > 
> > Maybe rename to "this_expr"?
> > 
> > 
> > > +        const svalue *sval = get_rvalue (t_val, ctxt);
> > 
> > and "this_sval"?
> 
> ok
> 
> > 
> > ...assuming that that's what the value is.
> > 
> > Probably should reject the case where there are zero arguments.
> 
> Ideally it should always have one argument representing the pointer
> used to call the function. 
> 
> for example, if the function is called like this : -
> 
> a_ptr->foo(arg);  // where foo() is a virtual function and a_ptr is a
> pointer to an object of a subclass.
> 
> I saw that it’s GIMPLE representation is as follows : -
> 
> OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5, arg);
> 
> > 
> > 
> > > +
> > > +        const region *reg
> > > +          = [&]()->const region *
> > > +              {
> > > +                switch (sval->get_kind ())
> > > +                  {
> > > +                    case SK_INITIAL:
> > > +                      {
> > > +                        const initial_svalue *initial_sval
> > > +                          = sval->dyn_cast_initial_svalue ();
> > > +                        return initial_sval->get_region ();
> > > +                      }
> > > +                      break;
> > > +                    case SK_REGION:
> > > +                      {
> > > +                        const region_svalue *region_sval 
> > > +                          = sval->dyn_cast_region_svalue ();
> > > +                        return region_sval->get_pointee ();
> > > +                      }
> > > +                      break;
> > > +
> > > +                    default:
> > > +                      return NULL;
> > > +                  }
> > > +              } ();
> > 
> > I think the above should probably be a subroutine.
> > 
> > That said, it's not clear to me what it's doing, or that this is
> > correct.
> 
> 
> Sorry, I think I should have explained it earlier.
> 
> Let's take an example code snippet :- 
> 
> Derived d;
> Base *base_ptr;
> base_ptr = &d;
> base_ptr->foo();        // where foo() is a virtual function
> 
> This genertes the following GIMPLE dump :- 
> 
> Derived::Derived (&d);
> base_ptr_6 = &d.D.3779;
> _1 = base_ptr_6->_vptr.Base;
> _2 = _1 + 8;
> _3 = *_2;
> OBJ_TYPE_REF(_3;(struct Base)base_ptr_6->1) (base_ptr_6);

I did a bit of playing with this example, and tried adding:

1876        case OBJ_TYPE_REF:
1877          gcc_unreachable ();
1878          break;

to region_model::get_rvalue_1, and running cc1plus under the debugger.

The debugger hits the "gcc_unreachable ();", at this stmt:

     OBJ_TYPE_REF(_2;(struct Base)base_ptr_5->0) (base_ptr_5);

Looking at the region_model with region_model::debug() shows:

(gdb) call debug()
stack depth: 1
  frame (index 0): frame: ‘test’@1
clusters within frame: ‘test’@1
  cluster for: Derived d
    key:   {bytes 0-7}
    value: ‘int (*) () *’ {(&constexpr int (* Derived::_ZTV7Derived 
[3])(...)+(sizetype)16)}
  cluster for: base_ptr_5: &Derived d.<anonymous>
  cluster for: _2: &‘foo’
m_called_unknown_fn: FALSE
constraint_manager:
  equiv classes:
    ec0: {&Derived d.<anonymous>}
    ec1: {&constexpr int (* Derived::_ZTV7Derived [3])(...)}
    ec2: {(void *)0B == [m_constant]‘0B’}
    ec3: {(&constexpr int (* Derived::_ZTV7Derived [3])(...)+(sizetype)16)}
  constraints:
    0: ec0: {&Derived d.<anonymous>} != ec2: {(void *)0B == [m_constant]‘0B’}
    1: ec1: {&constexpr int (* Derived::_ZTV7Derived [3])(...)} != ec2: {(void 
*)0B == [m_constant]‘0B’}
    2: ec3: {(&constexpr int (* Derived::_ZTV7Derived [3])(...)+(sizetype)16)} 
!= ec2: {(void *)0B == [m_constant]‘0B’}

i.e. it already "knows" that _2 is &'foo' for Derived::foo.

So I think looking at OBJ_TYPE_REF_EXPR in the above case may give the
function pointer directly from the vtable for such cases, so something
like:

    case OBJ_TYPE_REF:
        {
           tree expr = OBJ_TYPE_REF_EXPR (pv.m_tree);
           return get_rvalue (expr, ctxt); 
        }
        break;

might get the function pointer.

(caveat: untested code)

> 
> Here instead of trying to extract virtual pointer from the call and see
> which subclass it belongs, I found it simpler to extract the actual
> pointer which is used to call the function itself (which from
> observation, is always the first parameter of the call) and used the
> region model at that point to figure out what is the type of the object
> it actually points to ultimately get the actual subclass who's function
> is being called here. :)
> 
> Now let me try to explain how I actually executed it ( A lot of
> assumptions here are based on observation, so please correct me
> wherever you think I made a false interpretation or forgot about a
> certain special case ) :
> 
> - once it is confirmed that the call that we are dealing with is a
> polymorphic call ( via the cgraph edge representing the call ), I used
> the "possible_polymorphic_call_targets ()" from ipa-utils.h ( defined
> in ipa-devirt.c ), to get the possible callee of that call. 
> 
>   function *fun = get_current_function ();
>   gcall *stmt  = const_cast<gcall *> (call);
>   cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (stmt);
>   if (e->indirect_info->polymorphic)
>   {
>     void *cache_token;
>     bool final;
>     vec <cgraph_node *> targets
>       = possible_polymorphic_call_targets (e, &final, &cache_token,
> true);
> 
> - Now if the list contains more than one targets, I will make use of
> the current enode's region model to get more info about the pointer
> which was used to call the function .
> 
>         /* here I extract the pointer (which was used to call the
> function), which from observation, is always the zeroth argument of the
> call.  */
>         tree t_val = gimple_call_arg (call, 0);
>         const svalue *sval = get_rvalue (t_val, ctxt);
> 
> - In all the examples I used, the pointer is represented as
> region_svalue or as initial_svalue (I think, initial_svalue is the case
> where the pointer is taken as a parameter of the current function and
> analyzer is analysing top-level call to this function )
> 
> Here are some examples of the following, Where I used
> __analyzer_describe () to show the same 
>  . (https://godbolt.org/z/Mqs8oM6ff)
>  . (https://godbolt.org/z/z4sfTM3f5))
> 
>         /* here I extract the region that the pointer is pointing to,
> and as both of them returns a (const region *), I used a lambda to get
> it ( If you want, I can turn this into a separate function to make it
> more readable )  */
> 
>         const region *reg
>           = [&]()->const region *
>               {
>                 switch (sval->get_kind ())
>                   {
>                     case SK_INITIAL:
>                       {
>                         const initial_svalue *initial_sval
>                           = sval->dyn_cast_initial_svalue ();
>                         return initial_sval->get_region ();
>                       }
>                       break;
>                     case SK_REGION:
>                       {
>                         const region_svalue *region_sval 
>                           = sval->dyn_cast_region_svalue ();
>                         return region_sval->get_pointee ();
>                       }
>                       break;
> 
>                     default:
>                       return NULL;
>                   }
>               } ();
> 
>         gcc_assert (reg);
> 
>         /* Now that I have the region, I tried to get the type of the
> object it is holding and put it in ‘known_possible_subclass_type’.  */
> 
>         tree known_possible_subclass_type;
>         known_possible_subclass_type = reg->get_type ();
>         if (reg->get_kind () == RK_FIELD)
>           {
>              const field_region* field_reg = reg->dyn_cast_field_region
> ();
>              known_possible_subclass_type 
>                = DECL_CONTEXT (field_reg->get_field ());
>           }
> 
> /* After that I iterated over the entire array of possible calls to
> find the function which whose scope ( DECL_CONTEXT (fn_decl) ) is same
> as that of the type of the object that the pointer is actually pointing
> to.  */
> 
>         for (cgraph_node *x : targets)
>           {
>             if (DECL_CONTEXT (x->decl) == known_possible_subclass_type)
>               most_propbable_taget = x->decl;
>           }
>         return most_propbable_taget;
>       }
>    }
> 
> I tested it on all of the test programs I created and till now in all
> of the cases, the analyzer is correctly determining the call. I am
> currently in the process of creating more tests ( including multiple
> types of inheritances ) to see how successful is this implementation .

I'm still skeptical of the above code; my feeling is that with more
tests you'll find cases where it doesn't work.  Maybe dynamically
allocated instances?

Hope this is constructive

Dave

Reply via email to