> On 06-Aug-2021, at 4:39 AM, David Malcolm <dmalc...@redhat.com> wrote:
>
> On Thu, 2021-08-05 at 20:27 +0530, Ankur Saini wrote:
>>
>>
>>> On 05-Aug-2021, at 4:56 AM, David Malcolm <dmalc...@redhat.com> wrote:
>>>
>>> On Wed, 2021-08-04 at 21:32 +0530, Ankur Saini wrote:
>>>
>>> [...snip...]
>>>>
>>>> - From observation, a typical vfunc call that isn't devirtualised by
>>>> the compiler's front end looks something like this:
>>>> "OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5(D))"
>>>> where "a_ptr_5(D)" is the pointer that is being used to call the
>>>> virtual function.
>>>>
>>>> - We can access its region to see the type of the object the
>>>> pointer actually points to.
>>>>
>>>> - This is then used to pick, from all the possible targets of that
>>>> polymorphic call, the one whose DECL_CONTEXT is the object's type.
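A toy model of the selection rule in the three steps above, to make it concrete. It is a sketch, not GCC code: `Target`, its `context` string (standing in for DECL_CONTEXT of the candidate's FUNCTION_DECL) and `pick_target` are hypothetical names invented for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry returned by
// possible_polymorphic_call_targets (): the method name and the
// class that declares it (what DECL_CONTEXT would give).
struct Target
{
  std::string name;
  std::string context;
};

// Mirrors the prototype's rule: a single candidate is taken as-is;
// otherwise take the candidate declared in the pointee's dynamic type.
// Returns nullptr when no candidate is declared in exactly that class.
const Target *
pick_target (const std::vector<Target> &targets,
             const std::string &dynamic_type)
{
  if (targets.size () == 1)
    return &targets[0];
  const Target *result = nullptr;
  for (const Target &t : targets)
    if (t.context == dynamic_type)
      result = &t;
  return result;
}
```

For example, with candidates declared in `Base` and `Derived`, a pointee of type `Derived` selects the `Derived` entry, while a pointee of a hypothetical `Derived2` that inherits `foo` from `Derived` without overriding it matches no candidate at all in this model; whether the real prototype has the same gap is worth a test.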
>>>
>>> [...]
>>>
>>>>
>>>> Patch file ( prototype ) :
>>>>
>>>
>>>> + /* Call is possibly a polymorphic call.
>>>> +
>>>> +    In such a case, use devirtualisation tools to find
>>>> +    possible callees of this function call.  */
>>>> +
>>>> + function *fun = get_current_function ();
>>>> + gcall *stmt = const_cast<gcall *> (call);
>>>> + cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (stmt);
>>>> + if (e->indirect_info->polymorphic)
>>>> +   {
>>>> +     void *cache_token;
>>>> +     bool final;
>>>> +     vec <cgraph_node *> targets
>>>> +       = possible_polymorphic_call_targets (e, &final,
>>>> +                                            &cache_token, true);
>>>> +     if (!targets.is_empty ())
>>>> +       {
>>>> +         tree most_propbable_taget = NULL_TREE;
>>>> +         if (targets.length () == 1)
>>>> +           return targets[0]->decl;
>>>> +
>>>> +         /* From the current state, check which subclass the pointer
>>>> +            that is being used for this polymorphic call points to,
>>>> +            and use it to filter out the correct function call.  */
>>>> +         tree t_val = gimple_call_arg (call, 0);
>>>
>>> Maybe rename to "this_expr"?
>>>
>>>
>>>> +         const svalue *sval = get_rvalue (t_val, ctxt);
>>>
>>> and "this_sval"?
>>
>> ok
>>
>>>
>>> ...assuming that that's what the value is.
>>>
>>> Probably should reject the case where there are zero arguments.
>>
>> Ideally it should always have at least one argument: the pointer used
>> to call the function.
>>
>> For example, if the function is called like this:
>>
>> a_ptr->foo(arg); // where foo() is a virtual function and a_ptr is a
>>                  // pointer to an object of a subclass.
>>
>> I saw that its GIMPLE representation is as follows:
>>
>> OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5, arg);
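At the source level, the same "object pointer first" convention shows up when a virtual member is called through a pointer-to-member with std::invoke, which takes the object pointer as its first argument and still dispatches virtually. A sketch: `A`, `B` and `call_with_explicit_this` are illustrative names, and the GIMPLE argument order is taken from the dump above (observation), not from any documented guarantee.

```cpp
#include <cassert>
#include <functional>

struct A
{
  virtual ~A () = default;
  virtual int foo (int x) { return x + 1; }
};

struct B : A
{
  int foo (int x) override { return x * 10; }
};

// Call A::foo through a pointer-to-member, passing the object pointer
// explicitly as the first argument, roughly mirroring the
// "(a_ptr_5, arg)" argument list in the GIMPLE above.  The call still
// dispatches on the dynamic type of *a_ptr.
int
call_with_explicit_this (A *a_ptr, int arg)
{
  return std::invoke (&A::foo, a_ptr, arg);
}
```

Passing a `B` through an `A *` therefore yields `B::foo`'s result, exactly as `a_ptr->foo (arg)` would.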
>>
>>>
>>>
>>>> +
>>>> +         const region *reg
>>>> +           = [&]()->const region *
>>>> +             {
>>>> +               switch (sval->get_kind ())
>>>> +                 {
>>>> +                 case SK_INITIAL:
>>>> +                   {
>>>> +                     const initial_svalue *initial_sval
>>>> +                       = sval->dyn_cast_initial_svalue ();
>>>> +                     return initial_sval->get_region ();
>>>> +                   }
>>>> +                   break;
>>>> +                 case SK_REGION:
>>>> +                   {
>>>> +                     const region_svalue *region_sval
>>>> +                       = sval->dyn_cast_region_svalue ();
>>>> +                     return region_sval->get_pointee ();
>>>> +                   }
>>>> +                   break;
>>>> +
>>>> +                 default:
>>>> +                   return NULL;
>>>> +                 }
>>>> +             } ();
>>>
>>> I think the above should probably be a subroutine.
>>>
>>> That said, it's not clear to me what it's doing, or that this is
>>> correct.
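Along the lines suggested here, the immediately-invoked lambda could become a named helper. The sketch below only shows the shape of that refactor: `region`, `svalue`, `svalue_kind` and `get_region_for_this_sval` are hypothetical, heavily simplified stand-ins, not the analyzer's classes. The single `reg` field stands in for whatever get_pointee () or get_region () returns; note these are not the same relationship (get_region () on an initial_svalue names the region whose initial value it is), which this model does not capture.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins; only what the lambda touches.
struct region { };

enum svalue_kind { SK_REGION, SK_INITIAL, SK_UNKNOWN };

struct svalue
{
  svalue_kind kind;
  const region *reg;  // stands in for get_pointee () / get_region ()
};

// Named replacement for the lambda: return the associated region, or
// nullptr for kinds that are not handled, so that the caller can give
// up on the call gracefully instead of hitting gcc_assert (reg).
static const region *
get_region_for_this_sval (const svalue *sval)
{
  switch (sval->kind)
    {
    case SK_INITIAL:
    case SK_REGION:
      return sval->reg;
    default:
      return nullptr;
    }
}
```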
>>
>>
>> Sorry, I think I should have explained it earlier.
>>
>> Let's take an example code snippet:
>>
>> Derived d;
>> Base *base_ptr;
>> base_ptr = &d;
>> base_ptr->foo(); // where foo() is a virtual function
>>
>> This generates the following GIMPLE dump:
>>
>> Derived::Derived (&d);
>> base_ptr_6 = &d.D.3779;
>> _1 = base_ptr_6->_vptr.Base;
>> _2 = _1 + 8;
>> _3 = *_2;
>> OBJ_TYPE_REF(_3;(struct Base)base_ptr_6->1) (base_ptr_6);
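For reference, a complete program of the same shape as the snippet above, which could serve as a test input. The class bodies and return values are made up for illustration, and its GIMPLE will not match the dump above exactly (SSA numbers and the vtable slot index depend on the class layout).

```cpp
#include <cassert>

struct Base
{
  // No virtual destructor: objects are never deleted through a Base *.
  virtual int foo () { return 1; }
};

struct Derived : Base
{
  int foo () override { return 2; }
};

// Same shape as the snippet: the call goes through a Base *, but the
// object is a Derived, so Derived::foo is the callee.
int
test ()
{
  Derived d;
  Base *base_ptr;
  base_ptr = &d;
  return base_ptr->foo ();
}
```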
>
> I did a bit of playing with this example, and tried adding:
>
> case OBJ_TYPE_REF:
>   gcc_unreachable ();
>   break;
>
> to region_model::get_rvalue_1, and running cc1plus under the debugger.
>
> The debugger hits the "gcc_unreachable ();", at this stmt:
>
> OBJ_TYPE_REF(_2;(struct Base)base_ptr_5->0) (base_ptr_5);
>
> Looking at the region_model with region_model::debug() shows:
>
> (gdb) call debug()
> stack depth: 1
> frame (index 0): frame: ‘test’@1
> clusters within frame: ‘test’@1
> cluster for: Derived d
> key: {bytes 0-7}
> value: ‘int (*) () *’ {(&constexpr int (* Derived::_ZTV7Derived [3])(...)+(sizetype)16)}
> cluster for: base_ptr_5: &Derived d.<anonymous>
> cluster for: _2: &‘foo’
> m_called_unknown_fn: FALSE
> constraint_manager:
> equiv classes:
> ec0: {&Derived d.<anonymous>}
> ec1: {&constexpr int (* Derived::_ZTV7Derived [3])(...)}
> ec2: {(void *)0B == [m_constant]‘0B’}
> ec3: {(&constexpr int (* Derived::_ZTV7Derived [3])(...)+(sizetype)16)}
> constraints:
> 0: ec0: {&Derived d.<anonymous>} != ec2: {(void *)0B == [m_constant]‘0B’}
> 1: ec1: {&constexpr int (* Derived::_ZTV7Derived [3])(...)} != ec2: {(void *)0B == [m_constant]‘0B’}
> 2: ec3: {(&constexpr int (* Derived::_ZTV7Derived [3])(...)+(sizetype)16)} != ec2: {(void *)0B == [m_constant]‘0B’}
>
> i.e. it already "knows" that _2 is &'foo' for Derived::foo.
>
> So I think looking at OBJ_TYPE_REF_EXPR in the above case may give the
> function pointer directly from the vtable for such cases, so something
> like:
>
> case OBJ_TYPE_REF:
>   {
>     tree expr = OBJ_TYPE_REF_EXPR (pv.m_tree);
>     return get_rvalue (expr, ctxt);
>   }
>   break;
>
> might get the function pointer.
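Why this works can be seen in a hand-rolled model of the dispatch in the dump: the callee is just the function pointer loaded from a vtable slot, so once the region model knows the vptr (as the debug () output shows), evaluating that load yields the target without any type-based filtering. This is a plain C++ model with made-up names (`Object`, `Slot`, `call_slot`); real GCC vtables follow the Itanium C++ ABI and also hold offset-to-top and RTTI entries before the address point, which is why the dump shows `_ZTV7Derived + 16`.

```cpp
#include <cassert>

// Hand-rolled vtable model: each object points to an array of
// function pointers (its "_vptr").
struct Object;
using Slot = int (*) (const Object *);

struct Object
{
  const Slot *vptr;
  int payload;
};

static int base_foo (const Object *self) { return self->payload; }
static int derived_foo (const Object *self) { return self->payload * 2; }

static const Slot base_vtable[] = { base_foo };
static const Slot derived_vtable[] = { derived_foo };

// The steps from the GIMPLE dump: load the vptr, index the slot, load
// the function pointer, call it with the object as first argument.
int
call_slot (const Object *obj, int slot)
{
  const Slot *vptr = obj->vptr;  // _1 = base_ptr->_vptr.Base;
  Slot fn = vptr[slot];          // _2 = _1 + 8 * slot (8-byte pointers); _3 = *_2;
  return fn (obj);               // OBJ_TYPE_REF(_3; ...) (base_ptr);
}
```

Knowing which vtable `obj->vptr` points to is enough to know `fn`, which mirrors the `_2: &‘foo’` binding in the debug output.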
I tried it, and yes, it works like a charm. Thanks : )
>
> (caveat: untested code)
>
>>
>> Here, instead of extracting the virtual pointer from the call and
>> working out which subclass it belongs to, I found it simpler to take
>> the pointer that is actually used to call the function (which, from
>> observation, is always the first argument of the call) and use the
>> region model at that point to figure out the type of the object it
>> points to, and so ultimately get the subclass whose function is being
>> called here. :)
>>
>> Now let me try to explain how I actually implemented it (a lot of the
>> assumptions here are based on observation, so please correct me
>> wherever you think I made a false interpretation or missed a special
>> case):
>>
>> - Once it is confirmed that the call we are dealing with is a
>> polymorphic call (via the cgraph edge representing the call), I used
>> possible_polymorphic_call_targets () from ipa-utils.h (defined in
>> ipa-devirt.c) to get the possible callees of that call.
>>
>> function *fun = get_current_function ();
>> gcall *stmt = const_cast<gcall *> (call);
>> cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (stmt);
>> if (e->indirect_info->polymorphic)
>>   {
>>     void *cache_token;
>>     bool final;
>>     vec <cgraph_node *> targets
>>       = possible_polymorphic_call_targets (e, &final, &cache_token,
>>                                            true);
>>
>> - Now, if the list contains more than one target, I make use of the
>> current enode's region model to get more information about the
>> pointer that was used to call the function.
>>
>>         /* Here I extract the pointer that was used to call the
>>            function; from observation, it is always the zeroth argument
>>            of the call.  */
>>         tree t_val = gimple_call_arg (call, 0);
>>         const svalue *sval = get_rvalue (t_val, ctxt);
>>
>> - In all the examples I used, the pointer is represented either as a
>> region_svalue or as an initial_svalue (I think initial_svalue is the
>> case where the pointer is a parameter of the current function and the
>> analyzer is analysing a top-level call to this function).
>>
>> Here are some examples of this, where I used __analyzer_describe () to
>> show it:
>>   - https://godbolt.org/z/Mqs8oM6ff
>>   - https://godbolt.org/z/z4sfTM3f5
>>
>>         /* Here I extract the region that the pointer points to.  As
>>            both cases return a (const region *), I used a lambda to get
>>            it (if you want, I can turn this into a separate function to
>>            make it more readable).  */
>>
>>         const region *reg
>>           = [&]()->const region *
>>             {
>>               switch (sval->get_kind ())
>>                 {
>>                 case SK_INITIAL:
>>                   {
>>                     const initial_svalue *initial_sval
>>                       = sval->dyn_cast_initial_svalue ();
>>                     return initial_sval->get_region ();
>>                   }
>>                   break;
>>                 case SK_REGION:
>>                   {
>>                     const region_svalue *region_sval
>>                       = sval->dyn_cast_region_svalue ();
>>                     return region_sval->get_pointee ();
>>                   }
>>                   break;
>>
>>                 default:
>>                   return NULL;
>>                 }
>>             } ();
>>
>>         gcc_assert (reg);
>>
>>         /* Now that I have the region, I tried to get the type of the
>>            object it is holding and put it in
>>            ‘known_possible_subclass_type’.  */
>>
>>         tree known_possible_subclass_type;
>>         known_possible_subclass_type = reg->get_type ();
>>         if (reg->get_kind () == RK_FIELD)
>>           {
>>             const field_region *field_reg = reg->dyn_cast_field_region ();
>>             known_possible_subclass_type
>>               = DECL_CONTEXT (field_reg->get_field ());
>>           }
>>
>>         /* After that, I iterated over the entire array of possible
>>            targets to find the function whose scope
>>            (DECL_CONTEXT (fn_decl)) is the same as the type of the
>>            object that the pointer actually points to.  */
>>
>>         for (cgraph_node *x : targets)
>>           {
>>             if (DECL_CONTEXT (x->decl) == known_possible_subclass_type)
>>               most_propbable_taget = x->decl;
>>           }
>>         return most_propbable_taget;
>>       }
>>   }
>>
>> I tested it on all of the test programs I created, and so far the
>> analyzer correctly determines the call in every case. I am currently
>> creating more tests (including multiple types of inheritance) to see
>> how well this implementation holds up.
>
> I'm still skeptical of the above code; my feeling is that with more
> tests you'll find cases where it doesn't work. Maybe dynamically
> allocated instances?
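A candidate test input along those lines, combining a heap-allocated object with a class that inherits an override rather than declaring one. The class and function names are made up; at run time the call resolves to Derived::foo. For a filter that matches the pointee's exact class against DECL_CONTEXT, the interesting part is that the pointee is a MoreDerived while the correct target is declared in Derived, and that the region model may know less about the type of a heap region than of a local; both are hypotheses to check, not tested results.

```cpp
#include <cassert>

struct Base
{
  virtual ~Base () = default;
  virtual int foo () { return 1; }
};

struct Derived : Base
{
  int foo () override { return 2; }
};

// Inherits Derived::foo without overriding it, so no candidate target
// is declared in this class itself.
struct MoreDerived : Derived
{
};

// Heap-allocated object reached only through a Base *.
int
call_on_heap_object ()
{
  Base *p = new MoreDerived ();
  int result = p->foo ();
  delete p;  // OK: Base has a virtual destructor.
  return result;
}
```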
That’s what I was thinking, and that’s why I wanted to test it on more
programs, but it looks like I don’t need this anymore.
>
> Hope this is constructive
>
> Dave
>
Thanks
- Ankur