On Thu, 2021-08-05 at 20:27 +0530, Ankur Saini wrote:
> 
> 
> > On 05-Aug-2021, at 4:56 AM, David Malcolm <dmalc...@redhat.com>
> > wrote:
> > 
> > On Wed, 2021-08-04 at 21:32 +0530, Ankur Saini wrote:
> > 
> > [...snip...]
> > > 
> > > - From observation, a typical vfunc call that isn't devirtualised
> > > by the compiler's front end looks something like this
> > > "OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5(D))"
> > > where "a_ptr_5(D)" is the pointer that is being used to call the
> > > virtual function.
> > > 
> > > - We can access its region to see what is the type of the object
> > > the pointer is actually pointing to.
> > > 
> > > - This is then used to find a call with DECL_CONTEXT of the object
> > > from all the possible targets of that polymorphic call.
> > 
> > [...]
> > 
> > > 
> > > Patch file ( prototype ) :
> > > 
> > 
> > > +  /* Call is possibly a polymorphic call.
> > > +
> > > +     In such case, use devirtualisation tools to find
> > > +     possible callees of this function call.  */
> > > +
> > > +  function *fun = get_current_function ();
> > > +  gcall *stmt = const_cast<gcall *> (call);
> > > +  cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (stmt);
> > > +  if (e->indirect_info->polymorphic)
> > > +    {
> > > +      void *cache_token;
> > > +      bool final;
> > > +      vec <cgraph_node *> targets
> > > +        = possible_polymorphic_call_targets (e, &final,
> > > +                                             &cache_token, true);
> > > +      if (!targets.is_empty ())
> > > +        {
> > > +          tree most_propbable_taget = NULL_TREE;
> > > +          if (targets.length () == 1)
> > > +            return targets[0]->decl;
> > > +
> > > +          /* From the current state, check which subclass the
> > > +             pointer that is being used for this polymorphic call
> > > +             points to, and use it to filter out the correct
> > > +             function call.  */
> > > +          tree t_val = gimple_call_arg (call, 0);
> > 
> > Maybe rename to "this_expr"?
> > 
> > > 
> > > +          const svalue *sval = get_rvalue (t_val, ctxt);
> > 
> > and "this_sval"?
> 
> ok
> 
> > 
> > ...assuming that that's what the value is.
> > 
> > Probably should reject the case where there are zero arguments.
> 
> Ideally it should always have one argument representing the pointer
> used to call the function.
> 
> For example, if the function is called like this :-
> 
> a_ptr->foo(arg);  // where foo() is a virtual function and a_ptr is a
>                   // pointer to an object of a subclass.
> 
> I saw that its GIMPLE representation is as follows :-
> 
> OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5, arg);
> 
> > 
> > > +
> > > +          const region *reg
> > > +            = [&]()->const region *
> > > +              {
> > > +                switch (sval->get_kind ())
> > > +                  {
> > > +                  case SK_INITIAL:
> > > +                    {
> > > +                      const initial_svalue *initial_sval
> > > +                        = sval->dyn_cast_initial_svalue ();
> > > +                      return initial_sval->get_region ();
> > > +                    }
> > > +                    break;
> > > +                  case SK_REGION:
> > > +                    {
> > > +                      const region_svalue *region_sval
> > > +                        = sval->dyn_cast_region_svalue ();
> > > +                      return region_sval->get_pointee ();
> > > +                    }
> > > +                    break;
> > > +
> > > +                  default:
> > > +                    return NULL;
> > > +                  }
> > > +              } ();
> > 
> > I think the above should probably be a subroutine.
> > 
> > That said, it's not clear to me what it's doing, or that this is
> > correct.
> 
> 
> Sorry, I think I should have explained it earlier.
> 
> Let's take an example code snippet :-
> 
> Derived d;
> Base *base_ptr;
> base_ptr = &d;
> base_ptr->foo();        // where foo() is a virtual function
> 
> This generates the following GIMPLE dump :-
> 
> Derived::Derived (&d);
> base_ptr_6 = &d.D.3779;
> _1 = base_ptr_6->_vptr.Base;
> _2 = _1 + 8;
> _3 = *_2;
> OBJ_TYPE_REF(_3;(struct Base)base_ptr_6->1) (base_ptr_6);
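
A self-contained version of that snippet might look like the sketch below, for anyone who wants to reproduce this. Only the names Base, Derived, foo, base_ptr and test (the frame name in the dump further down) come from this thread; the class bodies and the return values are assumptions made so that the example compiles on its own.

```cpp
// Hypothetical reduced test case.  The return values 1 and 2 are made
// up purely so that the two overrides can be told apart.
struct Base
{
  virtual int foo () { return 1; }
};

struct Derived : Base
{
  int foo () override { return 2; }
};

int
test ()
{
  Derived d;
  Base *base_ptr;
  base_ptr = &d;
  // Virtual call through a Base pointer that points at a Derived.
  return base_ptr->foo ();
}
```

Here the dynamic type of the pointee is visible in the same function, which is presumably the easiest case for any approach.
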

I did a bit of playing with this example, and tried adding:

    case OBJ_TYPE_REF:
      gcc_unreachable ();
      break;

to region_model::get_rvalue_1, and running cc1plus under the debugger.

The debugger hits the "gcc_unreachable ();", at this stmt:

     OBJ_TYPE_REF(_2;(struct Base)base_ptr_5->0) (base_ptr_5);

Looking at the region_model with region_model::debug() shows:

(gdb) call debug()
stack depth: 1
  frame (index 0): frame: ‘test’@1
clusters within frame: ‘test’@1
  cluster for: Derived d
    key:   {bytes 0-7}
    value: ‘int (*) () *’ {(&constexpr int (* Derived::_ZTV7Derived [3])(...)+(sizetype)16)}
  cluster for: base_ptr_5: &Derived d.<anonymous>
  cluster for: _2: &‘foo’
m_called_unknown_fn: FALSE
constraint_manager:
  equiv classes:
    ec0: {&Derived d.<anonymous>}
    ec1: {&constexpr int (* Derived::_ZTV7Derived [3])(...)}
    ec2: {(void *)0B == [m_constant]‘0B’}
    ec3: {(&constexpr int (* Derived::_ZTV7Derived [3])(...)+(sizetype)16)}
  constraints:
    0: ec0: {&Derived d.<anonymous>} != ec2: {(void *)0B == [m_constant]‘0B’}
    1: ec1: {&constexpr int (* Derived::_ZTV7Derived [3])(...)} != ec2: {(void *)0B == [m_constant]‘0B’}
    2: ec3: {(&constexpr int (* Derived::_ZTV7Derived [3])(...)+(sizetype)16)} != ec2: {(void *)0B == [m_constant]‘0B’}

i.e. it already "knows" that _2 is &'foo' for Derived::foo.

So I think looking at OBJ_TYPE_REF_EXPR in the above case may give the
function pointer directly from the vtable for such cases, so something
like:

    case OBJ_TYPE_REF:
      {
        tree expr = OBJ_TYPE_REF_EXPR (pv.m_tree);
        return get_rvalue (expr, ctxt);
      }
      break;

might get the function pointer.
(caveat: untested code)

> 
> Here instead of trying to extract the virtual pointer from the call and
> see which subclass it belongs to, I found it simpler to extract the
> actual pointer which is used to call the function itself (which from
> observation, is always the first parameter of the call) and used the
> region model at that point to figure out what is the type of the object
> it actually points to, to ultimately get the actual subclass whose
> function is being called here. :)
> 
> Now let me try to explain how I actually executed it ( A lot of
> assumptions here are based on observation, so please correct me
> wherever you think I made a false interpretation or forgot about a
> certain special case ) :
> 
> - Once it is confirmed that the call that we are dealing with is a
> polymorphic call ( via the cgraph edge representing the call ), I used
> the "possible_polymorphic_call_targets ()" from ipa-utils.h ( defined
> in ipa-devirt.c ), to get the possible callees of that call.
> 
>   function *fun = get_current_function ();
>   gcall *stmt = const_cast<gcall *> (call);
>   cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (stmt);
>   if (e->indirect_info->polymorphic)
>     {
>       void *cache_token;
>       bool final;
>       vec <cgraph_node *> targets
>         = possible_polymorphic_call_targets (e, &final, &cache_token,
>                                              true);
> 
> - Now if the list contains more than one target, I will make use of
> the current enode's region model to get more info about the pointer
> which was used to call the function.
> 
>       /* Here I extract the pointer (which was used to call the
>          function), which from observation, is always the zeroth
>          argument of the call.  */
>       tree t_val = gimple_call_arg (call, 0);
>       const svalue *sval = get_rvalue (t_val, ctxt);
> 
> - In all the examples I used, the pointer is represented as
> region_svalue or as initial_svalue (I think initial_svalue is the case
> where the pointer is taken as a parameter of the current function and
> the analyzer is analysing a top-level call to this function )
> 
> Here are some examples, where I used __analyzer_describe () to show
> this
>   . (https://godbolt.org/z/Mqs8oM6ff)
>   . (https://godbolt.org/z/z4sfTM3f5)
> 
>       /* Here I extract the region that the pointer is pointing to,
>          and as both of them return a (const region *), I used a
>          lambda to get it ( If you want, I can turn this into a
>          separate function to make it more readable )  */
> 
>       const region *reg
>         = [&]()->const region *
>           {
>             switch (sval->get_kind ())
>               {
>               case SK_INITIAL:
>                 {
>                   const initial_svalue *initial_sval
>                     = sval->dyn_cast_initial_svalue ();
>                   return initial_sval->get_region ();
>                 }
>                 break;
>               case SK_REGION:
>                 {
>                   const region_svalue *region_sval
>                     = sval->dyn_cast_region_svalue ();
>                   return region_sval->get_pointee ();
>                 }
>                 break;
> 
>               default:
>                 return NULL;
>               }
>           } ();
> 
>       gcc_assert (reg);
> 
>       /* Now that I have the region, I tried to get the type of the
>          object it is holding and put it in
>          ‘known_possible_subclass_type’.  */
> 
>       tree known_possible_subclass_type;
>       known_possible_subclass_type = reg->get_type ();
>       if (reg->get_kind () == RK_FIELD)
>         {
>           const field_region* field_reg
>             = reg->dyn_cast_field_region ();
>           known_possible_subclass_type
>             = DECL_CONTEXT (field_reg->get_field ());
>         }
> 
>       /* After that I iterated over the entire array of possible
>          calls to find the function whose scope
>          ( DECL_CONTEXT (fn_decl) ) is the same as the type of the
>          object that the pointer is actually pointing to.  */
> 
>       for (cgraph_node *x : targets)
>         {
>           if (DECL_CONTEXT (x->decl) == known_possible_subclass_type)
>             most_propbable_taget = x->decl;
>         }
>       return most_propbable_taget;
>     }
> }
> 
> I tested it on all of the test programs I created and till now in all
> of the cases, the analyzer is correctly determining the call. I am
> currently in the process of creating more tests ( including multiple
> types of inheritance ) to see how successful this implementation is.

I'm still skeptical of the above code; my feeling is that with more
tests you'll find cases where it doesn't work.  Maybe dynamically
allocated instances?

Hope this is constructive
Dave
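
P.S. To make the "dynamically allocated instances" suggestion concrete, the test cases I have in mind would be something like the sketch below. The names test_heap and test_param, the class bodies and the return values are all hypothetical; I haven't checked what the analyzer does with either function, so treat these as candidates for the testsuite rather than known failures.

```cpp
// Hypothetical test inputs; the return values 1 and 2 are made up so
// that the two overrides can be told apart.
struct Base
{
  virtual int foo () { return 1; }
  virtual ~Base () {}
};

struct Derived : Base
{
  int foo () override { return 2; }
};

// The object lives on the heap, so the pointer does not point at a
// named local variable.
int
test_heap ()
{
  Base *base_ptr = new Derived;
  int result = base_ptr->foo ();
  delete base_ptr;
  return result;
}

// The pointer comes in as a parameter, so its dynamic type is not
// visible within this function.
int
test_param (Base *base_ptr)
{
  return base_ptr->foo ();
}
```

In test_param the callee depends entirely on the caller, so matching on the pointee's declared type alone could plausibly pick the wrong override.
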