Hi, this patch make symtab_remove_unreachable_nodes to use polymorphic call analysis same was as the cgraph construction code. The use is basically simple - instead of marking all virtual functions as potentially reachable until inlining, we mark only those that appear in the list of possible targets. Like in cgraph construction I deal with two special cases
1) when there is only one target and it is final list, I devirtualize 2) when the target is in anonymous namespace, I do not mark it live. If it is really live, its virtual table is live and it will be marked while visitng it. I extended now 1) to also deal with case of 0 possible targets - then we turn call to direct call to BUILT_IN_UNREACHABLE. This happens i.e. when you have type in anonymous namespace and all constructors of the type are optimized out. I retrofited the change into same place in the cgraph construction code. I also removed non-speculative devirtualization from ipa-devirt pass, since it is always done by unreachable code removal now. It makes more sense there: it is better to devirtualize early and the test is essentially free during the walk. To get analysis of anonymous namespace types right, ipa-devirt code now check reachablity of virtual table it visits. This makes the target cache sensitive to list of reachable virtual tables and thus it needs to register newly introduced varpool_node_removal hook and flush the cache when this happens. New timevar is added to keep time spent by unreachable code removal logged. It is not increased in any dramatic way for Firefox WPA build - it stays around 1-2% of WPA time. The effect of this patch on firefox is not grand. About 2000 extra functions are removed prior inlining, but it is not saving peak memory since they are already loaded to memory anyway and formely they was removed after inlining. I did not measure effect on inliner decision quality that ought to be positively affected by this. This code will become more effective with followup patch for better hiearchy analysis (when it will be able to take into account knowledge that becomes known by early inlining and thus it will become stronger than the equivalent logic done at cgraph construction time). Bootstrapped/regtested x86_64-linux with workaround for the current pt.c ICE, will commit it shortly. Honza * cgraphunit.c (walk_polymorphic_call_targets): Permit 0 possible targets and devirtualize to BUILT_IN_UNREACHABLE. * timevar.def (TV_IPA_UNREACHABLE): New timevar. * ipa.c (walk_polymorphic_call_targets): New function. (symtab_remove_unreachable_nodes): Use it; do not keep all virtual functions; use the new timevar. * ipa-devirt.c (maybe_record_node): Do not insert static nodes that was removed from the program. (record_binfo): If BINFO corresponds to an anonymous namespace, we may not consider it in the walk when its vtable is dead. (possible_polymorphic_call_targets_1): Pass anonymous flag to record_binfo. (devirt_variable_node_removal_hook): New function. (possible_polymorphic_call_targets): Also register devirt_variable_node_removal_hook. (ipa_devirt): Do not do non-speculative devirtualization. (gate_ipa_devirt): One execute if devirtualizing speculatively. * testsuite/g++.dg/ipa/devirt-11.C: Update template. * testsuite/g++.dg/ipa/devirt-16.C: New testcase. * testsuite/g++.dg/ipa/devirt-17.C: New testcase. * testsuite/g++.dg/ipa/devirt-18.C: New testcase. Index: cgraphunit.c =================================================================== --- cgraphunit.c (revision 202364) +++ cgraphunit.c (working copy) @@ -866,9 +866,15 @@ walk_polymorphic_call_targets (pointer_s make the edge direct. */ if (final) { - gcc_assert (targets.length()); - if (targets.length() == 1) + if (targets.length() <= 1) { + cgraph_node *target; + if (targets.length () == 1) + target = targets[0]; + else + target = cgraph_get_create_node + (builtin_decl_implicit (BUILT_IN_UNREACHABLE)); + if (cgraph_dump_file) { fprintf (cgraph_dump_file, @@ -877,7 +883,7 @@ walk_polymorphic_call_targets (pointer_s edge->call_stmt, 0, TDF_SLIM); } - cgraph_make_edge_direct (edge, targets[0]); + cgraph_make_edge_direct (edge, target); cgraph_redirect_edge_call_stmt_to_callee (edge); if (cgraph_dump_file) { @@ -1092,7 +1098,7 @@ analyze_functions (void) mangling and same body alias creation before we free DECL_ARGUMENTS used by it. */ if (!seen_error ()) - symtab_initialize_asm_name_hash (); + symtab_initialize_asm_name_hash (); } /* Translate the ugly representation of aliases as alias pairs into nice Index: testsuite/g++.dg/ipa/devirt-16.C =================================================================== --- testsuite/g++.dg/ipa/devirt-16.C (revision 0) +++ testsuite/g++.dg/ipa/devirt-16.C (revision 0) @@ -0,0 +1,39 @@ +/* We shall devirtualize to unreachable. No anonymous type method should surivve + reachability. */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-ipa-whole-program" } */ +namespace { +class B { +public: + virtual int foo(void) +{ + return 0; +} +}; +class A : public B { +public: + virtual int foo(void) +{ + return 1; +} +}; +} +class B *b; +main() +{ + int c; + if (c) + { + class A a; + a.foo(); + class B b; + b.foo(); + } + return b->foo(); +} + +/* { dg-final { scan-ipa-dump "Devirtualizing" "whole-program"} } */ +/* { dg-final { scan-ipa-dump "builtin_unreachable" "whole-program"} } */ +/* { dg-final { scan-ipa-dump-not "A::foo" "whole-program"} } */ +/* { dg-final { scan-ipa-dump-not "A::foo" "whole-program"} } */ +/* { dg-final { cleanup-ipa-dump "whole-program" } } */ Index: testsuite/g++.dg/ipa/devirt-18.C =================================================================== --- testsuite/g++.dg/ipa/devirt-18.C (revision 0) +++ testsuite/g++.dg/ipa/devirt-18.C (revision 0) @@ -0,0 +1,37 @@ +/* We shall devirtualize to unreachable. No anonymous type method should surivve + reachability. */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-ssa" } */ +namespace { +class B { +public: + virtual int foo(void) +{ + return 0; +} +}; +class A : public B { +public: + virtual int foo(void) +{ + return 1; +} +}; +} +class B *b; +main() +{ + if (0) + { + class A a; + a.foo(); + class B b; + b.foo(); + } + return b->foo(); +} + +/* { dg-final { scan-tree-dump-not "A::foo" "ssa"} } */ +/* { dg-final { scan-tree-dump-not "B::foo" "ssa"} } */ +/* { dg-final { scan-tree-dump "builtin_unreachable" "ssa"} } */ +/* { dg-final { cleanup-tree-dump "ssa" } } */ Index: testsuite/g++.dg/ipa/devirt-11.C =================================================================== --- testsuite/g++.dg/ipa/devirt-11.C (revision 202364) +++ testsuite/g++.dg/ipa/devirt-11.C (working copy) @@ -46,5 +46,4 @@ bar () and two to fn3. While doing so the new symbol for fn2 needs to be introduced. */ /* { dg-final { scan-ipa-dump-times "Discovered a virtual call to a known target" 3 "inline" } } */ -/* { dg-final { scan-ipa-dump-times "and turned into root of the clone tree" 1 "inline" } } */ /* { dg-final { cleanup-ipa-dump "inline" } } */ Index: testsuite/g++.dg/ipa/devirt-17.C =================================================================== --- testsuite/g++.dg/ipa/devirt-17.C (revision 0) +++ testsuite/g++.dg/ipa/devirt-17.C (revision 0) @@ -0,0 +1,44 @@ +/* We shall devirtualize to B::foo since it is the only live candidate of an + anonymous type. */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-ipa-whole-program" } */ +namespace { +class B { +public: + virtual int foo(void) +{ + return 0; +} +}; +class A : public B { +public: + virtual int foo(void) +{ + return 1; +} +}; +} +class B *b; +void get_me_lost (void *); +main() +{ + int c; + if (c) + { + class A a; + a.foo(); + } + else + { + b = new (class B); + b->foo(); + get_me_lost ((void *)&b); + } + return b->foo(); +} + +/* { dg-final { scan-ipa-dump "Devirtualizing" "whole-program"} } */ +/* { dg-final { scan-ipa-dump-not "builtin_unreachable" "whole-program"} } */ +/* { dg-final { scan-ipa-dump "B::foo" "whole-program"} } */ +/* { dg-final { scan-ipa-dump-not "A::foo" "whole-program"} } */ +/* { dg-final { cleanup-ipa-dump "whole-program" } } */ Index: timevar.def =================================================================== --- timevar.def (revision 202364) +++ timevar.def (working copy) @@ -64,6 +64,7 @@ DEFTIMEVAR (TV_PCH_CPP_RESTORE , " DEFTIMEVAR (TV_CGRAPH , "callgraph construction") DEFTIMEVAR (TV_CGRAPHOPT , "callgraph optimization") +DEFTIMEVAR (TV_IPA_UNREACHABLE , "ipa dead code removal") DEFTIMEVAR (TV_IPA_INHERITANCE , "ipa inheritance graph") DEFTIMEVAR (TV_IPA_VIRTUAL_CALL , "ipa virtual call target") DEFTIMEVAR (TV_IPA_DEVIRT , "ipa devirtualization") Index: ipa-devirt.c =================================================================== --- ipa-devirt.c (revision 202364) +++ ipa-devirt.c (working copy) @@ -570,6 +570,8 @@ maybe_record_node (vec <cgraph_node *> & && fcode != BUILT_IN_TRAP && !pointer_set_insert (inserted, target) && (target_node = cgraph_get_node (target)) != NULL + && (TREE_PUBLIC (target) + || target_node->symbol.definition) && symtab_real_symbol_p ((symtab_node)target_node)) { pointer_set_insert (cached_polymorphic_call_targets, @@ -591,6 +593,8 @@ maybe_record_node (vec <cgraph_node *> & MATCHED_VTABLES tracks virtual tables we already did lookup for virtual function in. + + ANONYMOUS is true if BINFO is part of anonymous namespace. */ static void @@ -600,7 +604,8 @@ record_binfo (vec <cgraph_node *> &nodes tree type_binfo, HOST_WIDE_INT otr_token, pointer_set_t *inserted, - pointer_set_t *matched_vtables) + pointer_set_t *matched_vtables, + bool anonymous) { tree type = BINFO_TYPE (binfo); int i; @@ -611,6 +616,19 @@ record_binfo (vec <cgraph_node *> &nodes if (types_same_for_odr (type, otr_type) && !pointer_set_insert (matched_vtables, BINFO_VTABLE (type_binfo))) { + /* For types in anonymous namespace first check if the respective vtable + is alive. If not, we know the type can't be called. */ + if (!flag_ltrans && anonymous) + { + tree vtable = BINFO_VTABLE (type_binfo); + struct varpool_node *vnode; + + if (TREE_CODE (vtable) == POINTER_PLUS_EXPR) + vtable = TREE_OPERAND (TREE_OPERAND (vtable, 0), 0); + vnode = varpool_get_node (vtable); + if (!vnode || !vnode->symbol.definition) + return; + } tree target = gimple_get_virt_method_for_binfo (otr_token, type_binfo); if (target) maybe_record_node (nodes, target, inserted); @@ -626,7 +644,7 @@ record_binfo (vec <cgraph_node *> &nodes is shared with the outer type. */ BINFO_VTABLE (base_binfo) ? base_binfo : type_binfo, otr_token, inserted, - matched_vtables); + matched_vtables, anonymous); } /* Lookup virtual methods matching OTR_TYPE (with OFFSET and OTR_TOKEN) @@ -646,7 +664,7 @@ possible_polymorphic_call_targets_1 (vec unsigned int i; record_binfo (nodes, binfo, otr_type, binfo, otr_token, inserted, - matched_vtables); + matched_vtables, type->anonymous_namespace); for (i = 0; i < type->derived_types.length(); i++) possible_polymorphic_call_targets_1 (nodes, inserted, matched_vtables, @@ -735,6 +753,18 @@ devirt_node_removal_hook (struct cgraph_ free_polymorphic_call_targets_hash (); } +/* When virtual table is removed, we may need to flush the cache. */ + +static void +devirt_variable_node_removal_hook (struct varpool_node *n, + void *d ATTRIBUTE_UNUSED) +{ + if (cached_polymorphic_call_targets + && DECL_VIRTUAL_P (n->symbol.decl) + && type_in_anonymous_namespace_p (DECL_CONTEXT (n->symbol.decl))) + free_polymorphic_call_targets_hash (); +} + /* Return vector containing possible targets of polymorphic call of type OTR_TYPE caling method OTR_TOKEN with OFFSET. If FINALp is non-NULL, store true if the list is complette. @@ -782,8 +812,12 @@ possible_polymorphic_call_targets (tree cached_polymorphic_call_targets = pointer_set_create (); polymorphic_call_target_hash.create (23); if (!node_removal_hook_holder) - node_removal_hook_holder = - cgraph_add_node_removal_hook (&devirt_node_removal_hook, NULL); + { + node_removal_hook_holder = + cgraph_add_node_removal_hook (&devirt_node_removal_hook, NULL); + varpool_add_node_removal_hook (&devirt_variable_node_removal_hook, + NULL); + } } /* Lookup cached answer. */ @@ -928,11 +962,8 @@ likely_target_p (struct cgraph_node *n) } /* The ipa-devirt pass. - This performs very trivial devirtualization: - 1) when polymorphic call is known to have precisely one target, - turn it into direct call - 2) when polymorphic call has only one likely target in the unit, - turn it into speculative call. */ + When polymorphic call has only one likely target in the unit, + turn it into speculative call. */ static unsigned int ipa_devirt (void) @@ -965,26 +996,9 @@ ipa_devirt (void) if (dump_file) dump_possible_polymorphic_call_targets (dump_file, e); + npolymorphic++; - if (final) - { - gcc_assert (targets.length()); - if (targets.length() == 1) - { - if (dump_file) - fprintf (dump_file, - "Devirtualizing call in %s/%i to %s/%i\n", - cgraph_node_name (n), n->symbol.order, - cgraph_node_name (targets[0]), targets[0]->symbol.order); - cgraph_make_edge_direct (e, targets[0]); - ndevirtualized++; - update = true; - continue; - } - } - if (!flag_devirtualize_speculatively) - continue; if (!cgraph_maybe_hot_edge_p (e)) { if (dump_file) @@ -1114,7 +1128,7 @@ ipa_devirt (void) static bool gate_ipa_devirt (void) { - return flag_devirtualize && !in_lto_p && optimize; + return flag_devirtualize_speculatively && !in_lto_p && optimize; } namespace { Index: ipa.c =================================================================== --- ipa.c (revision 202364) +++ ipa.c (working copy) @@ -149,6 +149,84 @@ process_references (struct ipa_ref_list } } +/* EDGE is an polymorphic call. If BEFORE_INLINING_P is set, mark + all its potential targets as reachable to permit later inlining if + devirtualization happens. After inlining still keep their declarations + around, so we can devirtualize to a direct call. + + Also try to make trivial devirutalization when no or only one target is + possible. */ + +static void +walk_polymorphic_call_targets (pointer_set_t *reachable_call_targets, + struct cgraph_edge *edge, + symtab_node *first, + pointer_set_t *reachable, bool before_inlining_p) +{ + unsigned int i; + void *cache_token; + bool final; + vec <cgraph_node *>targets + = possible_polymorphic_call_targets + (edge, &final, &cache_token); + + if (!pointer_set_insert (reachable_call_targets, + cache_token)) + { + for (i = 0; i < targets.length(); i++) + { + struct cgraph_node *n = targets[i]; + + /* Do not bother to mark virtual methods in anonymous namespace; + either we will find use of virtual table defining it, or it is + unused. */ + if (TREE_CODE (TREE_TYPE (n->symbol.decl)) == METHOD_TYPE + && type_in_anonymous_namespace_p + (method_class_type (TREE_TYPE (n->symbol.decl)))) + continue; + + /* Prior inlining, keep alive bodies of possible targets for + devirtualization. */ + if (n->symbol.definition + && before_inlining_p) + pointer_set_insert (reachable, n); + + /* Even after inlining we want to keep the possible targets in the + boundary, so late passes can still produce direct call even if + the chance for inlining is lost. */ + enqueue_node ((symtab_node) n, first, reachable); + } + } + + /* Very trivial devirtualization; when the type is + final or anonymous (so we know all its derivation) + and there is only one possible virtual call target, + make the edge direct. */ + if (final) + { + if (targets.length() <= 1) + { + cgraph_node *target; + if (targets.length () == 1) + target = targets[0]; + else + target = cgraph_get_create_node + (builtin_decl_implicit (BUILT_IN_UNREACHABLE)); + + if (dump_file) + fprintf (dump_file, + "Devirtualizing call in %s/%i to %s/%i\n", + cgraph_node_name (edge->caller), + edge->caller->symbol.order, + cgraph_node_name (target), target->symbol.order); + edge = cgraph_make_edge_direct (edge, target); + if (cgraph_state != CGRAPH_STATE_IPA_SSA) + cgraph_redirect_edge_call_stmt_to_callee (edge); + else + inline_update_overall_summary (edge->caller); + } + } +} /* Perform reachability analysis and reclaim all unreachable nodes. @@ -214,7 +292,9 @@ symtab_remove_unreachable_nodes (bool be bool changed = false; struct pointer_set_t *reachable = pointer_set_create (); struct pointer_set_t *body_needed_for_clonning = pointer_set_create (); + struct pointer_set_t *reachable_call_targets = pointer_set_create (); + timevar_push (TV_IPA_UNREACHABLE); #ifdef ENABLE_CHECKING verify_symtab (); #endif @@ -238,10 +318,7 @@ symtab_remove_unreachable_nodes (bool be if (node->symbol.definition && !node->global.inlined_to && !node->symbol.in_other_partition - && (!cgraph_can_remove_if_no_direct_calls_and_refs_p (node) - /* Keep around virtual functions for possible devirtualization. */ - || (before_inlining_p - && DECL_VIRTUAL_P (node->symbol.decl)))) + && !cgraph_can_remove_if_no_direct_calls_and_refs_p (node)) { gcc_assert (!node->global.inlined_to); pointer_set_insert (reachable, node); @@ -304,6 +381,19 @@ symtab_remove_unreachable_nodes (bool be if (!in_boundary_p) { struct cgraph_edge *e; + /* Keep alive possible targets for devirtualization. */ + if (optimize && flag_devirtualize) + { + struct cgraph_edge *next; + for (e = cnode->indirect_calls; e; e = next) + { + next = e->next_callee; + if (e->indirect_info->polymorphic) + walk_polymorphic_call_targets (reachable_call_targets, + e, &first, reachable, + before_inlining_p); + } + } for (e = cnode->callees; e; e = e->next_callee) { if (e->callee->symbol.definition @@ -449,6 +539,7 @@ symtab_remove_unreachable_nodes (bool be pointer_set_destroy (reachable); pointer_set_destroy (body_needed_for_clonning); + pointer_set_destroy (reachable_call_targets); /* Now update address_taken flags and try to promote functions to be local. */ if (file) @@ -483,6 +574,7 @@ symtab_remove_unreachable_nodes (bool be FOR_EACH_DEFINED_FUNCTION (node) ipa_propagate_frequency (node); + timevar_pop (TV_IPA_UNREACHABLE); return changed; }