Hi, some comments on the ipa part of the patch (and thanks for working on it - this was on my TODO list for years)
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c > index de82316d4b1..0d373a67d1b 100644 > --- a/gcc/cgraph.c > +++ b/gcc/cgraph.c > @@ -553,6 +553,7 @@ cgraph_node::get_create (tree decl) > fprintf (dump_file, "Introduced new external node " > "(%s) and turned into root of the clone tree.\n", > node->dump_name ()); > + node->profile_id = first_clone->profile_id; > } > else if (dump_file) > fprintf (dump_file, "Introduced new external node " This is independent of the rest of changes. Do you have example where this matters? The inline clones are created in ipa-inline while ipa-profile is run before it, so I can not think of such a scenario. I see you also copy profile_id from function to clone. I would like to know why you needed that. Also you mention that you hit some ICEs. If fixes are independent of rest of your changes, send them separately. > @@ -1110,6 +1111,7 @@ cgraph_edge::speculative_call_info (cgraph_edge > *&direct, > int i; > cgraph_edge *e2; > cgraph_edge *e = this; > + cgraph_node *referred_node; > > if (!e->indirect_unknown_callee) > for (e2 = e->caller->indirect_calls; > @@ -1142,8 +1144,20 @@ cgraph_edge::speculative_call_info (cgraph_edge > *&direct, > && ((ref->stmt && ref->stmt == e->call_stmt) > || (!ref->stmt && ref->lto_stmt_uid == e->lto_stmt_uid))) > { > - reference = ref; > - break; > + if (e2->indirect_info && e2->indirect_info->num_of_ics) > + { > + referred_node = dyn_cast<cgraph_node *> (ref->referred); > + if (strstr (e->callee->name (), referred_node->name ())) > + { > + reference = ref; > + break; > + } > + } > + else > + { > + reference = ref; > + break; > + } > } This function is intended to return everything related to the speculative call, so if you add multiple direct targets, i would expect it to tage auto_vec of cgraph_nodes for direct and auto_vec of references. > > /* Speculative edge always consist of all three components - direct edge, > @@ -1199,7 +1213,14 @@ cgraph_edge::resolve_speculation (tree callee_decl) > in the functions inlined through it. */ > } > edge->count += e2->count; > - edge->speculative = false; > + if (edge->indirect_info && edge->indirect_info->num_of_ics) > + { > + edge->indirect_info->num_of_ics--; > + if (edge->indirect_info->num_of_ics == 0) > + edge->speculative = false; > + } > + else > + edge->speculative = false; > e2->speculative = false; > ref->remove_reference (); > if (e2->indirect_unknown_callee || e2->inline_failed) This function should turn speculative call into direct call to DECL, so I think it should remove all the other direct calls associated with stmt and the indirect one. There are now two cases - in first case you want to turn speculative call into direct call or give up on especulation completely, while in other case you want to only remove one of speculations. I guess we want to have resolve_speculation(decl) for first and remove_one_speculation(edge) for the second case? The second case would be useful for the code below handling type mismatches and also for inline when one of speculative targets seems not useful to bother with. > @@ -1333,7 +1354,14 @@ cgraph_edge::redirect_call_stmt_to_callee (void) > e->caller->set_call_stmt_including_clones (e->call_stmt, new_stmt, > false); > e->count = gimple_bb (e->call_stmt)->count; > - e2->speculative = false; > + if (e2->indirect_info && e2->indirect_info->num_of_ics) > + { > + e2->indirect_info->num_of_ics--; > + if (e2->indirect_info->num_of_ics == 0) > + e2->speculative = false; > + } > + else > + e2->speculative = false; > e2->count = gimple_bb (e2->call_stmt)->count; > ref->speculative = false; > ref->stmt = NULL; > extern void debuginfo_early_init (void); > extern void debuginfo_init (void); > @@ -1638,11 +1639,17 @@ struct GTY(()) cgraph_indirect_call_info > int param_index; > /* ECF flags determined from the caller. */ > int ecf_flags; > - /* Profile_id of common target obtrained from profile. */ > + /* Profile_id of common target obtained from profile. */ > int common_target_id; > /* Probability that call will land in function with COMMON_TARGET_ID. */ > int common_target_probability; > > + /* Profile_id of common target obtained from profile. */ > + int common_target_ids[GCOV_ICALL_TOPN_NCOUNTS / 2]; > + /* Probabilities that call will land in function with COMMON_TARGET_IDS. > */ > + int common_target_probabilities[GCOV_ICALL_TOPN_NCOUNTS / 2]; I would use vec of pairs (profile_id,probability) to hold this and do not wire in GCOV_ICALL_TOPN_NCOUTS. Most of time this vec will be just NULL pointer so it will result in less memory overhead and will avoid hard limit on number of speculations we want to do. Note that the speculative edges may end up being redirected during IPA optimization, for example when their target is cloned for particular call context or when the function is detected identical to other function. So one can not preserve the mapping between targets and profile ids. Also this infrastructure is useful even w/o profile because we could use ipa-devirt to devirtualize even when multiple polymorphic targets are found. So I would not wirte in the limit GCOV_ICALL_TOPN_NCOUNTS and just use dynamically allocated vectors instead. With ipa-devirt it is possible that we know that there are precisely 2 possible polymorphic targets. Other case i was considering was to speculatively inline with -fPIC. I.e. when one has interposiable call for foo() one can create foo.localalias() for the definition visible to compiler and then speculate if (foo == foo.localalias()) inline_path else foo(); In these cases we may end up with indirect call that has no associated indirect edge (which we do not support currently), so it may be interesting to move speculative call info away from indirect call info to cgraph edge structure (but that can be done incrementally based on what you do now - this code is not completely easy so lets do it step by step). > @@ -212,6 +216,46 @@ ipa_profile_generate_summary (void) > gimple_remove_histogram_value (DECL_STRUCT_FUNCTION > (node->decl), > stmt, h); > } > + else if (h && type == HIST_TYPE_INDIR_CALL_TOPN) > + { > + unsigned j; > + struct cgraph_edge *e = node->get_edge (stmt); > + if (e && !e->indirect_unknown_callee) > + continue; I suppose you are going to change this for Martin's implementation, so i have skipped it for now. > @@ -558,7 +601,8 @@ ipa_profile (void) > { > if (n->count.initialized_p ()) > nindirect++; > - if (e->indirect_info->common_target_id) > + if (e->indirect_info->common_target_id > + || (e->indirect_info && e->indirect_info->num_of_ics == 1)) > { > if (!node_map_initialized) > init_node_map (false); > @@ -613,7 +657,7 @@ ipa_profile (void) > if (dump_file) > fprintf (dump_file, > "Not speculating: " > - "parameter count mistmatch\n"); > + "parameter count mismatch\n"); > } > else if (e->indirect_info->polymorphic > && !opt_for_fn (n->decl, flag_devirtualize) > @@ -655,7 +699,130 @@ ipa_profile (void) > nunknown++; > } > } > - } > + if (e->indirect_info && e->indirect_info->num_of_ics > 1) > + { > + if (in_lto_p) > + { > + if (dump_file) > + { > + fprintf (dump_file, > + "Updating hotness threshold in LTO mode.\n"); > + fprintf (dump_file, "Updated min count: %" PRId64 "\n", > + (int64_t) threshold); > + } > + set_hot_bb_threshold (threshold > + / e->indirect_info->num_of_ics); > + } Why do you need different paths for one target and for multiple targets? Also why LTO is different here? > + if (!node_map_initialized) > + init_node_map (false); > + node_map_initialized = true; > + ncommon++; > + unsigned speculative = 0; > + for (i = 0; i < (int)e->indirect_info->num_of_ics; i++) > + { > + n2 = find_func_by_profile_id ( > + e->indirect_info->common_target_ids[i]); > + if (n2) > + { > + if (dump_file) > + { > + fprintf ( > + dump_file, > + "Indirect call -> direct call from" > + " other module %s => %s, prob %3.2f\n", > + n->dump_name (), n2->dump_name (), > + e->indirect_info->common_target_probabilities[i] > + / (float) REG_BR_PROB_BASE); > + } > + if (e->indirect_info->common_target_probabilities[i] > + < REG_BR_PROB_BASE / 2) > + { > + nuseless++; > + if (dump_file) > + fprintf ( > + dump_file, > + "Not speculating: probability is too low.\n"); > + } > + else if (!e->maybe_hot_p ()) > + { > + nuseless++; > + if (dump_file) > + fprintf (dump_file, > + "Not speculating: call is cold.\n"); > + } > + else if (n2->get_availability () <= AVAIL_INTERPOSABLE > + && n2->can_be_discarded_p ()) > + { > + nuseless++; > + if (dump_file) > + fprintf (dump_file, > + "Not speculating: target is overwritable " > + "and can be discarded.\n"); > + } > + else if (ipa_node_params_sum && ipa_edge_args_sum > + && (!vec_safe_is_empty ( > + IPA_NODE_REF (n2)->descriptors)) > + && ipa_get_param_count (IPA_NODE_REF (n2)) > + != ipa_get_cs_argument_count ( > + IPA_EDGE_REF (e)) > + && (ipa_get_param_count (IPA_NODE_REF (n2)) > + >= ipa_get_cs_argument_count ( > + IPA_EDGE_REF (e)) > + || !stdarg_p (TREE_TYPE (n2->decl)))) > + { > + nmismatch++; > + if (dump_file) > + fprintf (dump_file, "Not speculating: " > + "parameter count mismatch\n"); > + } > + else if (e->indirect_info->polymorphic > + && !opt_for_fn (n->decl, flag_devirtualize) > + && !possible_polymorphic_call_target_p (e, n2)) > + { > + nimpossible++; > + if (dump_file) > + fprintf (dump_file, > + "Not speculating: " > + "function is not in the polymorphic " > + "call target list\n"); > + } > + else > + { > + /* Target may be overwritable, but profile says that > + control flow goes to this particular implementation > + of N2. Speculate on the local alias to allow > + inlining. > + */ > + if (!n2->can_be_discarded_p ()) > + { > + cgraph_node *alias; > + alias = dyn_cast<cgraph_node *> ( > + n2->noninterposable_alias ()); > + if (alias) > + n2 = alias; > + } > + nconverted++; > + e->make_speculative ( > + n2, e->count.apply_probability ( > + e->indirect_info > + ->common_target_probabilities[i])); > + update = true; > + speculative++; > + } > + } > + else > + { > + if (dump_file) > + fprintf (dump_file, > + "Function with profile-id %i not found.\n", > + e->indirect_info->common_target_ids[i]); > + nunknown++; > + } > + } > + if (speculative < e->indirect_info->num_of_ics) > + e->indirect_info->num_of_ics = speculative; > + } > + } > if (update) > ipa_update_overall_fn_summary (n); > } > diff --git a/gcc/ipa-utils.c b/gcc/ipa-utils.c > index 79b250c3943..30347691029 100644 > --- a/gcc/ipa-utils.c > +++ b/gcc/ipa-utils.c > @@ -587,6 +587,11 @@ ipa_merge_profiles (struct cgraph_node *dst, > update_max_bb_count (); > compute_function_frequency (); > pop_cfun (); > + /* When src is speculative, clone the referrings. */ > + if (src->indirect_call_target) > + for (e = src->callers; e; e = e->next_caller) > + if (e->callee == src && e->speculative) > + dst->clone_referring (src); This looks wrong. Why do you need to copy all references from src to target? > + /* Speculative calls consist of two edges - direct > + and indirect. Duplicate the whole thing and > + distribute frequencies accordingly. */ > + if (edge->speculative) I think here you want to handle the whole group with multiple targets. > + { > + struct ipa_ref *ref; > + > + gcc_assert (!edge->indirect_unknown_callee); > + old_edge->speculative_call_info (direct, > + indirect, ref); > + > + profile_count indir_cnt = indirect->count; > + indirect > + = indirect->clone (id->dst_node, call_stmt, > + gimple_uid (stmt), num, > + den, true); > + > + profile_probability prob > + = indir_cnt.probability_in (old_cnt > + + indir_cnt); > + indirect->count > + = copy_basic_block->count.apply_probability ( > + prob); > + edge->count > + = copy_basic_block->count - indirect->count; > + id->dst_node->clone_reference (ref, stmt); > + } > + else > + edge->count = copy_basic_block->count; > + } > + /* If the indirect call contains more than one indirect > + targets, need clone all speculative edges here. */ > + if (old_edge && old_edge->next_callee > + && old_edge->speculative && indirect > + && indirect->indirect_info > + && indirect->indirect_info->num_of_ics > 1) > + { > + edge = old_edge->next_callee; > + old_edge = old_edge->next_callee; > + if (edge->speculative) > + next_speculative = true; > + } > + } > + while (next_speculative); > + } > break; > > case CB_CGE_MOVE_CLONES: