Hi,
currently enabling profile feedback regresses x264 and exchange. In both cases the root of the issue is that the ipa-cp cost model concludes that cloning is not worthwhile when feedback is available, while it does clone without feedback.
Consider:

  __attribute__ ((used))
  int a[1000];

  __attribute__ ((noinline))
  void
  test2(int sz)
  {
    for (int i = 0; i < sz; i++)
      a[i]++;
    asm volatile (""::"m"(a));
  }

  __attribute__ ((noinline))
  void
  test1 (int sz)
  {
    for (int i = 0; i < 1000; i++)
      test2(sz);
  }

  int main()
  {
    test1(1000);
    return 0;
  }

Here we want to clone both test1 and test2 and specialize them for 1000, but ipa-cp will not do that: it skips the call main->test1 as not hot, since it is executed just once, both with and without profile feedback. In this simple testcase we track that main is called once even without profile feedback.

I think the testcase shows that the hotness of a call is not that relevant when deciding whether we want to propagate constants across it. With an IPA profile, ipa-cp can compute an overall estimate of time saved (the existing time benefit, which is the time saved per invocation of the function, multiplied by the number of executions) and see whether the result is big enough. An easy check is to simply call maybe_hot_p on the resulting count.

So this patch makes ipa-cp consider all call sites interesting except those known to be unlikely executed (i.e. run 0 times in the train run or known to lead to something bad). This makes ipa-cp propagate across them, find cloning candidates and feed them into good_cloning_opportunity_p. For this I added cs_interesting_for_ipcp_p, which also attempts to do the right thing with partial training.

With only that change, good_cloning_opportunity_p would still return false, since it figures out that the call edge is not very frequent. It already kind of knows that the frequency of the call instruction itself is not too important, but instead of computing the overall time saved, it compares against the count found at the param_ipa_cp_profile_count_base percentage position among the counts of call edges. I think this is not very relevant, since the estimated time saved per call can be large. So I dropped this logic and replaced it with a simple use of the overall saved time.
Since ipa-cp does not deal well with the cases where it hits the allowed unit growth limit, we probably want to be more careful, so I keep the existing metric with this change. So now we get:

Evaluating opportunities for test1/3.
 - considering value 1000 for param #0 sz (caller_count: 1)
     good_cloning_opportunity_p (time: 1, size: 8, count_sum: 1 (precise), overall time saved: 1 (adjusted)) -> evaluation: 0.12, threshold: 500
     not cloning: time saved is not hot
     good_cloning_opportunity_p (time: 129001, size: 20, count_sum: 1 (precise), overall time saved: 129001 (adjusted)) -> evaluation: 6450.05, threshold: 500

The first call to good_cloning_opportunity_p considers the case where only test1 is cloned. In this case the time saved is 1 (for passing the value around), and since test1 is called just once (count_sum), the overall time saved is 1, which is not considered hot, and we also get a very low evaluation score.

In the second call we consider cloning the chain test1->test2. In this case the time saved is large (129001), since test2 is invoked many times and the value is used to control the loop. We still know that the count is 1, but the overall time saved is 129001, which is already considered relevant, and we clone.

I also try to do something sensible in case we have calls both with and without IPA profile (which can happen for comdats where the profile went missing, or with LTO if some units were not trained). Instead of checking whether the sum of calls with known profile is nonzero, I keep track of whether there are other calls and, if so, also try the local heuristics that are used without profile feedback.

The patch improves SPECint with -Ofast -fprofile-use by approx 1% by speeding up x264 from 99.3s to 91.3s (9%) and exchange from 99.7s to 95.5s (3.3%).

We still get better runtimes without profile (86.4s for x264 and 93.8s for exchange). The main problem I see is that ipa-cp has a global growth limit of 10% but does not consider the opportunities in priority order.
Consequently, if the limit is hit, some clone opportunities are dropped more or less randomly in favour of others. I dumped unit size changes with a -flto -Ofast build of SPEC2017. Without the patch I get:

   orig     new     growth (%)
 588677  605385  102.838229
   4378    6037  137.894016
 484650  494851  102.104818
   4111    4111  100.000000
  99953  103519  103.567677
 106181  114889  108.201091
  21389   21597  100.972462
  24925   26746  107.305918
  15308   23974  156.610922
  27354   27906  102.017986
    494     494  100.000000
   4631    4631  100.000000
 863216  872729  101.102042
 126604  126604  100.000000
 605138  627156  103.638509
   4112    4112  100.000000
 222006  231293  104.183220
   2952    3384  114.634146
  37584   39807  105.914751
   4111    4111  100.000000
  13226   13226  100.000000
   4111    4111  100.000000
 326215  337396  103.427494
  25240   25433  100.764659
  64644   65972  102.054328
 127223  132300  103.990631
    494     494  100.000000

Small units can grow up to 16000 instructions, and the other units are large. So there is only one unit hitting the limits, growing to 156% of its original size: exchange, whose recursive cloning is handled specially.
With profile feedback ipa-cp basically shuts itself off:

   orig     new     growth (%)
 333815  333891  100.022767
   2559    2974  116.217272
 217576  217581  100.002298
   2749    2749  100.000000
  64652   64716  100.098992
  68416   69707  101.886986
  13171   13171  100.000000
  11849   11849  100.000000
  10519   16180  153.816903
  15843   15843  100.000000
    231     231  100.000000
   3624    3624  100.000000
 573385  573386  100.000174
  97623   97623  100.000000
 295673  295676  100.001015
   2750    2750  100.000000
 130723  130726  100.002295
   2334    2334  100.000000
  19313   19313  100.000000
   2749    2749  100.000000
 517331  517331  100.000000
   6707    6707  100.000000
   2749    2749  100.000000
 193638  193638  100.000000
  16425   16425  100.000000
  47154   47154  100.000000
  96422   96422  100.000000
    231     231  100.000000

So we essentially clone only exchange and mcf (116%).

With the patch and no FDO I get:

    orig      new     growth (%)
  588677   605385  102.838229
    4378     6037  137.894016
  484519   494698  102.100846
    4111     4111  100.000000
   99953   103519  103.567677
  106181   114889  108.201091
   21389    22632  105.811398
   24854    26620  107.105496
   15308    23974  156.610922
   27354    28039  102.504204
     494      494  100.000000
    4631     4631  100.000000
    4631     4631  100.000000
  126604   126630  100.020536
    4112     4112  100.000000
  222006   231293  104.183220
    2952     3384  114.634146
   37584    39807  105.914751
 2760715  2835539  102.710312
    4111     4111  100.000000
   13226    13226  100.000000
    4111     4111  100.000000
  326215   337396  103.427494
   25240    25433  100.764659
   64644    65972  102.054328
  127223   132300  103.990631
     494      494  100.000000

which seems essentially the same as without the patch.
However, with FDO I get:

   orig     new     growth (%)
 333815  350363  104.957237
   2559    3345  130.715123
 217469  220765  101.515618
 485599  488772  100.653420
   2749    2749  100.000000
  64652   74265  114.868836
  68416   87484  127.870674
  13171   20656  156.829398
  11792   11990  101.679104
  10519   17028  161.878506
  15843   16119  101.742094
    231     231  100.000000
 573336  573336  100.000000
  97623   97623  100.000000
 295497  296208  100.240612
   2750    2750  100.000000
 130723  133341  102.002708
   2334    2334  100.000000
  19313   19368  100.284782
   2749    2749  100.000000
   6707    6755  100.715670
   2749    2749  100.000000
 193638  194712  100.554643
  16425   17377  105.796043
  47154   47154  100.000000
  96422   96422  100.000000
    231     231  100.000000

So here x264 grows to 114% and 127% of its original size (two different binaries), Deepsjeng grows by 56% and Exchange by 61%, all of which are above the 10% cutoff.

Bootstrapped/regtested x86_64-linux.

gcc/ChangeLog:

	* ipa-cp.cc (base_count): Remove.
	(struct caller_statistics): Rename n_hot_calls to n_interesting_calls;
	add called_without_ipa_profile.
	(init_caller_stats): Update.
	(cs_interesting_for_ipcp_p): New function.
	(gather_caller_stats): Collect n_interesting_calls and
	called_without_ipa_profile.
	(ipcp_cloning_candidate_p): Use n_interesting_calls rather than
	n_hot_calls.
	(good_cloning_opportunity_p): Rewrite heuristics when IPA profile is
	present.
	(estimate_local_effects): Update.
	(value_topo_info::propagate_effects): Update.
	(compare_edge_profile_counts): Remove.
	(ipcp_propagate_stage): Do not collect base_count.
	(get_info_about_necessary_edges): Record whether function is called
	without profile.
	(decide_about_value): Update.
	(ipa_cp_cc_finalize): Do not initialize base_count.
	* profile-count.cc (profile_count::operator*): New.
	(profile_count::operator*=): New.
	* profile-count.h (profile_count::operator*): Declare.
	(profile_count::operator*=): Declare.
	* params.opt: Remove ipa-cp-profile-count-base.
	* doc/invoke.texi: Likewise.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4fbb4cda101..1ef24215002 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16795,11 +16795,6 @@ Maximum depth of recursive cloning for self-recursive function.
 Recursive cloning only when the probability of call being executed exceeds
 the parameter.
 
-@item ipa-cp-profile-count-base
-When using @option{-fprofile-use} option, IPA-CP will consider the measured
-execution count of a call graph edge at this percentage position in their
-histogram as the basis for its heuristics calculation.
-
 @item ipa-cp-recursive-freq-factor
 The number of times interprocedural copy propagation expects recursive
 functions to call themselves.
diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 2347a5fcefc..a276e22d056 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -147,10 +147,6 @@ object_allocator<ipcp_value_source<tree> > ipcp_sources_pool
 object_allocator<ipcp_agg_lattice> ipcp_agg_lattice_pool
   ("IPA_CP aggregate lattices");
 
-/* Base count to use in heuristics when using profile feedback. */
-
-static profile_count base_count;
-
 /* Original overall size of the program. */
 
 static long overall_size, orig_overall_size;
@@ -488,14 +484,16 @@ struct caller_statistics
   profile_count count_sum;
   /* Sum of all frequencies for all calls. */
   sreal freq_sum;
-  /* Number of calls and hot calls respectively. */
-  int n_calls, n_hot_calls;
+  /* Number of calls and calls considered interesting respectively. */
+  int n_calls, n_interesting_calls;
   /* If itself is set up, also count the number of non-self-recursive
      calls. */
   int n_nonrec_calls;
   /* If non-NULL, this is the node itself and calls from it should have their
      counts included in rec_count_sum and not count_sum. */
   cgraph_node *itself;
+  /* True if there is a caller that has no IPA profile. */
+  bool called_without_ipa_profile;
 };
 
 /* Initialize fields of STAT to zeroes and optionally set it up so that edges
@@ -507,10 +505,39 @@ init_caller_stats (caller_statistics *stats, cgraph_node *itself = NULL)
   stats->rec_count_sum = profile_count::zero ();
   stats->count_sum = profile_count::zero ();
   stats->n_calls = 0;
-  stats->n_hot_calls = 0;
+  stats->n_interesting_calls = 0;
   stats->n_nonrec_calls = 0;
   stats->freq_sum = 0;
   stats->itself = itself;
+  stats->called_without_ipa_profile = false;
+}
+
+/* We want to propagate across edges that may be executed, however
+   we do not want to check maybe_hot, since call itself may be cold
+   while calee contains some heavy loop which makes propagation still
+   relevant.
+
+   In particular, even edge called once may lead to significant
+   improvement. */
+
+static bool
+cs_interesting_for_ipcp_p (cgraph_edge *e)
+{
+  /* If profile says the edge is executed, we want to optimize. */
+  if (e->count.ipa ().nonzero_p ())
+    return true;
+  /* If local (possibly guseed or adjusted 0 profile) claims edge is
+     not executed, do not propagate. */
+  if (!e->count.nonzero_p ())
+    return false;
+  /* If IPA profile says edge is executed zero times, but zero
+     is quality is ADJUSTED, still consider it for cloning in
+     case we have partial training. */
+  if (e->count.ipa ().initialized_p ()
+      && opt_for_fn (e->callee->decl,flag_profile_partial_training)
+      && e->count.nonzero_p ())
+    return false;
+  return true;
 }
 
 /* Worker callback of cgraph_for_node_and_aliases accumulating statistics of
@@ -536,13 +563,18 @@ gather_caller_stats (struct cgraph_node *node, void *data)
             else
               stats->count_sum += cs->count.ipa ();
           }
+        else
+          stats->called_without_ipa_profile = true;
         stats->freq_sum += cs->sreal_frequency ();
         stats->n_calls++;
         if (stats->itself && stats->itself != cs->caller)
           stats->n_nonrec_calls++;
 
-        if (cs->maybe_hot_p ())
-          stats->n_hot_calls ++;
+        /* If profile known to be zero, we do not want to clone for performance.
+           However if call is cold, the called function may still contain
+           important hot loops. */
+        if (cs_interesting_for_ipcp_p (cs))
+          stats->n_interesting_calls++;
       }
   return false;
 
@@ -585,26 +617,11 @@ ipcp_cloning_candidate_p (struct cgraph_node *node)
                  node->dump_name ());
       return true;
     }
-
-  /* When profile is available and function is hot, propagate into it even if
-     calls seems cold; constant propagation can improve function's speed
-     significantly. */
-  if (stats.count_sum > profile_count::zero ()
-      && node->count.ipa ().initialized_p ())
-    {
-      if (stats.count_sum > node->count.ipa ().apply_scale (90, 100))
-        {
-          if (dump_file)
-            fprintf (dump_file, "Considering %s for cloning; "
-                     "usually called directly.\n",
-                     node->dump_name ());
-          return true;
-        }
-    }
-  if (!stats.n_hot_calls)
+  if (!stats.n_interesting_calls)
     {
       if (dump_file)
-        fprintf (dump_file, "Not considering %s for cloning; no hot calls.\n",
+        fprintf (dump_file, "Not considering %s for cloning; "
+                 "no calls considered interesting by profile.\n",
                  node->dump_name ());
       return false;
     }
@@ -3361,24 +3378,29 @@ incorporate_penalties (cgraph_node *node, ipa_node_params *info,
 static bool
 good_cloning_opportunity_p (struct cgraph_node *node, sreal time_benefit,
                             sreal freq_sum, profile_count count_sum,
-                            int size_cost)
+                            int size_cost, bool called_without_ipa_profile)
 {
+  gcc_assert (count_sum.ipa () == count_sum);
   if (time_benefit == 0
       || !opt_for_fn (node->decl, flag_ipa_cp_clone)
-      || node->optimize_for_size_p ())
+      || node->optimize_for_size_p ()
+      /* If there is no call which was executed in profiling or where
+         profile is missing, we do not want to clone. */
+      || (!called_without_ipa_profile && !count_sum.nonzero_p ()))
     return false;
 
   gcc_assert (size_cost > 0);
 
   ipa_node_params *info = ipa_node_params_sum->get (node);
   int eval_threshold = opt_for_fn (node->decl, param_ipa_cp_eval_threshold);
+  /* If we know the execution IPA execution counts, we can estimate overall
+     speedup of the program. */
   if (count_sum.nonzero_p ())
     {
-      gcc_assert (base_count.nonzero_p ());
-      sreal factor = count_sum.probability_in (base_count).to_sreal ();
-      sreal evaluation = (time_benefit * factor) / size_cost;
+      profile_count saved_time = count_sum * time_benefit;
+      sreal evaluation = saved_time.to_sreal_scale (profile_count::one ())
+                              / size_cost;
       evaluation = incorporate_penalties (node, info, evaluation);
-      evaluation *= 1000;
 
       if (dump_file && (dump_flags & TDF_DETAILS))
         {
@@ -3386,33 +3408,46 @@ good_cloning_opportunity_p (struct cgraph_node *node, sreal time_benefit,
                    "size: %i, count_sum: ", time_benefit.to_double (),
                    size_cost);
           count_sum.dump (dump_file);
+          fprintf (dump_file, ", overall time saved: ");
+          saved_time.dump (dump_file);
           fprintf (dump_file, "%s%s) -> evaluation: %.2f, threshold: %i\n",
                    info->node_within_scc
                    ? (info->node_is_self_scc ? ", self_scc" : ", scc") : "",
                    info->node_calling_single_call ? ", single_call" : "",
                    evaluation.to_double (), eval_threshold);
         }
-
-      return evaluation.to_int () >= eval_threshold;
+      gcc_checking_assert (saved_time == saved_time.ipa ());
+      if (!maybe_hot_count_p (NULL, saved_time))
+        {
+          if (dump_file && (dump_flags & TDF_DETAILS))
+            fprintf (dump_file, " not cloning: time saved is not hot\n");
+        }
+      /* Evaulation approximately corresponds to time saved per instruction
+         introduced. This is likely almost always going to be true, since we
+         already checked that time saved is large enough to be considered
+         hot. */
+      else if (evaluation.to_int () >= eval_threshold)
+        return true;
+      /* If all call sites have profile known; we know we do not want t clone.
+         If there are calls with unknown profile; try local heuristics. */
+      if (!called_without_ipa_profile)
+        return false;
     }
-  else
-    {
-      sreal evaluation = (time_benefit * freq_sum) / size_cost;
-      evaluation = incorporate_penalties (node, info, evaluation);
-      evaluation *= 1000;
+  sreal evaluation = (time_benefit * freq_sum) / size_cost;
+  evaluation = incorporate_penalties (node, info, evaluation);
+  evaluation *= 1000;
 
-      if (dump_file && (dump_flags & TDF_DETAILS))
-        fprintf (dump_file, " good_cloning_opportunity_p (time: %g, "
-                 "size: %i, freq_sum: %g%s%s) -> evaluation: %.2f, "
-                 "threshold: %i\n",
-                 time_benefit.to_double (), size_cost, freq_sum.to_double (),
-                 info->node_within_scc
-                 ? (info->node_is_self_scc ? ", self_scc" : ", scc") : "",
-                 info->node_calling_single_call ? ", single_call" : "",
-                 evaluation.to_double (), eval_threshold);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, " good_cloning_opportunity_p (time: %g, "
+             "size: %i, freq_sum: %g%s%s) -> evaluation: %.2f, "
+             "threshold: %i\n",
+             time_benefit.to_double (), size_cost, freq_sum.to_double (),
+             info->node_within_scc
+             ? (info->node_is_self_scc ? ", self_scc" : ", scc") : "",
+             info->node_calling_single_call ? ", single_call" : "",
+             evaluation.to_double (), eval_threshold);
 
-      return evaluation.to_int () >= eval_threshold;
-    }
+  return evaluation.to_int () >= eval_threshold;
 }
 
 /* Grow vectors in AVALS and fill them with information about values of
@@ -3605,7 +3640,8 @@ estimate_local_effects (struct cgraph_node *node)
                      "known contexts, code not going to grow.\n");
         }
       else if (good_cloning_opportunity_p (node, time, stats.freq_sum,
-                                           stats.count_sum, size))
+                                           stats.count_sum, size,
+                                           stats.called_without_ipa_profile))
         {
           if (size + overall_size <= get_max_overall_size (node))
             {
@@ -3971,7 +4007,7 @@ value_topo_info<valtype>::propagate_effects ()
           processed_srcvals.empty ();
           for (src = val->sources; src; src = src->next)
             if (src->val
-                && src->cs->maybe_hot_p ())
+                && cs_interesting_for_ipcp_p (src->cs))
               {
                 if (!processed_srcvals.add (src->val))
                   {
@@ -4016,21 +4052,6 @@ value_topo_info<valtype>::propagate_effects ()
     }
 }
 
-/* Callback for qsort to sort counts of all edges. */
-
-static int
-compare_edge_profile_counts (const void *a, const void *b)
-{
-  const profile_count *cnt1 = (const profile_count *) a;
-  const profile_count *cnt2 = (const profile_count *) b;
-
-  if (*cnt1 < *cnt2)
-    return 1;
-  if (*cnt1 > *cnt2)
-    return -1;
-  return 0;
-}
-
 /* Propagate constants, polymorphic contexts and their effects from the
    summaries interprocedurally. */
 
@@ -4043,10 +4064,6 @@ ipcp_propagate_stage (class ipa_topo_info *topo)
   if (dump_file)
     fprintf (dump_file, "\n Propagating constants:\n\n");
 
-  base_count = profile_count::uninitialized ();
-
-  bool compute_count_base = false;
-  unsigned base_count_pos_percent = 0;
   FOR_EACH_DEFINED_FUNCTION (node)
     {
       if (node->has_gimple_body_p ()
@@ -4063,57 +4080,8 @@ ipcp_propagate_stage (class ipa_topo_info *topo)
       ipa_size_summary *s = ipa_size_summaries->get (node);
       if (node->definition && !node->alias && s != NULL)
         overall_size += s->self_size;
-      if (node->count.ipa ().initialized_p ())
-        {
-          compute_count_base = true;
-          unsigned pos_percent = opt_for_fn (node->decl,
-                                             param_ipa_cp_profile_count_base);
-          base_count_pos_percent = MAX (base_count_pos_percent, pos_percent);
-        }
     }
 
-  if (compute_count_base)
-    {
-      auto_vec<profile_count> all_edge_counts;
-      all_edge_counts.reserve_exact (symtab->edges_count);
-      FOR_EACH_DEFINED_FUNCTION (node)
-        for (cgraph_edge *cs = node->callees; cs; cs = cs->next_callee)
-          {
-            profile_count count = cs->count.ipa ();
-            if (!count.nonzero_p ())
-              continue;
-
-            enum availability avail;
-            cgraph_node *tgt
-              = cs->callee->function_or_virtual_thunk_symbol (&avail);
-            ipa_node_params *info = ipa_node_params_sum->get (tgt);
-            if (info && info->versionable)
-              all_edge_counts.quick_push (count);
-          }
-
-      if (!all_edge_counts.is_empty ())
-        {
-          gcc_assert (base_count_pos_percent <= 100);
-          all_edge_counts.qsort (compare_edge_profile_counts);
-
-          unsigned base_count_pos
-            = ((all_edge_counts.length () * (base_count_pos_percent)) / 100);
-          base_count = all_edge_counts[base_count_pos];
-
-          if (dump_file)
-            {
-              fprintf (dump_file, "\nSelected base_count from %u edges at "
-                       "position %u, arriving at: ", all_edge_counts.length (),
-                       base_count_pos);
-              base_count.dump (dump_file);
-              fprintf (dump_file, "\n");
-            }
-        }
-      else if (dump_file)
-        fprintf (dump_file, "\nNo candidates with non-zero call count found, "
-                 "continuing as if without profile feedback.\n");
-    }
-
   orig_overall_size = overall_size;
 
   if (dump_file)
@@ -4375,15 +4343,17 @@ static bool
 get_info_about_necessary_edges (ipcp_value<valtype> *val, cgraph_node *dest,
                                 sreal *freq_sum, int *caller_count,
                                 profile_count *rec_count_sum,
-                                profile_count *nonrec_count_sum)
+                                profile_count *nonrec_count_sum,
+                                bool *called_without_ipa_profile)
 {
   ipcp_value_source<valtype> *src;
   sreal freq = 0;
   int count = 0;
   profile_count rec_cnt = profile_count::zero ();
   profile_count nonrec_cnt = profile_count::zero ();
-  bool hot = false;
+  bool interesting = false;
   bool non_self_recursive = false;
+  *called_without_ipa_profile = false;
 
   for (src = val->sources; src; src = src->next)
     {
@@ -4394,15 +4364,19 @@ get_info_about_necessary_edges (ipcp_value<valtype> *val, cgraph_node *dest,
             {
               count++;
               freq += cs->sreal_frequency ();
-              hot |= cs->maybe_hot_p ();
+              interesting |= cs_interesting_for_ipcp_p (cs);
               if (cs->caller != dest)
                 {
                   non_self_recursive = true;
                   if (cs->count.ipa ().initialized_p ())
                     rec_cnt += cs->count.ipa ();
+                  else
+                    *called_without_ipa_profile = true;
                 }
               else if (cs->count.ipa ().initialized_p ())
                 nonrec_cnt += cs->count.ipa ();
+              else
+                *called_without_ipa_profile = true;
             }
           cs = get_next_cgraph_edge_clone (cs);
         }
@@ -4418,19 +4392,7 @@ get_info_about_necessary_edges (ipcp_value<valtype> *val, cgraph_node *dest,
   *rec_count_sum = rec_cnt;
   *nonrec_count_sum = nonrec_cnt;
 
-  if (!hot && ipa_node_params_sum->get (dest)->node_within_scc)
-    {
-      struct cgraph_edge *cs;
-
-      /* Cold non-SCC source edge could trigger hot recursive execution of
-         function. Consider the case as hot and rely on following cost model
-         computation to further select right one. */
-      for (cs = dest->callers; cs; cs = cs->next_caller)
-        if (cs->caller == dest && cs->maybe_hot_p ())
-          return true;
-    }
-
-  return hot;
+  return interesting;
 }
 
 /* Given a NODE, and a set of its CALLERS, try to adjust order of the callers
@@ -5914,6 +5876,7 @@ decide_about_value (struct cgraph_node *node, int index, HOST_WIDE_INT offset,
   sreal freq_sum;
   profile_count count_sum, rec_count_sum;
   vec<cgraph_edge *> callers;
+  bool called_without_ipa_profile;
 
   if (val->spec_node)
     {
@@ -5929,7 +5892,8 @@ decide_about_value (struct cgraph_node *node, int index, HOST_WIDE_INT offset,
       return false;
     }
   else if (!get_info_about_necessary_edges (val, node, &freq_sum, &caller_count,
-                                            &rec_count_sum, &count_sum))
+                                            &rec_count_sum, &count_sum,
+                                            &called_without_ipa_profile))
     return false;
 
   if (!dbg_cnt (ipa_cp_values))
@@ -5966,9 +5930,11 @@ decide_about_value (struct cgraph_node *node, int index, HOST_WIDE_INT offset,
 
   if (!good_cloning_opportunity_p (node, val->local_time_benefit,
                                    freq_sum, count_sum,
-                                   val->local_size_cost)
+                                   val->local_size_cost,
+                                   called_without_ipa_profile)
       && !good_cloning_opportunity_p (node, val->prop_time_benefit,
-                                          freq_sum, count_sum, val->prop_size_cost))
+                                          freq_sum, count_sum, val->prop_size_cost,
+                                          called_without_ipa_profile))
     return false;
 
   if (dump_file)
@@ -6550,7 +6516,6 @@ make_pass_ipa_cp (gcc::context *ctxt)
 void
 ipa_cp_cc_finalize (void)
 {
-  base_count = profile_count::uninitialized ();
   overall_size = 0;
   orig_overall_size = 0;
   ipcp_free_transformation_sum ();
diff --git a/gcc/params.opt b/gcc/params.opt
index 4f4eb4d7a2a..6f5d940dc95 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -273,10 +273,6 @@ The size of translation unit that IPA-CP pass considers large.
 Common Joined UInteger Var(param_ipa_cp_value_list_size) Init(8) Param Optimization
 Maximum size of a list of values associated with each parameter for interprocedural constant propagation.
 
--param=ipa-cp-profile-count-base=
-Common Joined UInteger Var(param_ipa_cp_profile_count_base) Init(10) IntegerRange(0, 100) Param Optimization
-When using profile feedback, use the edge at this percentage position in frequency histogram as the bases for IPA-CP heuristics.
-
 -param=ipa-jump-function-lookups=
 Common Joined UInteger Var(param_ipa_jump_function_lookups) Init(8) Param Optimization
 Maximum number of statements visited during jump function offset discovery.
diff --git a/gcc/profile-count.cc b/gcc/profile-count.cc
index 8b9d8e18c51..374f06f4c08 100644
--- a/gcc/profile-count.cc
+++ b/gcc/profile-count.cc
@@ -519,3 +519,26 @@ profile_probability::pow (int n) const
     }
   return ret;
 }
+profile_count
+profile_count::operator* (const sreal &num) const
+{
+  if (m_val == 0)
+    return *this;
+  if (!initialized_p ())
+    return uninitialized ();
+  sreal scaled = num * m_val;
+  gcc_checking_assert (scaled >= 0);
+  profile_count ret;
+  if (m_val > max_count)
+    ret.m_val = max_count;
+  else
+    ret.m_val = scaled.to_nearest_int ();
+  ret.m_quality = MIN (m_quality, ADJUSTED);
+  return ret;
+}
+
+profile_count
+profile_count::operator*= (const sreal &num)
+{
+  return *this * num;
+}
diff --git a/gcc/profile-count.h b/gcc/profile-count.h
index 015aee981ca..0e79fd241b5 100644
--- a/gcc/profile-count.h
+++ b/gcc/profile-count.h
@@ -1061,6 +1061,9 @@ public:
       return *this;
     }
 
+  profile_count operator* (const sreal &num) const;
+  profile_count operator*= (const sreal &num);
+
   profile_count operator/ (int64_t den) const
     {
       return apply_scale (1, den);
diff --git a/gcc/testsuite/gcc.dg/ipa/ipa-clone-4.c b/gcc/testsuite/gcc.dg/ipa/ipa-clone-4.c
new file mode 100644
index 00000000000..7c7b27e3829
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/ipa-clone-4.c
@@ -0,0 +1,30 @@
+/* { dg-options "-O3 -fdump-ipa-cp" } */
+__attribute__ ((used))
+int a[1000];
+
+__attribute__ ((noinline))
+void
+test2(int sz)
+{
+  for (int i = 0; i < sz; i++)
+    a[i]++;
+  asm volatile (""::"m"(a));
+}
+
+__attribute__ ((noinline))
+void
+test1 (int sz)
+{
+  for (int i = 0; i < 1000; i++)
+    test2(sz);
+}
+int main()
+{
+  test1(1000);
+  return 0;
+}
+/* We should clone test1 and test2 for constant 1000.
+   In the past we did not do this since we did not clone for edges that are not hot
+   and call main->test1 is not considered hot since it is executed just once. */
+/* { dg-final-use { scan-ipa-dump-times "Creating a specialized node of void test1" 1 "cp"} } */
+/* { dg-final-use { scan-ipa-dump-times "Creating a specialized node of void test2" 1 "cp"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-prof/ipa-cp-1.c b/gcc/testsuite/gcc.dg/tree-prof/ipa-cp-1.c
new file mode 100644
index 00000000000..591eb9c16c4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/ipa-cp-1.c
@@ -0,0 +1,30 @@
+/* { dg-options "-O2 -fdump-ipa-cp" } */
+__attribute__ ((used))
+int a[1000];
+
+__attribute__ ((noinline))
+void
+test2(int sz)
+{
+  for (int i = 0; i < sz; i++)
+    a[i]++;
+  asm volatile (""::"m"(a));
+}
+
+__attribute__ ((noinline))
+void
+test1 (int sz)
+{
+  for (int i = 0; i < 1000; i++)
+    test2(sz);
+}
+int main()
+{
+  test1(1000);
+  return 0;
+}
+/* We should clone test1 and test2 for constant 1000.
+   In the past we did not do this since we did not clone for edges that are not hot
+   and call main->test1 is not considered hot since it is executed just once. */
+/* { dg-final-use { scan-ipa-dump-times "Creating a specialized node of void test1" 1 "cp"} } */
+/* { dg-final-use { scan-ipa-dump-times "Creating a specialized node of void test2" 1 "cp"} } */