Ping.
On Mon, Oct 8, 2012 at 5:46 PM, Easwaran Raman <era...@google.com> wrote: > I have attached a revised patch. The updated ChangeLog is given below > and I have responded to your comments inline: > > 2012-10-08 Easwaran Raman <era...@google.com> > * optabs.c (emit_cmp_and_jump_insn_1): Add a new parameter to > specificy the probability of taking the jump. > (emit_cmp_and_jump_insns): Likewise. > (expand_compare_and_swap_loop): Make the jump predicted not taken. > * dojump.c (do_compare_rtx_and_jump): Remove the code attaching > REG_BR_PROB note and pass probability to emit_cmp_and_jump_insns. > * cfgbuild.c (compute_outgoing_frequencies): Do not guess outgoing > probabilities for branches with more than two successors. > * expr.c (emit_block_move_via_loop): Predict the loop backedge loop > to be highly taken. > (try_casesi): Pass the probability of jumping to the default label. > (try_tablejump): Likewise. > (do_tablejump): Likewise. > * expr.h (try_tablejump): Add a new parameter. > (try_casesi): Likewise. > (emit_cmp_and_jump_insns): Add probability as default parameter with a > default value of -1. > * except.c (sjlj_emit_function_enter): Pass probability to > emit_cmp_and_jump_insns. > * stmt.c (case_node): Add new fields PROB and SUBTREE_PROB. > (do_jump_if_equal): Pass probability for REG_BR_PROB note. > (add_case_node): Pass estimated probability of jumping to the case > label. > (emit_case_decision_tree): Pass default_prob to emit_case_nodes. > (get_outgoing_edge_probs): New function. > (conditional_probability): Likewise. > (reset_out_edges_aux): Likewise. > (compute_cases_per_edge): Likewise. > (emit_case_dispatch_table): Update probabilities of edges coming out > of the switch statement. > (expand_case): Compute and propagate default edge probability to > emit_case_dispatch_table. > (expand_sjlj_dispatch_table): Update calls to add_case_node and > emit_case_dispatch_table. > (balance_case_nodes): Update subtree_prob values. > (emit_case_nodes): Compute edge probabilities and add pass them to > emit_cmp_and_jump_insns. > > gcc/testsuite/ChangeLog: > 2012-10-02 Easwaran Raman <era...@google.com> > * gcc.dg/tree-prof/switch-case-1.c: New test case. > * gcc.dg/tree-prof/switch-case-2.c: New test case. > > > > > On Thu, Oct 4, 2012 at 6:19 AM, Jan Hubicka <hubi...@ucw.cz> wrote: >>> Hi, >>> This patch propagates the profile counts during RTL expansion. In >>> many cases, there is no way to determine the exact count of an edge >>> generated during the expansion. So this patch uses some simple >>> heuristics to estimate the edge counts but ensures that the counts of >>> the basic blocks corresponding to the cases are (nearly the) same as >>> at the gimple level. >>> >>> Bootstrapped and profile-bootstrapped on an x86_64/linux machine. OK for >>> trunk? >>> Index: gcc/expr.c >>> =================================================================== >>> --- gcc/expr.c (revision 191879) >>> +++ gcc/expr.c (working copy) >>> @@ -154,7 +154,7 @@ static rtx do_store_flag (sepops, rtx, enum machin >>> #ifdef PUSH_ROUNDING >>> static void emit_single_push_insn (enum machine_mode, rtx, tree); >>> #endif >>> -static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx); >>> +static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int); >>> static rtx const_vector_from_tree (tree); >>> static void write_complex_part (rtx, rtx, bool); >>> >>> @@ -10894,7 +10894,7 @@ try_casesi (tree index_type, tree index_expr, tree >>> >>> static void >>> do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx >>> table_label, >>> - rtx default_label) >>> + rtx default_label, int default_probability) >> >> Please document default_probability. > Done. > >>> { >>> rtx temp, vector; >>> >>> @@ -10910,9 +10910,17 @@ do_tablejump (rtx index, enum machine_mode mode, r >>> the maximum value of the range. */ >>> >>> if (default_label) >>> - emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1, >>> - default_label); >>> + { >>> + emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1, >>> + default_label); >>> + if (default_probability != -1) >>> + { >>> + rtx jump_insn = get_last_insn(); >>> + add_reg_note (jump_insn, REG_BR_PROB, GEN_INT >>> (default_probability)); >>> + } >>> + } >> >> dojump already does this kind of logic, but it is bit more cureful: >> emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp, >> if_true_label); >> if (prob != -1 && profile_status != PROFILE_ABSENT) >> { >> for (last = NEXT_INSN (last); >> last && NEXT_INSN (last); >> last = NEXT_INSN (last)) >> if (JUMP_P (last)) >> break; >> if (last >> && JUMP_P (last) >> && ! NEXT_INSN (last) >> && any_condjump_p (last)) >> { >> gcc_assert (!find_reg_note (last, REG_BR_PROB, 0)); >> add_reg_note (last, REG_BR_PROB, GEN_INT (prob)); >> } >> } >> >> What about making emit_cmp_and_jump_insns taking the probability argument >> and moving the code above inside? Most of other places need updating to >> propagate probabilities. > > Done. Made probability a default parameter with a default parameter of > -1 and so didn't update all callsites. > >> (compare_and_jump_seq in loop-unswitch probably also can be updated) >>> @@ -10954,7 +10962,7 @@ do_tablejump (rtx index, enum machine_mode mode, r >>> >>> int >>> try_tablejump (tree index_type, tree index_expr, tree minval, tree range, >>> - rtx table_label, rtx default_label) >>> + rtx table_label, rtx default_label, int default_probability) >> >> Simiarly here. >>> Index: gcc/cfgbuild.c >>> =================================================================== >>> --- gcc/cfgbuild.c (revision 191879) >>> +++ gcc/cfgbuild.c (working copy) >>> @@ -533,6 +533,23 @@ find_bb_boundaries (basic_block bb) >>> purge_dead_tablejump_edges (bb, table); >>> } >>> >>> +/* If there is at least one edge in EDGES with a non-zero count, then >>> + compute probabilities based on the existing counts. */ >>> + >>> +static bool >>> +gen_probabilities_from_existing_counts ( VEC(edge,gc) *edges) { >>> + edge e; >>> + edge_iterator ei; >>> + gcov_type count_sum = 0; >>> + FOR_EACH_EDGE(e, ei, edges) >>> + count_sum += e->count; >>> + if (count_sum == 0) >>> + return false; >>> + FOR_EACH_EDGE(e, ei, edges) >>> + e->probability = e->count * REG_BR_PROB_BASE / count_sum; >>> + return true; >>> +} >>> + >>> /* Assume that frequency of basic block B is known. Compute frequencies >>> and probabilities of outgoing edges. */ >>> >>> @@ -560,7 +577,6 @@ compute_outgoing_frequencies (basic_block b) >>> return; >>> } >>> } >>> - >>> if (single_succ_p (b)) >>> { >>> e = single_succ_edge (b); >>> @@ -568,7 +584,10 @@ compute_outgoing_frequencies (basic_block b) >>> e->count = b->count; >>> return; >>> } >>> - guess_outgoing_edge_probabilities (b); >>> + else if (!gen_probabilities_from_existing_counts (b->succs)){ >>> + /* All outgoing edges of B have zero count. Guess probabilities. */ >>> + guess_outgoing_edge_probabilities (b); >>> + } >> >> Hmm, I do not quite follow logic here. >> basic block B is one of many basic blocks that the original BB was split >> from. >> It is possible that B may have some of original edges, but there may be new >> ones. >> How you can guess the outgoing probabilitie shere. Do you have an example? > I have changed this code. If the bb has >2 edges, I just use the > probability as it is without guessing. This should work for switch > statements since the edge probabilities are adjusted in stmt.c. I am > not sure if there are other cases of multi-way branches that I need to > worry about here. > >> Also gen_probabilities_from_existing_counts could probably also work based >> on original edge frequencies. >>> @@ -1664,7 +1668,7 @@ do_jump_if_equal (enum machine_mode mode, rtx op0, >>> >>> static struct case_node * >>> add_case_node (struct case_node *head, tree low, tree high, >>> - tree label, alloc_pool case_node_pool) >>> + tree label, gcov_type count, alloc_pool case_node_pool) >> >> Please document COUNT here. > Done. > >>> { >>> struct case_node *r; >>> >>> @@ -1677,6 +1681,8 @@ add_case_node (struct case_node *head, tree low, t >>> r->high = high; >>> r->code_label = label; >>> r->parent = r->left = NULL; >>> + r->count = count; >>> + r->subtree_count = count; >>> r->right = head; >>> return r; >>> } >> >> .. and here > Done. >>> @@ -1803,7 +1809,8 @@ expand_switch_as_decision_tree_p (tree range, >>> >>> static void >>> emit_case_decision_tree (tree index_expr, tree index_type, >>> - struct case_node *case_list, rtx default_label) >>> + struct case_node *case_list, rtx default_label, >>> + gcov_type default_count) >>> { >>> rtx index = expand_normal (index_expr); >>> >>> @@ -1839,15 +1846,29 @@ emit_case_decision_tree (tree index_expr, tree ind >>> dump_case_nodes (dump_file, case_list, indent_step, 0); >>> } >>> >>> - emit_case_nodes (index, case_list, default_label, index_type); >>> + emit_case_nodes (index, case_list, default_label, default_count, >>> index_type); >>> if (default_label) >>> emit_jump (default_label); >>> } >>> >> >> And document functio nhere. > Done. >> >>> +static gcov_type >>> +get_outgoing_edge_counts (basic_block bb) >>> +{ >>> + edge e; >>> + edge_iterator ei; >>> + gcov_type count_sum = 0; >>> + FOR_EACH_EDGE(e, ei, bb->succs) >>> + count_sum += e->count; >>> + return count_sum; >>> +} >>> + >>> +#define case_probability(x, y) \ >>> + ((y) ? (gcc_assert (x <= y), (x) * REG_BR_PROB_BASE / (y)) : -1) >> >> You want to use RDIV here for better rounding. I would make this an inline >> functions these days. > Done. > >>> + >>> /* Generate a dispatch tabler, switching on INDEX_EXPR and jumping to >>> one of the labels in CASE_LIST or to the DEFAULT_LABEL. >>> MINVAL, MAXVAL, and RANGE are the extrema and range of the case >>> - labels in CASE_LIST. >>> + labels in CASE_LIST. STMT_BB is the basic block containing the >>> statement. >>> >>> First, a jump insn is emitted. First we try "casesi". If that >>> fails, try "tablejump". A target *must* have one of them (or both). >>> @@ -1860,19 +1881,23 @@ emit_case_decision_tree (tree index_expr, tree ind >>> static void >>> emit_case_dispatch_table (tree index_expr, tree index_type, >>> struct case_node *case_list, rtx default_label, >>> - tree minval, tree maxval, tree range) >>> + tree minval, tree maxval, tree range, >>> + basic_block stmt_bb) >>> { >>> int i, ncases; >>> struct case_node *n; >>> rtx *labelvec; >>> rtx fallback_label = label_rtx (case_list->code_label); >>> rtx table_label = gen_label_rtx (); >>> + bool has_gaps = false; >>> + edge default_edge = EDGE_SUCC(stmt_bb, 0); >>> + gcov_type default_count = default_edge->count; >>> + gcov_type count = get_outgoing_edge_counts (stmt_bb); >>> + bool try_with_tablejump = false; >> >> Here, I think you want to first decide whether you base expansion on counts >> (i.e. any counts involved is non-0) or probabilities. We really want to >> maintain >> the profile up-to-date even with guessed profile. The inconsistencies >> propagate across CFG and easilly confuse later optimizers. >> Everywhere you are using count comming from the original statement, you can >> also use >> the probability. > > I have changed the code to use probabilities. > >> The newly produced control flow will be optimized and the edges frequencies >> will >> be propagated along. >> >>> + >>> + default_edge->count = default_count; >>> + if (count) >>> + { >>> + edge e; >>> + edge_iterator ei; >>> + FOR_EACH_EDGE (e, ei, stmt_bb->succs) >>> + e->probability = e->count * REG_BR_PROB_BASE / count; >>> + } >> >> Hmm, this updates origina BB containing the switch statement? Of course, >> modulo roundoff errors, this should hold. I wonder where did you got the >> diferences and why do you need this? You are going to output new control >> flow >> and find_many_sub_basic_blocks will recompute all the counts/frequencies >> inside >> anyway? >> Also you want to use RDIV here. > I need to do this to account for the change in the count of default > edge. Changed to use RDIV. > >>> @@ -2358,6 +2433,20 @@ node_is_bounded (case_node_ptr node, tree index_ty >>> && node_has_high_bound (node, index_type)); >>> } >>> >>> + >>> +/* Attach a REG_BR_PROB note to the last created RTX instruction if >>> + PROBABILITY is not -1. */ >>> + >>> +static void >>> +add_prob_note_to_last_insn(int probability) >>> +{ >>> + if (probability != -1) >>> + { >>> + rtx jump_insn = get_last_insn(); >>> + add_reg_note (jump_insn, REG_BR_PROB, GEN_INT (probability)); >>> + } >>> +} >> >> Well, I am not sure this is safe thing to do since you do not really >> know what the last instruction is. In any case you want to test >> it is an conditional jump instruction as in the code I quoted >> above. >> >>> @@ -2430,7 +2523,11 @@ emit_case_nodes (rtx index, case_node_ptr node, rt >>> unsignedp), >>> GT, NULL_RTX, mode, unsignedp, >>> label_rtx (node->right->code_label)); >>> - emit_case_nodes (index, node->left, default_label, index_type); >>> + probability = case_probability (node->right->count, >>> + subtree_count + >>> default_count); >>> + add_prob_note_to_last_insn (probability); >>> + emit_case_nodes (index, node->left, default_label, default_count, >>> + index_type); >> >> Hmm, here you seem to be distributing the probabilities of the counts >> reached. >> What happens in the case when the edge probability needs to be distributed >> across >> nodes of the decision tree. I.e. in >> t(int a) >> { >> switch (a) >> { >> case 100: >> case 200: >> case 300: >> case 400: >> case 500: >> case 600: >> case 700: >> case 800: >> case 900: >> t(); >> case 101: >> case 202: >> case 303: >> case 404: >> case 505: >> case 606: >> case 707: >> case 808: >> case 909: >> q(); >> } >> } > > I have fixed this. >> >>> } >>> >>> else if (node_is_bounded (node->left, index_type)) >>> @@ -2442,7 +2539,10 @@ emit_case_nodes (rtx index, case_node_ptr node, rt >>> unsignedp), >>> LT, NULL_RTX, mode, unsignedp, >>> label_rtx (node->left->code_label)); >>> - emit_case_nodes (index, node->right, default_label, index_type); >>> + probability = case_probability (node->left->count, >>> + subtree_count + >>> default_count); >>> + add_prob_note_to_last_insn (probability); >> >> Here, and apparently in all uses of add_prob_note_to_last_insn, you want to >> use >> the new parameter of emit_cmp_and_jump_insn. >> >> Otherwise the patch seems to make sense. Thanks for looking into this issue. >> >> Honza