Hi Jeff, I have adapted the code generation part from James' patch to current trunk, and the resulting patch gets the 30% speedup on coremark and passes bootstrap of GCC.
Ok for trunk? Thanks, Sebastian Sebastian Pop wrote: > Sebastian Pop wrote: > > Jeff Law wrote: > > > On 08/21/14 04:30, Richard Biener wrote: > > > >>It turns Jeff's jump-threading code in to a strange franken-pass of > > > >>bits and > > > >>pieces of detection and optimisation, and would need some substantial > > > >>reworking to fit in with Jeff's changes last Autumn, but if it is more > > > >>likely to be acceptable for trunk then perhaps we could look to revive > > > >>it. > > > >>It would be nice to reuse the path copy code Jeff added last year, but I > > > >>don't have much intuition as to how feasible that is. > > > >> > > > >>Was this the sort of thing that you were imagining? > > > > > > > >Yeah, didn't look too closely though. > > > It'd be pretty ugly I suspect. But it's probably worth pondering > > > since that approach would eliminate the concerns about the cost of > > > detection (which is problematical for the jump threader) by using > > > Steve's code for that. > > > > > > On the update side, I suspect most, if not all of the framework is > > > in place to handle this kind of update if the right threading paths > > > were passed to the updater. I can probably cobble together that > > > by-hand and see what the tree-ssa-threadupdate does with it. But > > > it'll be a week or so before I could look at it. > > > > I adapted the patch James has sent last year to use the new update paths > > Attached an updated version of the patch. > > > mechanism. I verified that the attached patch does register all the paths > > that > > need to be threaded. Thread updater seems to have some problems handling > > the > > attached testcase (a simplified version of the testcase attached to the > > bug.) > > > > Jeff, could you please have a look at why the jump-thread updater is > > crashing? > > I have tried to understand why the code generation part ICEs on coremark: the > first problem that I have seen is that tree-ssa-threadupdate.c does not handle > more than a joiner block per path to be threaded, so we would not be able to > jump thread accross the joiners of the if condition and the joiner of the > switch > condition: i.e., these paths > > patch: Registering jump thread: (7, 10) incoming edge; (10, 25) joiner; > (25, 26) joiner; (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 14) > nocopy; > patch: Registering jump thread: (28, 10) incoming edge; (10, 25) joiner; > (25, 26) joiner; (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 11) > nocopy; > patch: Registering jump thread: (8, 10) incoming edge; (10, 25) joiner; > (25, 26) joiner; (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 17) > nocopy; > patch: Registering jump thread: (9, 10) incoming edge; (10, 25) joiner; > (25, 26) joiner; (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) > nocopy; > > Another problem is that we attach the path to be threaded to the ->aux field > of > the first edge in the path, such that we would have to cancel some of the > paths > because we cannot keep track of all the paths to be threaded. > > For coremark, we would discover some jump-thread paths from one of the switch > cases over the loop exit condition, either to bb_27 outside the loop, or to > bb_4 > staying inside the loop. Then with the "patch:" we would discover jump > threads > that would thread switch cases to switch cases, and because these paths start > with the same edges for which we have already assigned a path to e->aux, we > would have to cancel the interesting threads added by the patch: > > Registering jump thread: (12, 25) incoming edge; (25, 26) joiner; (26, 4) > nocopy; > Registering jump thread: (13, 25) incoming edge; (25, 26) joiner; (26, > 27) nocopy; > Registering jump thread: (29, 25) incoming edge; (25, 26) joiner; (26, 4) > nocopy; > Registering jump thread: (31, 25) incoming edge; (25, 26) joiner; (26, > 27) nocopy; > Registering jump thread: (16, 25) incoming edge; (25, 26) joiner; (26, 4) > nocopy; > Registering jump thread: (15, 25) incoming edge; (25, 26) joiner; (26, 4) > nocopy; > Registering jump thread: (32, 25) incoming edge; (25, 26) joiner; (26, > 27) nocopy; > Registering jump thread: (19, 25) incoming edge; (25, 26) joiner; (26, 4) > nocopy; > Registering jump thread: (18, 25) incoming edge; (25, 26) joiner; (26, 4) > nocopy; > Registering jump thread: (22, 25) incoming edge; (25, 26) joiner; (26, > 27) nocopy; > Registering jump thread: (21, 25) incoming edge; (25, 26) joiner; (26, 4) > nocopy; > Registering jump thread: (34, 25) incoming edge; (25, 26) joiner; (26, > 27) nocopy; > Registering jump thread: (33, 25) incoming edge; (25, 26) joiner; (26, 4) > nocopy; > Registering jump thread: (35, 25) incoming edge; (25, 26) joiner; (26, > 27) nocopy; > Registering jump thread: (24, 25) incoming edge; (25, 26) joiner; (26, 4) > nocopy; > patch: Registering jump thread: (12, 25) incoming edge; (25, 26) joiner; > (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 17) nocopy; > patch: Registering jump thread: (16, 25) incoming edge; (25, 26) joiner; > (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 14) nocopy; > patch: Registering jump thread: (19, 25) incoming edge; (25, 26) joiner; > (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 17) nocopy; > patch: Registering jump thread: (22, 25) incoming edge; (25, 26) joiner; > (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy; > patch: Registering jump thread: (34, 25) incoming edge; (25, 26) joiner; > (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy; > patch: Registering jump thread: (35, 25) incoming edge; (25, 26) joiner; > (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy; > patch: Registering jump thread: (29, 25) incoming edge; (25, 26) joiner; > (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 14) nocopy; > patch: Registering jump thread: (13, 25) incoming edge; (25, 26) joiner; > (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy; > patch: Registering jump thread: (15, 25) incoming edge; (25, 26) joiner; > (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 17) nocopy; > patch: Registering jump thread: (31, 25) incoming edge; (25, 26) joiner; > (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy; > patch: Registering jump thread: (18, 25) incoming edge; (25, 26) joiner; > (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 20) nocopy; > patch: Registering jump thread: (32, 25) incoming edge; (25, 26) joiner; > (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy; > patch: Registering jump thread: (21, 25) incoming edge; (25, 26) joiner; > (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 23) nocopy; > patch: Registering jump thread: (33, 25) incoming edge; (25, 26) joiner; > (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 24) nocopy; > patch: Registering jump thread: (24, 25) incoming edge; (25, 26) joiner; > (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 24) nocopy; > Registering jump thread: (6, 36) incoming edge; (36, 7) normal; > Cancelling jump thread: (12, 25) incoming edge; (25, 26) joiner; (26, 4) > nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 17) nocopy; > Cancelling jump thread: (16, 25) incoming edge; (25, 26) joiner; (26, 4) > nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 14) nocopy; > Cancelling jump thread: (19, 25) incoming edge; (25, 26) joiner; (26, 4) > nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 17) nocopy; > Cancelling jump thread: (22, 25) incoming edge; (25, 26) joiner; (26, 4) > nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy; > Cancelling jump thread: (34, 25) incoming edge; (25, 26) joiner; (26, 4) > nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy; > Cancelling jump thread: (35, 25) incoming edge; (25, 26) joiner; (26, 4) > nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy; > Cancelling jump thread: (29, 25) incoming edge; (25, 26) joiner; (26, 4) > nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 14) nocopy; > Cancelling jump thread: (13, 25) incoming edge; (25, 26) joiner; (26, 4) > nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy; > Cancelling jump thread: (15, 25) incoming edge; (25, 26) joiner; (26, 4) > nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 17) nocopy; > Cancelling jump thread: (31, 25) incoming edge; (25, 26) joiner; (26, 4) > nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy; > Cancelling jump thread: (18, 25) incoming edge; (25, 26) joiner; (26, 4) > nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 20) nocopy; > Cancelling jump thread: (32, 25) incoming edge; (25, 26) joiner; (26, 4) > nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy; > Cancelling jump thread: (21, 25) incoming edge; (25, 26) joiner; (26, 4) > nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 23) nocopy; > Cancelling jump thread: (33, 25) incoming edge; (25, 26) joiner; (26, 4) > nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 24) nocopy; > Cancelling jump thread: (24, 25) incoming edge; (25, 26) joiner; (26, 4) > nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 24) nocopy; > > Here is the structure of the CFG with the loops: > > (gdb) p debug_loops (2) > loop_0 (header = 0, latch = 1, niter = ) > { > bb_2 (preds = {bb_0 }, succs = {bb_3 bb_27 }) > bb_3 (preds = {bb_2 }, succs = {bb_5 bb_6 }) > bb_5 (preds = {bb_4 bb_3 }, succs = {bb_27 }) > bb_6 (preds = {bb_3 }, succs = {bb_36 }) > bb_27 (preds = {bb_5 bb_25 bb_26 bb_2 }, succs = {bb_1 }) > loop_1 (header = 36, latch = 37, niter = ) > { > bb_4 (preds = {bb_26 }, succs = {bb_5 bb_37 }) > bb_37 (preds = {bb_4 }, succs = {bb_36 }) > bb_36 (preds = {bb_6 bb_37 }, succs = {bb_25 bb_7 bb_11 bb_20 bb_14 bb_17 > bb_23 bb_24 }) > bb_7 (preds = {bb_36 }, succs = {bb_10 bb_28 }) > bb_8 (preds = {bb_28 }, succs = {bb_10 bb_9 }) > bb_9 (preds = {bb_8 }, succs = {bb_10 }) > bb_10 (preds = {bb_7 bb_28 bb_8 bb_9 }, succs = {bb_25 }) > bb_11 (preds = {bb_36 }, succs = {bb_29 bb_30 }) > bb_12 (preds = {bb_30 }, succs = {bb_25 }) > bb_13 (preds = {bb_30 }, succs = {bb_25 }) > bb_14 (preds = {bb_36 }, succs = {bb_15 bb_16 }) > bb_15 (preds = {bb_14 }, succs = {bb_25 }) > bb_16 (preds = {bb_14 }, succs = {bb_25 bb_31 }) > bb_17 (preds = {bb_36 }, succs = {bb_18 bb_19 }) > bb_18 (preds = {bb_17 }, succs = {bb_25 }) > bb_19 (preds = {bb_17 }, succs = {bb_25 bb_32 }) > bb_20 (preds = {bb_36 }, succs = {bb_21 bb_22 }) > bb_21 (preds = {bb_20 }, succs = {bb_25 }) > bb_22 (preds = {bb_20 }, succs = {bb_25 }) > bb_23 (preds = {bb_36 }, succs = {bb_33 bb_34 }) > bb_24 (preds = {bb_36 }, succs = {bb_25 bb_35 }) > bb_25 (preds = {bb_10 bb_12 bb_16 bb_19 bb_22 bb_34 bb_35 bb_36 bb_29 > bb_13 bb_15 bb_31 bb_18 bb_32 bb_21 bb_33 bb_24 }, succs = {bb_26 bb_27 }) > bb_26 (preds = {bb_25 }, succs = {bb_4 bb_27 }) > bb_28 (preds = {bb_7 }, succs = {bb_10 bb_8 }) > bb_29 (preds = {bb_11 }, succs = {bb_25 }) > bb_30 (preds = {bb_11 }, succs = {bb_12 bb_13 }) > bb_31 (preds = {bb_16 }, succs = {bb_25 }) > bb_32 (preds = {bb_19 }, succs = {bb_25 }) > bb_33 (preds = {bb_23 }, succs = {bb_25 }) > bb_34 (preds = {bb_23 }, succs = {bb_25 }) > bb_35 (preds = {bb_24 }, succs = {bb_25 }) > } > } > > What about removing the use of e->aux in threadupdate.c, to be able to jump > thread across all the recorded paths? > > Thanks, > Sebastian > From bac0f2a390048652910f77503b21b3e208daeae1 Mon Sep 17 00:00:00 2001 > From: Sebastian Pop <s....@samsung.com> > Date: Fri, 26 Sep 2014 14:54:20 -0500 > Subject: [PATCH] jump thread for PR 54742 > > Adapted from a patch from James Greenhalgh. > > * tree-ssa-threadedge.c (simplify_control_stmt_condition): Restore the > original value of cond when simplification fails. > (find_thread_path): New. > (find_control_statement_thread_paths): New. > (thread_through_normal_block): Call find_control_statement_thread_paths. > > * testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c: New. > --- > gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c | 32 ++++ > gcc/tree-ssa-threadedge.c | 180 > ++++++++++++++++++++++- > gcc/tree-ssa-threadupdate.c | 4 + > 3 files changed, 215 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c > b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c > new file mode 100644 > index 0000000..f3ef725 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c > @@ -0,0 +1,32 @@ > +int sum0, sum1, sum2, sum3; > +int foo(char * s, char** ret) > +{ > + int state=0; > + char c; > + > + for (; *s && state != 4; s++) > + { > + c = *s; > + if (c == '*') > + { > + s++; > + break; > + } > + switch (state) { > + case 0: > + if (c == '+') state = 1; > + else if (c != '-') sum0+=c; > + break; > + case 1: > + if (c == '+') state = 2; > + else if (c == '-') state = 0; > + else sum1+=c; > + break; > + default: > + break; > + } > + > + } > + *ret = s; > + return state; > +} > diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c > index 3dee5ba..7b9e5b6 100644 > --- a/gcc/tree-ssa-threadedge.c > +++ b/gcc/tree-ssa-threadedge.c > @@ -628,6 +628,7 @@ simplify_control_stmt_condition (edge e, > rather than use a relational operator. These are simpler to handle. */ > if (TREE_CODE (cond) == SSA_NAME) > { > + tree original_lhs = cond; > cached_lhs = cond; > > /* Get the variable's current value from the equivalence chains. > @@ -656,6 +657,12 @@ simplify_control_stmt_condition (edge e, > pass specific callback to try and simplify it further. */ > if (cached_lhs && ! is_gimple_min_invariant (cached_lhs)) > cached_lhs = (*simplify) (stmt, stmt); > + > + /* We couldn't find an invariant. But, callers of this > + function may be able to do something useful with the > + unmodified destination. */ > + if (!cached_lhs) > + cached_lhs = original_lhs; > } > else > cached_lhs = NULL; > @@ -915,6 +922,155 @@ thread_around_empty_blocks (edge taken_edge, > return false; > } > > +/* Return true if there is at least one path from START_BB to END_BB. > + VISITED_BBS is used to make sure we don't fall into an infinite loop. */ > + > +static bool > +find_thread_path (basic_block start_bb, basic_block end_bb, > + vec<basic_block, va_gc> *&path, > + hash_set<basic_block> *visited_bbs) > +{ > + if (start_bb == end_bb) > + { > + vec_safe_push (path, start_bb); > + return true; > + } > + > + if (!visited_bbs->add(start_bb)) > + { > + edge e; > + edge_iterator ei; > + FOR_EACH_EDGE (e, ei, start_bb->succs) > + if (find_thread_path (e->dest, end_bb, path, visited_bbs)) > + { > + vec_safe_push (path, start_bb); > + return true; > + } > + } > + > + return false; > +} > + > +/* We trace the value of the variable EXPR back through any phi nodes looking > + for places where it gets a constant value and save the path. */ > + > +static void > +find_control_statement_thread_paths (tree expr, > + hash_set<gimple> *visited_phis, > + vec<basic_block, va_gc> *&path) > +{ > + tree var = SSA_NAME_VAR (expr); > + gimple def_stmt = SSA_NAME_DEF_STMT (expr); > + basic_block var_bb = gimple_bb (def_stmt); > + > + if (var == NULL || var_bb == NULL) > + return; > + > + vec<basic_block, va_gc> *next_path; > + vec_alloc (next_path, n_basic_blocks_for_fn (cfun)); > + > + basic_block last_bb_in_path = path->last (); > + > + /* Put the path from var_bb to last_bb_in_path into next_path. */ > + if (var_bb != last_bb_in_path) > + { > + edge e; > + int e_count = 0; > + edge_iterator ei; > + > + FOR_EACH_EDGE (e, ei, last_bb_in_path->preds) > + { > + hash_set<basic_block> *visited_bbs = new hash_set<basic_block>; > + > + if (find_thread_path (var_bb, e->src, next_path, visited_bbs)) > + e_count = e_count + 1; > + > + delete visited_bbs; > + > + /* If there is more than one path, stop. */ > + if (e_count > 1) > + { > + vec_free (next_path); > + return; > + } > + } > + } > + > + /* Visit PHI nodes once. */ > + if (gimple_code (def_stmt) != GIMPLE_PHI > + || visited_phis->add(def_stmt)) { > + vec_free (next_path); > + return; > + } > + > + /* Append all the nodes from next_path to path. */ > + vec_safe_splice (path, next_path); > + gcc_assert (path->last () == var_bb); > + > + /* Iterate over the arguments of PHI. */ > + unsigned int i; > + for (i = 0; i < gimple_phi_num_args (def_stmt); i++) > + { > + tree arg = gimple_phi_arg_def (def_stmt, i); > + basic_block bbi = gimple_phi_arg_edge (def_stmt, i)->src; > + > + /* Skip edges pointing outside the current loop. */ > + if (!arg || var_bb->loop_father != bbi->loop_father) > + continue; > + > + /* Add BBI to the path. */ > + vec_safe_push (path, bbi); > + > + if (TREE_CODE (arg) == INTEGER_CST) > + { > + int j, n = path->length (); > + vec<jump_thread_edge *> *jump_thread_path > + = new vec<jump_thread_edge *> (); > + int joiners = 0; > + > + for (j = 0; j < n - 1; j++) > + { > + edge e = find_edge ((*path)[n - j - 1], > + (*path)[n - j - 2]); > + gcc_assert (e); > + enum jump_thread_edge_type kind; > + > + if (j == 0) > + kind = EDGE_START_JUMP_THREAD; > + else if (single_pred_p (e->src)) > + kind = EDGE_NO_COPY_SRC_BLOCK; > + else { > + kind = EDGE_COPY_SRC_JOINER_BLOCK; > + ++joiners; > + } > + > + jump_thread_edge *x = new jump_thread_edge (e, kind); > + jump_thread_path->safe_push (x); > + } > + > + /* Add the edge taken when the control variable has value ARG. */ > + edge taken_edge = find_taken_edge ((*path)[0], arg); > + jump_thread_edge *x > + = new jump_thread_edge (taken_edge, EDGE_NO_COPY_SRC_BLOCK); > + jump_thread_path->safe_push (x); > + > + /* Thread-update does not handle more than two joiners. A path with > + less than 3 nodes should not be jump-threaded. */ > + if (joiners < 2 && n > 2) > + register_jump_thread (jump_thread_path); > + } > + else if (TREE_CODE (arg) == SSA_NAME) > + find_control_statement_thread_paths (arg, visited_phis, path); > + > + /* Remove BBI from the path. */ > + path->pop (); > + } > + > + /* Remove all the nodes that we added from next_path. */ > + vec_safe_truncate (path, (path->length () - next_path->length ())); > + vec_free (next_path); > +} > + > /* We are exiting E->src, see if E->dest ends with a conditional > jump which has a known value when reached via E. > > @@ -1000,7 +1156,10 @@ thread_through_normal_block (edge e, > cond = simplify_control_stmt_condition (e, stmt, dummy_cond, simplify, > handle_dominating_asserts); > > - if (cond && is_gimple_min_invariant (cond)) > + if (!cond) > + return 0; > + > + if (is_gimple_min_invariant (cond)) > { > edge taken_edge = find_taken_edge (e->dest, cond); > basic_block dest = (taken_edge ? taken_edge->dest : NULL); > @@ -1046,7 +1205,25 @@ thread_through_normal_block (edge e, > backedge_seen_p); > return 1; > } > + > + if (TREE_CODE (cond) != SSA_NAME > + || e->dest->loop_father != e->src->loop_father) > + return 0; > + > + /* When COND cannot be simplified, try to find paths from a control > + statement back through the PHI nodes which would affect that control > + statement. */ > + vec<basic_block, va_gc> *bb_path; > + vec_alloc (bb_path, n_basic_blocks_for_fn (cfun)); > + vec_safe_push (bb_path, e->dest); > + hash_set<gimple> *visited_phis = new hash_set<gimple>; > + > + find_control_statement_thread_paths (cond, visited_phis, bb_path); > + > + delete visited_phis; > + vec_free (bb_path); > + > + return -1; > } > return 0; > } > -- > 2.1.0.243.g30d45f7 >
>From 23b1ac8fa92e9e4cd05edb2967aba564126f75a1 Mon Sep 17 00:00:00 2001 From: Sebastian Pop <s....@samsung.com> Date: Fri, 26 Sep 2014 14:54:20 -0500 Subject: [PATCH] jump thread for PR 54742 Adapted from a patch from James Greenhalgh. * params.def (max-thread-path-insns, max-thread-length, max-thread-paths): New. * testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c: New. * tree-cfg.c (gimple_duplicate_sese_region): Save and restore loop header and latch. * tree-ssa-threadedge.c (simplify_control_stmt_condition): Restore the original value of cond when simplification fails. (fsm_find_thread_path): New. (fsm_find_control_statement_thread_paths): New. (fsm_thread_through_normal_block): Call find_control_statement_thread_paths. * tree-ssa-threadupdate.c (dump_jump_thread_path): Pretty print EDGE_START_FSM_THREAD. (thread_through_all_blocks): Generate code for EDGE_START_FSM_THREAD edges calling gimple_duplicate_sese_region. * tree-ssa-threadupdate.h (jump_thread_edge_type): Add EDGE_START_FSM_THREAD. --- gcc/params.def | 19 +++ gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c | 32 ++++ gcc/tree-cfg.c | 21 ++- gcc/tree-ssa-threadedge.c | 200 ++++++++++++++++++++++- gcc/tree-ssa-threadupdate.c | 52 +++++- gcc/tree-ssa-threadupdate.h | 1 + 6 files changed, 320 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c diff --git a/gcc/params.def b/gcc/params.def index d2d2add..749f962 100644 --- a/gcc/params.def +++ b/gcc/params.def @@ -123,6 +123,25 @@ DEFPARAM (PARAM_PARTIAL_INLINING_ENTRY_PROBABILITY, "Maximum probability of the entry BB of split region (in percent relative to entry BB of the function) to make partial inlining happen", 70, 0, 0) +/* Maximum number of instructions to copy when duplicating blocks + on a jump thread path. */ +DEFPARAM (PARAM_MAX_THREAD_PATH_INSNS, + "max-thread-path-insns", + "Maximum number of instructions to copy when duplicating blocks on a jump thread path", + 100, 1, 999999) + +/* Maximum length of a jump thread path. */ +DEFPARAM (PARAM_MAX_THREAD_LENGTH, + "max-thread-length", + "Maximum number of basic blocks on a jump thread path", + 10, 1, 999999) + +/* Maximum number of jump thread paths to duplicate. */ +DEFPARAM (PARAM_MAX_THREAD_PATHS, + "max-thread-paths", + "Maximum number of new jump thread paths to create", + 50, 1, 999999) + /* Limit the number of expansions created by the variable expansion optimization to avoid register pressure. */ DEFPARAM (PARAM_MAX_VARIABLE_EXPANSIONS, diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c new file mode 100644 index 0000000..f3ef725 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c @@ -0,0 +1,32 @@ +int sum0, sum1, sum2, sum3; +int foo(char * s, char** ret) +{ + int state=0; + char c; + + for (; *s && state != 4; s++) + { + c = *s; + if (c == '*') + { + s++; + break; + } + switch (state) { + case 0: + if (c == '+') state = 1; + else if (c != '-') sum0+=c; + break; + case 1: + if (c == '+') state = 2; + else if (c == '-') state = 0; + else sum1+=c; + break; + default: + break; + } + + } + *ret = s; + return state; +} diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c index ee10bc6..565cfe3 100644 --- a/gcc/tree-cfg.c +++ b/gcc/tree-cfg.c @@ -5949,10 +5949,12 @@ gimple_duplicate_sese_region (edge entry, edge exit, { unsigned i; bool free_region_copy = false, copying_header = false; + bool save_loop_details = false; struct loop *loop = entry->dest->loop_father; edge exit_copy; vec<basic_block> doms; edge redirected; + int memo_loop_header_no = 0, memo_loop_latch_no = 0; int total_freq = 0, entry_freq = 0; gcov_type total_count = 0, entry_count = 0; @@ -5970,9 +5972,15 @@ gimple_duplicate_sese_region (edge entry, edge exit, if (region[i]->loop_father != loop) return false; - if (region[i] != entry->dest - && region[i] == loop->header) - return false; + /* If we are copying an abnormally shaped region, keep track of + which block will become our loop header. */ + if ((region[i] != entry->dest && region[i] == loop->header) + || (region[i] != entry->src && region[i] == loop->latch)) + { + save_loop_details = true; + memo_loop_latch_no = i; + memo_loop_header_no = i; + } } /* In case the function is used for loop header copying (which is the primary @@ -6055,6 +6063,13 @@ gimple_duplicate_sese_region (edge entry, edge exit, loop->latch = exit->src; } + /* Restore loop details if we were asked to save them. */ + if (save_loop_details) + { + loop->header = region_copy[memo_loop_header_no]; + loop->latch = region_copy[memo_loop_latch_no]; + } + /* Redirect the entry and add the phi node arguments. */ redirected = redirect_edge_and_branch (entry, get_bb_copy (entry->dest)); gcc_assert (redirected != NULL); diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c index d5b9696..de9b3fe 100644 --- a/gcc/tree-ssa-threadedge.c +++ b/gcc/tree-ssa-threadedge.c @@ -56,6 +56,8 @@ along with GCC; see the file COPYING3. If not see #include "params.h" #include "tree-ssa-threadedge.h" #include "builtins.h" +#include "cfg.h" +#include "cfganal.h" /* To avoid code explosion due to jump threading, we limit the number of statements we are going to copy. This variable @@ -660,6 +662,7 @@ simplify_control_stmt_condition (edge e, rather than use a relational operator. These are simpler to handle. */ if (TREE_CODE (cond) == SSA_NAME) { + tree original_lhs = cond; cached_lhs = cond; /* Get the variable's current value from the equivalence chains. @@ -688,6 +691,12 @@ simplify_control_stmt_condition (edge e, pass specific callback to try and simplify it further. */ if (cached_lhs && ! is_gimple_min_invariant (cached_lhs)) cached_lhs = (*simplify) (stmt, stmt); + + /* We couldn't find an invariant. But, callers of this + function may be able to do something useful with the + unmodified destination. */ + if (!cached_lhs) + cached_lhs = original_lhs; } else cached_lhs = NULL; @@ -947,6 +956,172 @@ thread_around_empty_blocks (edge taken_edge, return false; } +/* Return true if there is at least one path from START_BB to END_BB. + VISITED_BBS is used to make sure we don't fall into an infinite loop. */ + +static bool +fsm_find_thread_path (basic_block start_bb, basic_block end_bb, + vec<basic_block, va_gc> *&path, + hash_set<basic_block> *visited_bbs, int n_insns) +{ + if (start_bb == end_bb) + { + vec_safe_push (path, start_bb); + return true; + } + + if (!visited_bbs->add(start_bb)) + { + edge e; + edge_iterator ei; + FOR_EACH_EDGE (e, ei, start_bb->succs) + if (fsm_find_thread_path (e->dest, end_bb, path, visited_bbs, n_insns)) + { + vec_safe_push (path, start_bb); + return true; + } + } + + return false; +} + +static int max_threaded_paths; + +/* We trace the value of the variable EXPR back through any phi nodes looking + for places where it gets a constant value and save the path. Stop after + having recorded MAX_PATHS jump threading paths. */ + +static void +fsm_find_control_statement_thread_paths (tree expr, + hash_set<gimple> *visited_phis, + vec<basic_block, va_gc> *&path) +{ + tree var = SSA_NAME_VAR (expr); + gimple def_stmt = SSA_NAME_DEF_STMT (expr); + basic_block var_bb = gimple_bb (def_stmt); + + if (var == NULL || var_bb == NULL) + return; + + vec<basic_block, va_gc> *next_path; + vec_alloc (next_path, n_basic_blocks_for_fn (cfun)); + + basic_block last_bb_in_path = path->last (); + + /* Put the path from var_bb to last_bb_in_path into next_path. */ + if (var_bb != last_bb_in_path) + { + edge e; + int e_count = 0; + edge_iterator ei; + + FOR_EACH_EDGE (e, ei, last_bb_in_path->preds) + { + hash_set<basic_block> *visited_bbs = new hash_set<basic_block>; + + if (fsm_find_thread_path (var_bb, e->src, next_path, visited_bbs, 0)) + ++e_count; + + delete visited_bbs; + + /* If there is more than one path, stop. */ + if (e_count > 1) + { + vec_free (next_path); + return; + } + } + } + + /* Visit PHI nodes once. */ + if (gimple_code (def_stmt) != GIMPLE_PHI + || visited_phis->add(def_stmt)) + { + vec_free (next_path); + return; + } + + /* Append all the nodes from next_path to path. */ + vec_safe_splice (path, next_path); + gcc_assert (path->last () == var_bb); + + /* Iterate over the arguments of PHI. */ + unsigned int i; + for (i = 0; i < gimple_phi_num_args (def_stmt); i++) + { + tree arg = gimple_phi_arg_def (def_stmt, i); + basic_block bbi = gimple_phi_arg_edge (def_stmt, i)->src; + + /* Skip edges pointing outside the current loop. */ + if (!arg || var_bb->loop_father != bbi->loop_father) + continue; + + /* Add BBI to the path. */ + vec_safe_push (path, bbi); + + if (TREE_CODE (arg) == INTEGER_CST) + { + int j, n = path->length (); + vec<jump_thread_edge *> *jump_thread_path + = new vec<jump_thread_edge *> (); + int joiners = 0; + + for (j = 0; j < n - 1; j++) + { + edge e = find_edge ((*path)[n - j - 1], + (*path)[n - j - 2]); + gcc_assert (e); + enum jump_thread_edge_type kind; + + if (j == 0) + kind = EDGE_START_FSM_THREAD; + else if (single_pred_p (e->src)) + kind = EDGE_NO_COPY_SRC_BLOCK; + else { + kind = EDGE_COPY_SRC_JOINER_BLOCK; + ++joiners; + } + + jump_thread_edge *x = new jump_thread_edge (e, kind); + jump_thread_path->safe_push (x); + } + + /* Add the edge taken when the control variable has value ARG. */ + edge taken_edge = find_taken_edge ((*path)[0], arg); + jump_thread_edge *x + = new jump_thread_edge (taken_edge, EDGE_NO_COPY_SRC_BLOCK); + jump_thread_path->safe_push (x); + + /* A path with less than 3 nodes should not be jump-threaded. */ + if (n > 2 && n < PARAM_VALUE (PARAM_MAX_THREAD_LENGTH) + && max_threaded_paths > 0) + { + int n_insns = 0; + gimple_stmt_iterator gsi; + + for (j = 1; j < n - 1; j++) + for (gsi = gsi_start_bb ((*path)[j]); !gsi_end_p (gsi); gsi_next (&gsi)) + ++n_insns; + + if (n_insns < PARAM_VALUE (PARAM_MAX_THREAD_PATH_INSNS)) + { + register_jump_thread (jump_thread_path); + --max_threaded_paths; + } + } + } + else if (TREE_CODE (arg) == SSA_NAME) + fsm_find_control_statement_thread_paths (arg, visited_phis, path); + + /* Remove BBI from the path. */ + path->pop (); + } + + /* Remove all the nodes that we added from next_path. */ + vec_safe_truncate (path, (path->length () - next_path->length ())); + vec_free (next_path); +} + /* We are exiting E->src, see if E->dest ends with a conditional jump which has a known value when reached via E. @@ -1032,7 +1207,10 @@ thread_through_normal_block (edge e, cond = simplify_control_stmt_condition (e, stmt, dummy_cond, simplify, handle_dominating_asserts); - if (cond && is_gimple_min_invariant (cond)) + if (!cond) + return 0; + + if (is_gimple_min_invariant (cond)) { edge taken_edge = find_taken_edge (e->dest, cond); basic_block dest = (taken_edge ? taken_edge->dest : NULL); @@ -1078,6 +1256,26 @@ thread_through_normal_block (edge e, backedge_seen_p); return 1; } + + if (TREE_CODE (cond) != SSA_NAME + || e->dest->loop_father != e->src->loop_father) + return 0; + + /* When COND cannot be simplified, try to find paths from a control + statement back through the PHI nodes which would affect that control + statement. */ + vec<basic_block, va_gc> *bb_path; + vec_alloc (bb_path, n_basic_blocks_for_fn (cfun)); + vec_safe_push (bb_path, e->dest); + hash_set<gimple> *visited_phis = new hash_set<gimple>; + + max_threaded_paths = PARAM_VALUE (PARAM_MAX_THREAD_PATHS); + fsm_find_control_statement_thread_paths (cond, visited_phis, bb_path); + + delete visited_phis; + vec_free (bb_path); + + return -1; } return 0; } diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index 151ed83..5847078 100644 --- a/gcc/tree-ssa-threadupdate.c +++ b/gcc/tree-ssa-threadupdate.c @@ -167,8 +167,9 @@ dump_jump_thread_path (FILE *dump_file, vec<jump_thread_edge *> path, bool registering) { fprintf (dump_file, - " %s jump thread: (%d, %d) incoming edge; ", + " %s%s jump thread: (%d, %d) incoming edge; ", (registering ? "Registering" : "Cancelling"), + (path[0]->type == EDGE_START_FSM_THREAD ? " FSM": ""), path[0]->e->src->index, path[0]->e->dest->index); for (unsigned int i = 1; i < path.length (); i++) @@ -2343,6 +2344,55 @@ thread_through_all_blocks (bool may_peel_loop_headers) threaded_blocks = BITMAP_ALLOC (NULL); memset (&thread_stats, 0, sizeof (thread_stats)); + for (i = 0; i < paths.length (); ) + { + vec<jump_thread_edge *> *path = paths[i]; + edge entry = (*path)[0]->e; + + if ((*path)[0]->type != EDGE_START_FSM_THREAD + /* Do not jump-thread twice from the same block. */ + || bitmap_bit_p (threaded_blocks, entry->src->index)) { + i++; + continue; + } + + unsigned len = path->length (); + edge exit = (*path)[len - 1]->e; + basic_block *region = XNEWVEC (basic_block, len - 1); + + for (unsigned int j = 0; j < len - 1; j++) + region[j] = (*path)[j]->e->dest; + + bool success = gimple_duplicate_sese_region (entry, exit, region, + len - 1, NULL, 0); + delete_jump_thread_path (path); + paths.unordered_remove (i); + + if (success) { + /* We do not update dominance info. */ + free_dominance_info (CDI_DOMINATORS); + + bitmap_set_bit (threaded_blocks, entry->src->index); + } + } + + for (i = 0; i < paths.length (); ) + { + vec<jump_thread_edge *> *path = paths[i]; + edge entry = (*path)[0]->e; + + /* Do not jump-thread twice from the same block. */ + if (bitmap_bit_p (threaded_blocks, entry->src->index)) + { + delete_jump_thread_path (path); + paths.unordered_remove (i); + } + else + i++; + } + + bitmap_clear (threaded_blocks); + mark_threaded_blocks (threaded_blocks); initialize_original_copy_tables (); diff --git a/gcc/tree-ssa-threadupdate.h b/gcc/tree-ssa-threadupdate.h index 426aca5..42c3a9e 100644 --- a/gcc/tree-ssa-threadupdate.h +++ b/gcc/tree-ssa-threadupdate.h @@ -26,6 +26,7 @@ extern bool thread_through_all_blocks (bool); enum jump_thread_edge_type { EDGE_START_JUMP_THREAD, + EDGE_START_FSM_THREAD, EDGE_COPY_SRC_BLOCK, EDGE_COPY_SRC_JOINER_BLOCK, EDGE_NO_COPY_SRC_BLOCK -- 2.1.0.243.g30d45f7