On 12/11/2017 10:17 PM, Alexandre Oliva wrote:
> On Dec 11, 2017, Jeff Law <l...@redhat.com> wrote:
>
> I've updated it according to richi's and your feedback.  Regstrapped on
> {x86_64,i686}-linux-gnu.  Ok to install?
>
> We limit the amount of copying for jump threading based on counting
> stmts.  This counting is overly pessimistic, because we will very
> often delete stmts as a consequence of jump threading: when the final
> conditional jump of a block is removed, earlier SSA names computed
> exclusively for use in that conditional are killed.  Furthermore, PHI
> nodes in blocks with only two predecessors are trivially replaced with
> their now-single values after threading.
>
> This patch scans blocks to be copied in the path constructed so far
> and estimates the number of stmts that will be removed in the copies,
> bumping up the stmt count limit.
>
> for  gcc/ChangeLog
>
> 	PR tree-optimization/81165
> 	* tree-ssa-threadedge.c (uses_in_bb): New.
> 	(estimate_threading_killed_stmts): New.
> 	(estimate_threading_killed_stmts): New overload.
> 	(record_temporary_equivalences_from_stmts_at_dest): Add path
> 	parameter; adjust caller.  Expand limit when it's hit.
>
> for  gcc/testsuite/ChangeLog
>
> 	PR tree-optimization/81165
> 	* gcc.dg/pr81165.c: New.
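For concreteness, the kind of code the description has in mind looks
roughly like this.  This is purely illustrative and is not the actual
gcc.dg/pr81165.c added by the patch; the function and variable names
are made up:

/* Hypothetical example.  On the path coming from the "a != 0" arm the
   outcome of the final test is known, so jump threading copies the
   join block; in that copy the final conditional, the comparison that
   exists only to feed it, and the PHI for t (the block has exactly two
   predecessors) all become dead, even though the stmt-count limit
   still charges for them.  */
int
f (int a, int b)
{
  int t;

  if (a)
    t = 1;
  else
    t = b + 1;

  /* Join block: PHI for t, a comparison used only by the branch,
     and the final conditional.  */
  int c = t > 0;
  if (c)
    return a;

  return b;
}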
> diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
> index 91793bfa59d3..0f5b943aa9a0 100644
> --- a/gcc/tree-ssa-threadedge.c
> +++ b/gcc/tree-ssa-threadedge.c
> @@ -170,6 +170,160 @@ threadedge_valueize (tree t)
>    return t;
>  }
>  
> +/* Return how many uses of T there are within BB, as long as there
> +   aren't any uses outside BB.  If there are any uses outside BB,
> +   return -1 if there's at most one use within BB, or -2 if there is
> +   more than one use within BB.  */
> +
> +static int
> +uses_in_bb (tree t, basic_block bb)
> +{
> +  int uses = 0;
> +  bool outside_bb = false;
> +
> +  imm_use_iterator iter;
> +  use_operand_p use_p;
> +  FOR_EACH_IMM_USE_FAST (use_p, iter, t)
> +    {
> +      if (is_gimple_debug (USE_STMT (use_p)))
> +	continue;
> +
> +      if (gimple_bb (USE_STMT (use_p)) != bb)
> +	outside_bb = true;
> +      else
> +	uses++;
> +
> +      if (outside_bb && uses > 1)
> +	return -2;
> +    }
> +
> +  if (outside_bb)
> +    return -1;
> +
> +  return uses;
> +}
> +
> +/* Starting from the final control flow stmt in BB, assuming it will
> +   be removed, follow uses in to-be-removed stmts back to their defs
> +   and count how many defs are to become dead and be removed as
> +   well.  */
> +
> +static int
> +estimate_threading_killed_stmts (basic_block bb)
> +{
> +  int killed_stmts = 0;
> +  hash_map<tree, int> ssa_remaining_uses;
> +  auto_vec<gimple *, 4> dead_worklist;
> +
> +  /* If the block has only two predecessors, threading will turn phi
> +     dsts into either src, so count them as dead stmts.  */
> +  bool drop_all_phis = EDGE_COUNT (bb->preds) == 2;
> +
> +  if (drop_all_phis)
> +    for (gphi_iterator gsi = gsi_start_phis (bb);
> +	 !gsi_end_p (gsi); gsi_next (&gsi))
> +      {
> +	gphi *phi = gsi.phi ();
> +	tree dst = gimple_phi_result (phi);
> +
> +	/* We don't count virtual PHIs as stmts in
> +	   record_temporary_equivalences_from_phis.  */
> +	if (virtual_operand_p (dst))
> +	  continue;
> +
> +	killed_stmts++;
> +      }
> +
> +  if (gsi_end_p (gsi_last_bb (bb)))
> +    return killed_stmts;
> +
> +  gimple *stmt = gsi_stmt (gsi_last_bb (bb));
> +  if (gimple_code (stmt) != GIMPLE_COND
> +      && gimple_code (stmt) != GIMPLE_GOTO
> +      && gimple_code (stmt) != GIMPLE_SWITCH)
> +    return killed_stmts;
> +
> +  dead_worklist.quick_push (stmt);
> +  while (!dead_worklist.is_empty ())
> +    {
> +      stmt = dead_worklist.pop ();
> +
> +      ssa_op_iter iter;
> +      use_operand_p use_p;
> +      FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
> +	{
> +	  tree t = USE_FROM_PTR (use_p);
> +	  gimple *def = SSA_NAME_DEF_STMT (t);
> +
> +	  if (gimple_bb (def) == bb
> +	      && (gimple_code (def) != GIMPLE_PHI
> +		  || !drop_all_phis)
> +	      && !gimple_has_side_effects (def))
> +	    {
> +	      int *usesp = ssa_remaining_uses.get (t);
> +	      int uses;
> +
> +	      if (usesp)
> +		uses = *usesp;
> +	      else
> +		uses = uses_in_bb (t, bb);
> +
> +	      gcc_assert (uses);
> +
> +	      /* Don't bother recording the expected use count if we
> +		 won't find any further uses within BB.  */
> +	      if (!usesp && (uses < -1 || uses > 1))
> +		{
> +		  usesp = &ssa_remaining_uses.get_or_insert (t);
> +		  *usesp = uses;
> +		}
> +
> +	      if (uses < 0)
> +		continue;
> +
> +	      --uses;
> +	      if (usesp)
> +		*usesp = uses;
> +
> +	      if (!uses)
> +		{
> +		  killed_stmts++;
> +		  if (usesp)
> +		    ssa_remaining_uses.remove (t);
> +		  if (gimple_code (def) != GIMPLE_PHI)
> +		    dead_worklist.safe_push (def);
> +		}
> +	    }
> +	}
> +    }
> +
> +  if (dump_file)
> +    fprintf (dump_file, "threading bb %i kills %i stmts\n",
> +	     bb->index, killed_stmts);
> +
> +  return killed_stmts;
So it appears the loop counts a statement as killed when its result is
no longer used.  AFAICT we fail to count the conditional itself as
killed.  That's trivially fixed.

I also think we want to use this elsewhere.  I think putting it in
tree-ssa-threadupdate.c and exporting it from tree-ssa-threadupdate.h
should work.

> +}
> +
> +/* Estimate, over all src blocks to be copied in PATH and E's dest,
> +   the number of stmts that become dead as a consequence of removing
> +   their final control flow stmts.  If PATH is not empty, E must be
> +   its last entry.  */
> +
> +static int
> +estimate_threading_killed_stmts (edge e, vec<jump_thread_edge *> *path)
> +{
> +  gcc_assert (path->length () == 0 || path->last ()->e == e);
> +  int total = estimate_threading_killed_stmts (e->dest);
> +
> +  int i;
> +  jump_thread_edge *je;
> +  FOR_EACH_VEC_ELT_REVERSE (*path, i, je)
> +    if (je->type == EDGE_COPY_SRC_BLOCK)
> +      total += estimate_threading_killed_stmts (je->e->src);
> +
> +  return total;
> +}
I don't think we need this function at all.  At the point where you're
calling it, we're at the one and only block within the jump threading
path where we're copying a block whose statements are going to be
killed.

So I think you should just remove this function and call the other one
(the one that works on just a basic block).  That also means we don't
have to pass the path around.

I've got those minor tweaks in a tree here, along with the changes to
use it to fix 36550 as well.  Let me run them through some deeper
testing.

jeff
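To make the counting scheme under discussion concrete, here is a toy,
self-contained model of it -- not GCC code; the names (struct stmt,
uses_left, OUTSIDE_USES) and the data are made up for illustration.
Starting from the block's final conditional, which is itself counted as
removed per the comment above, each operand is followed back to its def
and a def is counted as killed once its last in-block use goes away;
defs that are also used outside the block are never counted.

/* Toy model of the killed-stmt estimate; everything here is invented
   for illustration.  Each "stmt" defines one value and uses up to two
   others; uses_left[] plays the role of ssa_remaining_uses, counting
   only uses inside the block, with a sentinel for values that escape.  */
#include <stdio.h>

#define NSTMTS 8
#define OUTSIDE_USES -1		/* value also used outside the block */

struct stmt
{
  int ops[2];			/* defining stmts of the operands, or -1 */
};

int
main (void)
{
  /* A tiny block: stmts 0..3 feed one another, stmt 4 is the final
     conditional.  Stmt 1's result is also used outside the block.  */
  struct stmt stmts[NSTMTS] = {
    [0] = { { -1, -1 } },	/* t0 = ...            */
    [1] = { { -1, -1 } },	/* t1 = ... (escapes)  */
    [2] = { {  0,  1 } },	/* t2 = t0 op t1       */
    [3] = { {  2, -1 } },	/* t3 = op t2          */
    [4] = { {  3, -1 } },	/* if (t3) goto ...    */
  };
  int uses_left[NSTMTS] = { 1, OUTSIDE_USES, 1, 1, 0 };

  int worklist[NSTMTS], nworklist = 0;
  int killed = 1;		/* the conditional itself is removed */
  worklist[nworklist++] = 4;

  while (nworklist > 0)
    {
      int s = worklist[--nworklist];
      for (int i = 0; i < 2; i++)
	{
	  int def = stmts[s].ops[i];
	  if (def < 0 || uses_left[def] == OUTSIDE_USES)
	    continue;
	  if (--uses_left[def] == 0)
	    {
	      killed++;		/* its last in-block use is gone */
	      worklist[nworklist++] = def;
	    }
	}
    }

  printf ("estimated killed stmts: %d\n", killed);	/* prints 4 */
  return 0;
}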