https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116166
--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Mark Wielaard from comment #12)
> (In reply to Andreas Schwab from comment #11)
> > You can add target-specific flags like this:
> >
> > $(INSNEMIT_SEQ_O): ALL_COMPILERFLAGS += -fno-tree-dominator-opts
>
> Thanks. With "$(GIMPLE_MATCH_PD_SEQ_O) $(INSNEMIT_SEQ_O) insn-opinit.o
> insn-recog.o: ALL_COMPILERFLAGS += -O1 -fno-tree-dominator-opts" a make -j64
> drops from 8 hours to 3.5 hours:
>
> real 202m25.031s
> user 2209m7.176s
> sys 108m49.102s
>
> Now insn-recog.cc (even though it is included in the workaround) takes the
> longest time (~1 hour) to compile.

Compiling insn-recog.cc for a cross-compiler to riscv on x86_64 with trunk
and -O2 takes 90s with a quite flat profile. Are those worst timings using a
stage1 compiler built with default flags (-O0)?

Seeing the profile in the description I'll note the backwards threader has a
search depth for jump thread paths (--param max-jump-thread-paths) but
thread_around_empty_blocks search space is unlimited - with
EDGE_NO_COPY_SRC_BLOCK we do not account any stmts towards the stmt limit.

We're also doing a lot of redundant stmt simplifications by likely
quadratically exploring jump threading paths. And each
hybrid_jt_simplifier::simplify call resets the path query path which we know
is a very expensive operation, it also shares the issues the backwards
threader originally had, starting with too big imports. Doing that up to 2^4
times for each block is wasteful - simplify_control_stmt_condition_1 ends up
calling hybrid_jt_simplifier::simplify through dom_jt_simplifier::simplify
and while simplify_control_stmt_condition_1 has a recursion limit while
processing & and | it recurses to both arms, something ranger can do
itself(?).

The threader JT simplifier is over-abstracted - only DOM seems to use
hybrid_jt_simplifier.

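To make the "up to 2^4 times" estimate concrete, here is a toy C++ model of the recursion described above. It is only a sketch, not GCC code: `Cond`, `count_simplifier_calls` and `count_expensive_calls` are hypothetical names, and the model assumes every & / | node recurses into both arms with `limit - 1` and then falls back to the simplifier, which is a simplification of what simplify_control_stmt_condition_1 actually does.

```cpp
// Toy stand-in for a condition: a leaf, or an '&' / '|' of two sub-conditions.
// (Hypothetical type, not a GCC data structure.)
struct Cond {
  const Cond *lhs = nullptr;
  const Cond *rhs = nullptr;
};

// Counts fallback-simplifier invocations when every level may call it.
// A limit of 0 stops the recursion without calling the simplifier.
unsigned count_simplifier_calls(const Cond *c, unsigned limit) {
  if (limit == 0)
    return 0;
  unsigned calls = 0;
  if (c->lhs && c->rhs) {
    // Both arms are explored, so the work can double at each level.
    calls += count_simplifier_calls(c->lhs, limit - 1);
    calls += count_simplifier_calls(c->rhs, limit - 1);
  }
  return calls + 1;  // fallback call for this node
}

// Same walk, but only the outermost level (limit == top) pays for the
// expensive, state-carrying query. Nested levels are assumed to be cheap.
unsigned count_expensive_calls(const Cond *c, unsigned limit, unsigned top) {
  if (limit == 0)
    return 0;
  unsigned calls = 0;
  if (c->lhs && c->rhs) {
    calls += count_expensive_calls(c->lhs, limit - 1, top);
    calls += count_expensive_calls(c->rhs, limit - 1, top);
  }
  return calls + (limit == top ? 1u : 0u);
}
```

With a condition tree at least four levels deep and a limit of 4, the first function counts 1 + 2 + 4 + 8 = 15 fallback calls for a single control statement, while the second counts one.
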
The following should cut compile-time down significantly (I'm not sure if the
"old" DOM equiv lookup done by dom-simplify is even necessary). IMO "gimping"
the old forward threader with ranger was misguided as it was supposed to
vanish anyway.

diff --git a/gcc/tree-ssa-threadedge.cc b/gcc/tree-ssa-threadedge.cc
index 7f82639b8ec..cac290175d4 100644
--- a/gcc/tree-ssa-threadedge.cc
+++ b/gcc/tree-ssa-threadedge.cc
@@ -634,7 +634,8 @@ jump_threader::simplify_control_stmt_condition_1
      then use the pass specific callback to simplify the condition.  */
   if (!res || !is_gimple_min_invariant (res))
-    res = m_simplifier->simplify (dummy_cond, stmt, e->src, m_state);
+    res = m_simplifier->simplify (dummy_cond, stmt, e->src,
+                                  limit == 4 ? m_state : NULL);

   return res;
 }

Note it doesn't help we're trying normal/empty thread stuff over and over.

Possibly RISC-V has "bad" LOGICAL_OP_NON_SHORT_CIRCUIT, it defines it to zero
which means all && and || conditions are CFG branches initially. Can someone
try adding --param logical-op-non-short-circuit=1 to that FLAGS workaround?
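
As a rough source-level picture of the LOGICAL_OP_NON_SHORT_CIRCUIT point, the sketch below contrasts the two shapes a condition such as `x >= lo && x <= hi` can take. It is an illustration only, not what GCC literally emits, and the function names are hypothetical. With the macro at zero the condition starts out in the branchy form, so each && or || adds blocks and edges for the threaders to explore. With a nonzero value, cheap side-effect-free operands may instead be evaluated unconditionally and combined into one test.

```cpp
// Short-circuit shape: each comparison is its own conditional branch,
// so the control-flow graph gets one extra block and edge per operand.
bool in_range_branchy(int x, int lo, int hi) {
  if (x < lo)
    return false;  // first conditional branch
  if (x > hi)
    return false;  // second conditional branch
  return true;
}

// Non-short-circuit shape: both comparisons are always evaluated and
// combined with a bitwise AND, leaving a single condition to test.
bool in_range_combined(int x, int lo, int hi) {
  return (x >= lo) & (x <= hi);
}
```

Both functions return the same result for every input. Only the number of branches differs.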