[Bug tree-optimization/116166] risc-v (last) insn-emit-nn.c build takes hours

rguenth at gcc dot gnu.org via Gcc-bugs Tue, 06 Aug 2024 02:10:19 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116166


--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Mark Wielaard from comment #12)
> (In reply to Andreas Schwab from comment #11)
> > You can add target-specific flags like this:
> > 
> > $(INSNEMIT_SEQ_O): ALL_COMPILERFLAGS += -fno-tree-dominator-opts
> 
> Thanks. With "$(GIMPLE_MATCH_PD_SEQ_O) $(INSNEMIT_SEQ_O) insn-opinit.o
> insn-recog.o: ALL_COMPILERFLAGS += -O1 -fno-tree-dominator-opts" a make -j64
> drops from 8 hours to 3.5 hours:
> 
> real  202m25.031s
> user  2209m7.176s
> sys   108m49.102s
> 
> Now insn-recog.cc (even though it is included in the workaround) takes the
> longest time (~1 hour) to compile.

Compiling insn-recog.cc for a cross-compiler to riscv on x86_64 with trunk
and -O2 takes 90s with a quite flat profile.

Are those worst timings using a stage1 compiler built with default flags (-O0)?

Seeing the profile in the description, I'll note the backwards threader has
a search depth limit for jump thread paths (--param max-jump-thread-paths) but
the thread_around_empty_blocks search space is unlimited - with
EDGE_NO_COPY_SRC_BLOCK we do not account any stmts towards the stmt limit.

We're also doing a lot of redundant stmt simplifications by likely
quadratically exploring jump threading paths.  And each
hybrid_jt_simplifier::simplify
call resets the path query path, which we know is a very expensive operation;
it also shares the issues the backwards threader originally had, starting
with too big imports.  Doing that up to 2^4 times for each block is
wasteful - simplify_control_stmt_condition_1 ends up calling
hybrid_jt_simplifier::simplify through dom_jt_simplifier::simplify, and
although simplify_control_stmt_condition_1 has a recursion limit when
processing & and |, it recurses into both arms, something ranger can do
itself(?).
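
To make the fan-out concrete, here is a toy model of that recursion.  It is
only a sketch, not the GCC code: Cond, make_tree, Counts and simplify_model
are hypothetical names, and it assumes the worst case where neither arm
resolves, so every visited level falls through to the simplifier callback.
The "gate" flag mimics handing the state to the callback only at the top
recursion level (limit == 4).

```cpp
#include <cassert>
#include <memory>

// Toy condition tree: a leaf, or an '&'/'|' node with two arms.
// Hypothetical stand-in for the GIMPLE conditions the threader walks.
struct Cond {
    std::unique_ptr<Cond> lhs, rhs;  // both null for a leaf
    bool is_logical() const { return lhs && rhs; }
};

// Build a complete tree of '&'/'|' nodes with the given depth.
std::unique_ptr<Cond> make_tree(unsigned depth) {
    auto node = std::make_unique<Cond>();
    if (depth > 0) {
        node->lhs = make_tree(depth - 1);
        node->rhs = make_tree(depth - 1);
    }
    return node;
}

// Number of callback invocations, split by whether state was passed.
struct Counts {
    unsigned with_state = 0;
    unsigned without_state = 0;
};

constexpr unsigned kTopLimit = 4;  // recursion limit at the outermost call

// Recurse into both arms of '&'/'|' until the limit hits zero, then
// (worst case assumed) fall through to the simplifier callback.
void simplify_model(const Cond &c, unsigned limit, bool gate, Counts &n) {
    if (limit == 0)
        return;
    if (c.is_logical()) {
        simplify_model(*c.lhs, limit - 1, gate, n);
        simplify_model(*c.rhs, limit - 1, gate, n);
    }
    // Without gating every level passes state; with gating only the top.
    if (!gate || limit == kTopLimit)
        ++n.with_state;
    else
        ++n.without_state;
}
```

In this model a deep enough condition tree reaches the callback
1 + 2 + 4 + 8 = 15 times per top-level query, all with state when ungated,
and only once with state when gated.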

The threader JT simplifier is over-abstracted - only DOM seems to use
hybrid_jt_simplifier.  The following should cut compile-time down
significantly (I'm not sure if the "old" DOM equiv lookup done by
dom-simplify is even necessary).  IMO "gimping" the old forward threader
with ranger was misguided as it was supposed to vanish anyway.

diff --git a/gcc/tree-ssa-threadedge.cc b/gcc/tree-ssa-threadedge.cc
index 7f82639b8ec..cac290175d4 100644
--- a/gcc/tree-ssa-threadedge.cc
+++ b/gcc/tree-ssa-threadedge.cc
@@ -634,7 +634,8 @@ jump_threader::simplify_control_stmt_condition_1
      then use the pass specific callback to simplify the condition.  */
   if (!res
       || !is_gimple_min_invariant (res))
-    res = m_simplifier->simplify (dummy_cond, stmt, e->src, m_state);
+    res = m_simplifier->simplify (dummy_cond, stmt, e->src,
+                                 limit == 4 ? m_state : NULL);

   return res;
 }

Note it doesn't help that we're trying normal/empty thread stuff over and over.
Possibly RISC-V has a "bad" LOGICAL_OP_NON_SHORT_CIRCUIT: it defines it to
zero, which means all && and || conditions are CFG branches initially.

Can someone try adding --param logical-op-non-short-circuit=1 to that
FLAGS workaround?
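
For concreteness, the rule from comment #12 with the extra parameter appended
would look roughly like the fragment below.  This is an untested sketch that
just combines the quoted workaround with the flag above; which Makefile
fragment it belongs in is left open, as in the quoted comments.

```make
# Sketch only: the comment #12 workaround plus the extra --param.
$(GIMPLE_MATCH_PD_SEQ_O) $(INSNEMIT_SEQ_O) insn-opinit.o insn-recog.o: \
  ALL_COMPILERFLAGS += -O1 -fno-tree-dominator-opts \
                       --param logical-op-non-short-circuit=1
```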
