15 Regression] risc-v (last) insn-emit-nn.c build takes hours

rguenth at gcc dot gnu.org via Gcc-bugs Tue, 06 Aug 2024 05:50:19 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116166


--- Comment #20 from Richard Biener <rguenth at gcc dot gnu.org> ---
So interestingly this is a case where we run into the irange::maybe_resize
re-allocation a lot (48 Million times), so a callgrind profile has
that and the corresponding DTOR at top in the 'Self' reporting order.

>From the path_range_query::compute_ranges -> compute_ranges_in_block counts
we can infer the average path length is 178(!) and mostly composed of
"empty" blocks (but we've guessed that already).  That's all from just 550
thread_across_edge calls where we are able to find a normal thread and
then a very deep empty tail where we can simplify all conditions.

That's where the quadraticness comes in - as we enlarge the path for each
condition we simplify (and add a block) we re-compute ranges for all blocks
collected sofar.  I've noticed with the backwards threader that path
ranger isn't very good in the ability to preserve a cache when adding
blocks.  But in the case of forward threading resetting the cache shouldn't
be necessary - we might get additional "interesting" names to analyze but
already analyzed names do not need re-analyzing (as opposed to the backward
threader where we add blocks at the start of the path and thus would
possibly get refined ranges).  We only need to clear the cache when
popping blocks - and even then only for names defined in the popped block.

A more pragmatic fix would be to limit the number of EDGE_NO_COPY_SRC_BLOCK
blocks we add to the path aka the depth of the thread_around_empty_blocks
recursion.  Limiting that to 10 for example makes the testcase compile in 1s.
Implementing search space limiting and re-using --param max-jump-thread-paths
works as well, resulting in 3s.

[Bug tree-optimization/116166] [13/14/15 Regression] risc-v (last) insn-emit-nn.c build takes hours

Reply via email to