https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116166
--- Comment #20 from Richard Biener <rguenth at gcc dot gnu.org> --- So interestingly this is a case where we run into the irange::maybe_resize re-allocation a lot (48 Million times), so a callgrind profile has that and the corresponding DTOR at top in the 'Self' reporting order. >From the path_range_query::compute_ranges -> compute_ranges_in_block counts we can infer the average path length is 178(!) and mostly composed of "empty" blocks (but we've guessed that already). That's all from just 550 thread_across_edge calls where we are able to find a normal thread and then a very deep empty tail where we can simplify all conditions. That's where the quadraticness comes in - as we enlarge the path for each condition we simplify (and add a block) we re-compute ranges for all blocks collected sofar. I've noticed with the backwards threader that path ranger isn't very good in the ability to preserve a cache when adding blocks. But in the case of forward threading resetting the cache shouldn't be necessary - we might get additional "interesting" names to analyze but already analyzed names do not need re-analyzing (as opposed to the backward threader where we add blocks at the start of the path and thus would possibly get refined ranges). We only need to clear the cache when popping blocks - and even then only for names defined in the popped block. A more pragmatic fix would be to limit the number of EDGE_NO_COPY_SRC_BLOCK blocks we add to the path aka the depth of the thread_around_empty_blocks recursion. Limiting that to 10 for example makes the testcase compile in 1s. Implementing search space limiting and re-using --param max-jump-thread-paths works as well, resulting in 3s.