https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855
--- Comment #50 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #4)
> Trunk at -O1:
>
>  dominator optimization             : 495.14 ( 82%)   0.20 (  5%) 495.44 ( 81%)   113M (  5%)

Compared to that we're now at the following state with -O1 (everything >= 4%):

 callgraph ipa passes               :  17.23 ( 10%)
 df live regs                       :   6.76 (  4%)
 dominator optimization             :  89.76 ( 50%)
 backwards jump threading           :   7.94 (  4%)
 TOTAL                              : 180.77

So it's still DOM, aka forward threading, eating most of the time.
-fno-thread-jumps improves compile time to 77s; DOM then still takes 25s (33%).
The top offenders are then dom_oracle::register_transitives, bitmap_set_bit
and wide_int_storage copying. I had already noticed the unbounded dominator
traversal in register_transitives.

With -O2 we're still running into the backwards threader slowness. I don't
see a quick way to fix that without also eventually changing what is threaded
and what is not, as a side effect of changing the thread materialization order.
So I think a bigger refactoring like the one Aldy started is necessary.
Eventually I'll re-investigate a "quick" fix, but at the very least we need to
be able to record additional metadata per thread path (so 0001 of Aldy's
proposed series, in its current or a slightly altered form).