https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855
--- Comment #39 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
I'm going to step away from this PR for a few weeks, so I'll do a brain dump
on where I am, just in case someone wants to poke at it some more.

The problem in adjust_paths_after_duplication() boils down to a slow sub-path
search.  The current search is quadratic, but it can easily be remedied by
keeping the paths sorted (a sketch of the idea is at the end of this comment).
In a patchset I will attach shortly, implementing this dramatically improves
the performance:

** mainline

 Time variable                                   usr           sys          wall            GGC
 dominator optimization             : 161.07 (  9%)   0.65 (  8%)  162.00 (  9%)   236M (  9%)
 backwards jump threading           :1386.84 ( 78%)   1.76 ( 22%) 1393.97 ( 78%)   519M ( 20%)
 TOTAL                              :1775.41          7.88        1789.76          2636M

** with patchset

torsion:~/bld/benchmark/tainted/gcc []$
 Time variable                                   usr           sys          wall            GGC
 dominator optimization             : 284.32 ( 46%)   1.39 ( 16%)  286.54 ( 46%)   447M ( 13%)
 backwards jump threading           :  15.51 (  3%)   0.87 ( 10%)   16.40 (  3%)   510M ( 15%)
 TOTAL                              : 616.15          8.81         626.82          3334M

The problem, though, is that the jump threader magic that reorders the basic
blocks once we know which paths to thread is sensitive to the order of the
vector itself.  This affects all threaders, and it's code I'm unfamiliar with.

For example, gcc.dg/tree-ssa/ssa-sink-13.c has the following thread paths in
the queue:

[1] Registering jump thread: (2, 15) incoming edge;  (4, 5) nocopy;
[2] Registering jump thread: (2, 3) incoming edge;  (3, 4) normal (4, 6) nocopy;
[3] Registering jump thread: (4, 6) incoming edge;  (6, 8) nocopy;
[4] Registering jump thread: (6, 8) incoming edge;  (8, 9) nocopy;
[5] Registering jump thread: (6, 7) incoming edge;  (7, 8) normal (8, 10) nocopy;
[6] Registering jump thread: (8, 10) incoming edge;  (10, 12) nocopy;
[7] Registering jump thread: (10, 12) incoming edge;  (12, 13) nocopy;
[8] Registering jump thread: (10, 11) incoming edge;  (11, 12) normal (12, 14) nocopy;

Once we process the first path, the code in back_jt_path_registry::update_cfg()
removes it with m_paths.unordered_remove (0).  The vector code cheaply moves
the path in position [8] to the first position, leaving the vector like this:

[1] Registering jump thread: (10, 11) incoming edge;  (11, 12) normal (12, 14) nocopy;
[2] Registering jump thread: (2, 3) incoming edge;  (3, 4) normal (4, 6) nocopy;
[3] Registering jump thread: (4, 6) incoming edge;  (6, 8) nocopy;
[4] Registering jump thread: (6, 8) incoming edge;  (8, 9) nocopy;
[5] Registering jump thread: (6, 7) incoming edge;  (7, 8) normal (8, 10) nocopy;
[6] Registering jump thread: (8, 10) incoming edge;  (10, 12) nocopy;
[7] Registering jump thread: (10, 12) incoming edge;  (12, 13) nocopy;

This simple change starts a cascade of transformations that ultimately causes
us to spectacularly fail ssa-sink-13.c.

So I have a patchset that could theoretically fix the performance issues in the
backward threader, but we first need to figure out why (or whether) we care
about the ordering of the path vector affecting code generation, because I'm
afraid any change we make is going to be affected by this.

I'll post patches next.
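
P.S. Here is a minimal standalone sketch of the "keep the paths sorted" idea.
This is plain C++ with std::vector, not the actual adjust_paths_after_duplication()
code or GCC's vec<> API, and paths are simplified to sequences of block indices;
it only shows why a sorted vector turns the per-query linear scan (quadratic
overall) into a binary search:

#include <algorithm>
#include <vector>

/* A "path" here is just a sequence of basic block indices; the real
   jump_thread_path holds jump_thread_edge pointers.  */
typedef std::vector<int> path_t;

/* Linear scan per query, so quadratic overall when every path is
   checked against every other path.  */
static bool
has_subpath_linear (const std::vector<path_t> &paths, const path_t &prefix)
{
  for (const path_t &p : paths)
    if (p.size () >= prefix.size ()
	&& std::equal (prefix.begin (), prefix.end (), p.begin ()))
      return true;
  return false;
}

/* If PATHS is kept sorted lexicographically, all paths starting with
   PREFIX are contiguous and begin at lower_bound, so one binary
   search per query suffices.  */
static bool
has_subpath_sorted (const std::vector<path_t> &sorted_paths,
		    const path_t &prefix)
{
  std::vector<path_t>::const_iterator it
    = std::lower_bound (sorted_paths.begin (), sorted_paths.end (), prefix);
  return (it != sorted_paths.end ()
	  && it->size () >= prefix.size ()
	  && std::equal (prefix.begin (), prefix.end (), it->begin ()));
}

int
main ()
{
  std::vector<path_t> paths = { {2, 3, 4, 6}, {4, 6, 8}, {6, 8, 9} };
  std::sort (paths.begin (), paths.end ());
  /* Both lookups agree; only their cost differs.  */
  return has_subpath_linear (paths, {4, 6})
	 == has_subpath_sorted (paths, {4, 6}) ? 0 : 1;
}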
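
And to make the reordering effect above easier to see, a tiny standalone
illustration (again std::vector rather than GCC's vec<>) of what an unordered
remove of the first element does to the rest of the vector:

#include <cstdio>
#include <string>
#include <vector>

int
main ()
{
  /* The incoming edges of the eight paths queued for ssa-sink-13.c.  */
  std::vector<std::string> paths
    = { "(2, 15)", "(2, 3)", "(4, 6)", "(6, 8)",
	"(6, 7)", "(8, 10)", "(10, 12)", "(10, 11)" };

  /* An unordered remove of element 0 is O(1): the last path takes the
     place of the removed one instead of every element shifting down.  */
  paths[0] = paths.back ();
  paths.pop_back ();

  for (size_t i = 0; i < paths.size (); i++)
    printf ("[%zu] %s\n", i + 1, paths[i].c_str ());
  /* Prints (10, 11) first -- the former last entry -- followed by the
     remaining paths in their original order, matching the dump above.  */
  return 0;
}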