https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514
Bug ID: 106514 Summary: [12/13 Regression] ranger slowness in path query Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: rguenth at gcc dot gnu.org Target Milestone: --- When you bump --param max-jump-thread-duplication-stmts only so slightly, like by a factor of two to 30, making the effective backwards threading limit 15, the gcc.dg/pr69592.c -O2 compile-time explodes and you'll see backwards jump threading : 9.17 ( 92%) 0.01 ( 20%) 9.18 ( 92%) 24 ( 0%) note the testcase is somewhat degenerate with a series of diamonds also "nicely" showing the backwards threader exponential behavior in exploring the threading path space (plus ontop the quadraticness with starting on every condition). The current effective limit of 7 copied stmts limits the effective thread length to a single diamond, avoiding the issue. perf shows you Samples: 143K of event 'cycles:u', Event count (approx.): 127516678963 Overhead Samples Command Shared Object Symbol 24.36% 34962 cc1 cc1 [.] bitmap_bit_p 18.78% 26962 cc1 cc1 [.] bitmap_list_find_element 14.98% 21505 cc1 cc1 [.] path_oracle::killing_def 1.94% 2791 cc1 cc1 [.] path_range_query::compute_ranges_in_block so it's not the exponential search space per-se but the high overhead of ranger, specifically the relation oracle which seems to be unbound. path_range_query::range_defined_in_block calling path_oracle::killing_def 87 000 times gets you 600 000 000 bitmap_bit_p queries (resulting in 10 billion(!) bitmap list walk steps). Part of the sub-optimality is probably the equiv chain becoming very long (can we simply limit that?) and clearing bits in all the very many bitmaps linked. Not to say that linked lists (for relations and equivalences) are hardly a good data structure for anything but inserts/removals :/ The bitmap (on the list) + linked list combos should probably be all replaced with splay trees. There's (unused) splay-tree-utils.h that seem to be splay tree "adaptors" ontop of something with links, but libiberty splay-trees of course work as well. I think it's worth optimizing for small number of elements, thus favor a balanced tree over hashing. Note for the testcase at hand it's walking of the m_equiv list, not the m_relations one so it might be a bit more difficult to fix this than if the issue were the m_relations chain. I'm also seeing missed micro-optimization like // Walk the equivalency list and remove SSA from any equivalencies. if (bitmap_bit_p (m_equiv.m_names, v)) ... else bitmap_set_bit (m_equiv.m_names, v); which can be written as if (!bitmap_set_bit (m_equiv.m_names, v)) .... likewise for bitmap_clear_bit. Both return whether the bit changed with the operation. Of course that's just constant factor, the issue here is complexity involving the linear list walks.