13 Regression] ranger slowness in path query

rguenth at gcc dot gnu.org via Gcc-bugs Wed, 03 Aug 2022 06:53:33 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514


            Bug ID: 106514
           Summary: [12/13 Regression] ranger slowness in path query
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

When you bump --param max-jump-thread-duplication-stmts only so slightly, like
by a factor of two to 30, making the effective backwards threading limit 15,
the gcc.dg/pr69592.c -O2 compile-time explodes and you'll see

 backwards jump threading           :   9.17 ( 92%)   0.01 ( 20%)   9.18 ( 92%)
   24  (  0%)

note the testcase is somewhat degenerate with a series of diamonds also
"nicely" showing the backwards threader exponential behavior in exploring
the threading path space (plus ontop the quadraticness with starting on
every condition).  The current effective limit of 7 copied stmts limits the
effective thread length to a single diamond, avoiding the issue.

perf shows you

Samples: 143K of event 'cycles:u', Event count (approx.): 127516678963          
Overhead       Samples  Command  Shared Object     Symbol                       
  24.36%         34962  cc1      cc1               [.] bitmap_bit_p
  18.78%         26962  cc1      cc1               [.] bitmap_list_find_element
  14.98%         21505  cc1      cc1               [.] path_oracle::killing_def
   1.94%          2791  cc1      cc1               [.]
path_range_query::compute_ranges_in_block

so it's not the exponential search space per-se but the high overhead of
ranger, specifically the relation oracle which seems to be unbound.

path_range_query::range_defined_in_block calling path_oracle::killing_def
87 000 times gets you 600 000 000 bitmap_bit_p queries (resulting in
10 billion(!) bitmap list walk steps).

Part of the sub-optimality is probably the equiv chain becoming very long
(can we simply limit that?) and clearing bits in all the very many
bitmaps linked.  Not to say that linked lists (for relations and equivalences)
are hardly a good data structure for anything but inserts/removals :/

The bitmap (on the list) + linked list combos should probably be all replaced
with splay trees.  There's (unused) splay-tree-utils.h that seem to be
splay tree "adaptors" ontop of something with links, but libiberty splay-trees
of course work as well.  I think it's worth optimizing for small number of
elements, thus favor a balanced tree over hashing.

Note for the testcase at hand it's walking of the m_equiv list, not the
m_relations one so it might be a bit more difficult to fix this than
if the issue were the m_relations chain.

I'm also seeing missed micro-optimization like

  // Walk the equivalency list and remove SSA from any equivalencies.
  if (bitmap_bit_p (m_equiv.m_names, v))
...
  else
    bitmap_set_bit (m_equiv.m_names, v);

which can be written as

  if (!bitmap_set_bit (m_equiv.m_names, v))
....

likewise for bitmap_clear_bit.  Both return whether the bit changed
with the operation.  Of course that's just constant factor, the issue
here is complexity involving the linear list walks.

[Bug tree-optimization/106514] New: [12/13 Regression] ranger slowness in path query

Reply via email to