https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855

--- Comment #39 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
I'm going to step away from this PR for a few weeks so I'll do a brain dump on
where I am, just in case someone wants to poke at it some more.

This problem in adjust_paths_after_duplication() boils down to a slow sub-path
search.  The current search is quadratic, but that is easily remedied by
keeping the paths sorted.  The patchset I will attach shortly implements this
and improves the performance dramatically:

** mainline
Time variable                                   usr           sys          wall          GGC
 dominator optimization             : 161.07 (  9%)   0.65 (  8%) 162.00 (  9%)  236M (  9%)
 backwards jump threading           :1386.84 ( 78%)   1.76 ( 22%)1393.97 ( 78%)  519M ( 20%)
 TOTAL                              :1775.41          7.88       1789.76        2636M

** with patchset
Time variable                                   usr           sys          wall          GGC
 dominator optimization             : 284.32 ( 46%)   1.39 ( 16%) 286.54 ( 46%)  447M ( 13%)
 backwards jump threading           :  15.51 (  3%)   0.87 ( 10%)  16.40 (  3%)  510M ( 15%)
 TOTAL                              : 616.15          8.81        626.82        3334M
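
The sorted-search idea can be sketched in isolation.  This is a minimal,
hypothetical model (std::vector plus made-up path_t/edge_t types, not the
threader's actual vec<> of jump_thread_edge paths): keep the path vector
sorted by its key and replace the linear scan with a binary search.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

/* Hypothetical stand-ins: a path keyed by its entry edge.  */
typedef std::pair<int, int> edge_t;

struct path_t
{
  edge_t entry;   /* First edge of the path; the rest would follow.  */
};

static bool
path_less (const path_t &a, const path_t &b)
{
  return a.entry < b.entry;
}

/* Quadratic overall: a linear scan per lookup.  */
static const path_t *
find_linear (const std::vector<path_t> &paths, edge_t e)
{
  for (size_t i = 0; i < paths.size (); i++)
    if (paths[i].entry == e)
      return &paths[i];
  return 0;
}

/* O(log n) per lookup, provided PATHS is kept sorted by entry edge.  */
static const path_t *
find_sorted (const std::vector<path_t> &paths, edge_t e)
{
  path_t key = { e };
  std::vector<path_t>::const_iterator it
    = std::lower_bound (paths.begin (), paths.end (), key, path_less);
  if (it != paths.end () && it->entry == e)
    return &*it;
  return 0;
}
```

The sort is paid once per pass; each sub-path lookup then drops from O(n) to
O(log n), which is where the timing improvement above comes from.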

The problem, though, is that the jump threader magic that reorders the basic
blocks once we know which paths to thread is sensitive to the order of the
vector itself.  This affects all the threaders, and it's code I'm unfamiliar
with.

For example, gcc.dg/tree-ssa/ssa-sink-13.c has the following thread paths in
the queue:

  [1] Registering jump thread: (2, 15) incoming edge;  (4, 5) nocopy;
  [2] Registering jump thread: (2, 3) incoming edge;  (3, 4) normal (4, 6) nocopy;
  [3] Registering jump thread: (4, 6) incoming edge;  (6, 8) nocopy;
  [4] Registering jump thread: (6, 8) incoming edge;  (8, 9) nocopy;
  [5] Registering jump thread: (6, 7) incoming edge;  (7, 8) normal (8, 10) nocopy;
  [6] Registering jump thread: (8, 10) incoming edge;  (10, 12) nocopy;
  [7] Registering jump thread: (10, 12) incoming edge;  (12, 13) nocopy;
  [8] Registering jump thread: (10, 11) incoming edge;  (11, 12) normal (12, 14) nocopy;

Once we process the first path, the code in back_jt_path_registry::update_cfg()
removes it with m_paths.unordered_remove (0).  The vector code cheaply moves
the path in position [8] into the first position, leaving the vector like this:

  [1] Registering jump thread: (10, 11) incoming edge;  (11, 12) normal (12, 14) nocopy;
  [2] Registering jump thread: (2, 3) incoming edge;  (3, 4) normal (4, 6) nocopy;
  [3] Registering jump thread: (4, 6) incoming edge;  (6, 8) nocopy;
  [4] Registering jump thread: (6, 8) incoming edge;  (8, 9) nocopy;
  [5] Registering jump thread: (6, 7) incoming edge;  (7, 8) normal (8, 10) nocopy;
  [6] Registering jump thread: (8, 10) incoming edge;  (10, 12) nocopy;
  [7] Registering jump thread: (10, 12) incoming edge;  (12, 13) nocopy;

This simple change starts a cascade of transformations that ultimately causes
us to spectacularly fail ssa-sink-13.c.

So I have a patchset that could theoretically fix the performance issues in the
backward threader, but we first need to figure out why (and whether) we care
about the ordering of the path vector affecting code generation.  Because I'm
afraid any change we make will run into this.

I'll post patches next.
