https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #43 from Richard Biener <rguenth at gcc dot gnu.org> ---
The interesting bit is that there are only 12026 overall loglinks; the
number of combine attempts is way higher than that would suggest, especially
given the few successful combinations.  So something is odd here.

There's one interesting piece of machinery in try_combine: through
added_links_insn and added_notes_insn we can end up re-processing a large
swath of insns (even though we should only need to re-process the link target
insns, not all insns in between).  There might be the opportunity, for the
"reprocessing", to use a worklist instead of resetting the insn walk.
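
For reference, the restart point in question is computed at the tail of
try_combine roughly as follows (the same shape as the last hunk quoted
further below; a slightly abbreviated paraphrase, not verbatim source):

  rtx_insn *ret = newi2pat ? i2 : i3;
  if (added_links_insn && DF_INSN_LUID (added_links_insn) < DF_INSN_LUID (ret))
    ret = added_links_insn;
  if (added_notes_insn && DF_INSN_LUID (added_notes_insn) < DF_INSN_LUID (ret))
    ret = added_notes_insn;
  /* The caller resets its insn walk to the returned insn, so everything
     between that insn and i3 gets re-scanned.  */
  return ret;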

I added statistics to record the "distance" we travel back there, taking
DF_INSN_LUID (ret) - DF_INSN_LUID (added_{notes,links}_insn) as that distance.
This shows 48 such jumps, with seemingly large distances:

305 combine "restart earlier == 143" 3
305 combine "restart earlier == 254" 1
305 combine "restart earlier == 684" 1
305 combine "restart earlier == 726" 1
305 combine "restart earlier == 777" 1
305 combine "restart earlier == 1158" 1
305 combine "restart earlier == 1421" 1
305 combine "restart earlier == 2073" 1
305 combine "restart earlier == 2130" 1
...
305 combine "restart earlier == 49717" 1
305 combine "restart earlier == 49763" 1
305 combine "restart earlier == 49866" 1
305 combine "restart earlier == 50010" 1
305 combine "restart earlier == 50286" 1
305 combine "restart earlier == 50754" 1
305 combine "restart earlier == 50809" 1

Killing this feature doesn't improve things to -O1 levels though, so it's
more likely due to the fact that we also do

  rtx_insn *ret = newi2pat ? i2 : i3;

and thus re-start at i2 whenever we altered i2.  We re-start this way 6910
times.  Always re-starting at i3 helps a lot and gets us -O1 performance
back.  Comment #1 suggests that newi2pat might in fact be equal
to the old pattern, so I tried to count how many times this happens with a
stupid hack:

diff --git a/gcc/combine.cc b/gcc/combine.cc
index a4479f8d836..acd176d3185 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -4435,6 +4435,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
          propagate_for_debug (i2, last_combined_insn, i2dest, i2src,
                               this_basic_block);
        INSN_CODE (i2) = i2_code_number;
+       if (rtx_equal_p (PATTERN (i2), newi2pat))
+         statistics_counter_event (cfun, "equal newi2pat", 1);
        PATTERN (i2) = newi2pat;
       }
     else

and indeed this shows that to be the case 9211 times.

The following improves compile-time to 20s and memory use to 460MB.  In
general the algorithmic deficiency with the "restarting" remains, and a proper
fix is to use a worklist that you'd drain before advancing in the
instruction chain (so instead of returning a single 'ret' insn to reprocess,
the affected insns would be added to the worklist).
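
To illustrate the worklist idea (a self-contained toy sketch, not combine.cc
code - insns are modeled as plain integers and try_combine_stub stands in for
try_combine, reporting which earlier insns were affected):

  #include <vector>
  #include <deque>
  #include <cstdio>

  /* Stub: "combining" insn N may report earlier insns whose LOG_LINKS or
     notes changed and that therefore need another look.  */
  static std::vector<int> try_combine_stub (int insn)
  {
    if (insn > 0 && insn % 1000 == 0)
      return { insn - 500 };   /* pretend an earlier insn was affected */
    return {};
  }

  int main ()
  {
    const int n_insns = 10000;
    std::deque<int> worklist;

    for (int insn = 0; insn < n_insns; ++insn)
      {
        /* Queue affected earlier insns instead of rewinding the walk.  */
        for (int affected : try_combine_stub (insn))
          worklist.push_back (affected);

        /* Drain the worklist before advancing: only the affected insns are
           re-processed, not every insn in between.  */
        while (!worklist.empty ())
          {
            int redo = worklist.front ();
            worklist.pop_front ();
            for (int affected : try_combine_stub (redo))
              worklist.push_back (affected);
          }
      }

    std::printf ("processed %d insns\n", n_insns);
    return 0;
  }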

I'm not sure whether identifying an unchanged "new" i2 can be done better.
I'll leave it all to Segher of course - he'll be fastest to produce something
he likes.

diff --git a/gcc/combine.cc b/gcc/combine.cc
index a4479f8d836..0c61dcedaa1 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -4276,6 +4276,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
        }
     }

+  bool newi2pat_not_new = false;
   {
     rtx i3notes, i2notes, i1notes = 0, i0notes = 0;
     struct insn_link *i3links, *i2links, *i1links = 0, *i0links = 0;
@@ -4435,6 +4436,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
          propagate_for_debug (i2, last_combined_insn, i2dest, i2src,
                               this_basic_block);
        INSN_CODE (i2) = i2_code_number;
+       if (rtx_equal_p (PATTERN (i2), newi2pat))
+         newi2pat_not_new = true;
        PATTERN (i2) = newi2pat;
       }
     else
@@ -4752,7 +4755,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
   combine_successes++;
   undo_commit ();

-  rtx_insn *ret = newi2pat ? i2 : i3;
+  rtx_insn *ret = newi2pat && !newi2pat_not_new ? i2 : i3;
   if (added_links_insn && DF_INSN_LUID (added_links_insn) < DF_INSN_LUID (ret))
     ret = added_links_insn;
   if (added_notes_insn && DF_INSN_LUID (added_notes_insn) < DF_INSN_LUID (ret))
