https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523
--- Comment #43 from Richard Biener <rguenth at gcc dot gnu.org> --- The interesting bit is that there are only 12026 overall loglinks, the number of combine attempts is way higher than that would suggest also given the few successful combinations. So something is odd here. There's one interesting machinery in try_combine through added_links_insn and added_notes_insn we can end up re-processing a large swat of insns (even though we should need to only re-process the link target insns, not all insns inbetween). There might be the opportunity, for the "reprocessing", to use a worklist instead of resetting the insn walk. I added statistics to note the "distance" we travel there by taking DF_INSN_LUID (ret) - DF_INSN_LUID (added_{notes,links}_insn) as that. This shows 48 such jumps with seemingly large distances: 305 combine "restart earlier == 143" 3 305 combine "restart earlier == 254" 1 305 combine "restart earlier == 684" 1 305 combine "restart earlier == 726" 1 305 combine "restart earlier == 777" 1 305 combine "restart earlier == 1158" 1 305 combine "restart earlier == 1421" 1 305 combine "restart earlier == 2073" 1 305 combine "restart earlier == 2130" 1 ... 305 combine "restart earlier == 49717" 1 305 combine "restart earlier == 49763" 1 305 combine "restart earlier == 49866" 1 305 combine "restart earlier == 50010" 1 305 combine "restart earlier == 50286" 1 305 combine "restart earlier == 50754" 1 305 combine "restart earlier == 50809" 1 killing this feature doesn't improve things to -O1 levels though so it's more likely the fact that we also do rtx_insn *ret = newi2pat ? i2 : i3; thus re-start at i2 when we altered i2. We re-start through this 6910 times. Always re-starting at i3 helps a lot and gets us -O1 performance back. From comment#1 it suggests that newi2pat might in fact be equal to the old, so I tried to count how many times this happens with a stupid diff --git a/gcc/combine.cc b/gcc/combine.cc index a4479f8d836..acd176d3185 100644 --- a/gcc/combine.cc +++ b/gcc/combine.cc @@ -4435,6 +4435,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, propagate_for_debug (i2, last_combined_insn, i2dest, i2src, this_basic_block); INSN_CODE (i2) = i2_code_number; + if (rtx_equal_p (PATTERN (i2), newi2pat)) + statistics_counter_event (cfun, "equal newi2pat", 1); PATTERN (i2) = newi2pat; } else and indeed this shows this to be the case 9211 times. The following improves compile-time to 20s and 460MB memory use. In general the algorithmic deficiency with the "restarting" remains and a proper fix is to use a worklist for those that you'd drain before advancing in the instruction chain (so not have a single 'ret' insn to reprocess but add to the worklist). I'm not sure whether identifying a not changed "new" i2 can be done better. I'll leave it all to Segher of course - he'll be fastest to produce something he likes. diff --git a/gcc/combine.cc b/gcc/combine.cc index a4479f8d836..0c61dcedaa1 100644 --- a/gcc/combine.cc +++ b/gcc/combine.cc @@ -4276,6 +4276,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx _insn *i0, } } + bool newi2pat_not_new = false; { rtx i3notes, i2notes, i1notes = 0, i0notes = 0; struct insn_link *i3links, *i2links, *i1links = 0, *i0links = 0; @@ -4435,6 +4436,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, propagate_for_debug (i2, last_combined_insn, i2dest, i2src, this_basic_block); INSN_CODE (i2) = i2_code_number; + if (rtx_equal_p (PATTERN (i2), newi2pat)) + newi2pat_not_new = true; PATTERN (i2) = newi2pat; } else @@ -4752,7 +4755,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, combine_successes++; undo_commit (); - rtx_insn *ret = newi2pat ? i2 : i3; + rtx_insn *ret = newi2pat && !newi2pat_not_new ? i2 : i3; if (added_links_insn && DF_INSN_LUID (added_links_insn) < DF_INSN_LUID (ret)) ret = added_links_insn; if (added_notes_insn && DF_INSN_LUID (added_notes_insn) < DF_INSN_LUID (ret))