This series is a resubmission of the late-combine work. I've fixed some bugs that Jeff's cross-target CI found last time and some others that I hit since then.
I've also removed a source of quadraticness (oops!). Doing that in turn drove some tweaks to the rtl-ssa scan routines.

The complexity of the new pass should be amortised O(n1 log(n2)), where n1 is the total number of input operands in the function and n2 is the number of instructions. The log(n2) component comes from searching call clobbers and is very much a worst case. We therefore shouldn't need a --param to limit the optimisation.

I think the main comment from last time was that we should enable the pass by default on most targets. If there is a known reason why the pass doesn't work on a particular target, we should default to off for that specific target and file a bug to track the problem.

The only targets that I know need to be handled in this way are i386, rs6000 and xtensa. See the covering note in the last patch for details. If the series is OK, I'll file PRs for those targets after pushing the patches.

Tested on aarch64-linux-gnu and x86_64-linux-gnu (somewhat of a token gesture given the default-off for x86_64). Also tested by compiling one target per CPU directory and comparing the assembly output for parts of the GCC testsuite. This is just a way of getting a flavour of how the pass performs; it obviously isn't a meaningful benchmark. All targets seemed to improve on average, as described in the covering note to the last patch.

The original motivation for the pass was to fix things like PR106594. However, it also helps to reclaim some of the optimisations that were lost in r15-268. Please let me know if there are some cases that the pass fails to reclaim.

The series depends on Gui Haochen's insn_cost fix.

OK to install?

Thanks to Jeff for the help with testing the series.

Richard

Richard Sandiford (6):
  rtl-ssa: Rework _ignoring interfaces
  rtl-ssa: Don't cost no-op moves
  iq2000: Fix test and branch instructions
  sh: Make *minus_plus_one work after RA
  xstormy16: Fix xs_hi_nonmemory_operand
  Add a late-combine pass [PR106594]

 gcc/Makefile.in | 1 +
 gcc/common.opt | 5 +
 gcc/config/aarch64/aarch64-cc-fusion.cc | 4 +-
 gcc/config/i386/i386-options.cc | 4 +
 gcc/config/iq2000/iq2000.cc | 2 +-
 gcc/config/iq2000/iq2000.md | 4 +-
 gcc/config/rs6000/rs6000.cc | 8 +
 gcc/config/sh/sh.md | 6 +-
 gcc/config/stormy16/predicates.md | 2 +-
 gcc/config/xtensa/xtensa.cc | 11 +
 gcc/doc/invoke.texi | 11 +-
 gcc/doc/rtl.texi | 14 +-
 gcc/late-combine.cc | 747 ++++++++++++++++++
 gcc/opts.cc | 1 +
 gcc/pair-fusion.cc | 34 +-
 gcc/passes.def | 2 +
 gcc/rtl-ssa.h | 1 +
 gcc/rtl-ssa/access-utils.h | 145 ++--
 gcc/rtl-ssa/change-utils.h | 67 +-
 gcc/rtl-ssa/changes.cc | 6 +-
 gcc/rtl-ssa/changes.h | 13 -
 gcc/rtl-ssa/functions.h | 16 +-
 gcc/rtl-ssa/insn-utils.h | 8 -
 gcc/rtl-ssa/insns.cc | 7 +-
 gcc/rtl-ssa/insns.h | 12 -
 gcc/rtl-ssa/member-fns.inl | 35 +-
 gcc/rtl-ssa/movement.h | 118 ++-
 gcc/rtl-ssa/predicates.h | 58 ++
 gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c | 2 +-
 gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c | 2 +-
 gcc/testsuite/gcc.dg/stack-check-4.c | 2 +-
 .../aarch64/bitfield-bitint-abi-align16.c | 2 +-
 .../aarch64/bitfield-bitint-abi-align8.c | 2 +-
 gcc/testsuite/gcc.target/aarch64/pr106594_1.c | 20 +
 .../gcc.target/aarch64/sve/cond_asrd_3.c | 10 +-
 .../gcc.target/aarch64/sve/cond_convert_3.c | 8 +-
 .../gcc.target/aarch64/sve/cond_convert_6.c | 8 +-
 .../gcc.target/aarch64/sve/cond_fabd_5.c | 11 +-
 .../gcc.target/aarch64/sve/cond_unary_4.c | 13 +-
 gcc/tree-pass.h | 1 +
 40 files changed, 1127 insertions(+), 296 deletions(-)
 create mode 100644 gcc/late-combine.cc
 create mode 100644 gcc/rtl-ssa/predicates.h
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr106594_1.c

-- 
2.25.1