On 17/02/2025 16:15, Richard Sandiford wrote: > Alex Coplan <alex.cop...@arm.com> writes: > > Hi Richard, > > > > On 29/01/2025 16:44, Richard Sandiford wrote: > >> The PR showed two issues with pair-fusion. The first is that the pass > >> treated stack pointer deallocations as ordinary register updates, and so > >> might move them earlier than another stack access (through a different > >> base register) that doesn't alias the pair candidate. > >> > >> The simplest fix for that seems to be to prevent the stack deallocation > >> from being moved. This part might (or might not) be a latent source of > >> wrong code and so worth backporting in some form. (The patch as-is > >> won't work for GCC 14.) > >> > >> The second issue only started with r15-6551, which added a memory > >> write to stack allocations and deallocations. We should use the > >> existing tombstone mechanism to preserve the associated memory > >> definition. (Deleting definitions immediately would have quadratic > >> complexity in the worst case.) > > > > Thanks a lot for taking care of this while I was away. I just have a > > couple of comments below. > > > >> > >> Tested on aarch64-linux-gnu. OK to install? > >> > >> Richard > >> > >> > >> gcc/ > >> PR rtl-optimization/118429 > >> * pair-fusion.cc (latest_hazard_before): Add an extra parameter > >> to say whether the instruction is a load or a store. If the > >> instruction is not a load or store and has memory side effects, > >> prevent it from being moved earlier. > >> (pair_fusion::find_trailing_add): Update call accordingly. > >> (pair_fusion_bb_info::fuse_pair): If the trailng addition had > >> a memory side-effect, use a tombstone to preserve it. > >> > >> gcc/testsuite/ > >> PR rtl-optimization/118429 > >> * gcc.c-torture/compile/pr118429.c: New test. > >> --- > >> gcc/pair-fusion.cc | 45 ++++++++++++++----- > >> .../gcc.c-torture/compile/pr118429.c | 7 +++ > >> 2 files changed, 40 insertions(+), 12 deletions(-) > >> create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr118429.c > >> > >> diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc > >> index b6643ca4812..602e572ab6c 100644 > >> --- a/gcc/pair-fusion.cc > >> +++ b/gcc/pair-fusion.cc > >> @@ -573,11 +573,13 @@ pair_fusion_bb_info::track_access (insn_info *insn, > >> bool load_p, rtx mem) > >> // If IGNORE_INSN is non-NULL, we should further ignore any hazards > >> arising > >> // from that insn. > >> // > >> -// N.B. we ignore any defs/uses of memory here as we deal with that > >> separately, > >> -// making use of alias disambiguation. > >> +// IS_LOAD_STORE is true if INSN is one of the loads or stores in the > >> +// candidate pair. We ignore any defs/uses of memory in such instructions > >> +// as we deal with that separately, making use of alias disambiguation. > >> static insn_info * > >> latest_hazard_before (insn_info *insn, rtx *ignore, > >> - insn_info *ignore_insn = nullptr) > >> + insn_info *ignore_insn = nullptr, > >> + bool is_load_store = true) > >> { > >> insn_info *result = nullptr; > >> > >> @@ -588,6 +590,10 @@ latest_hazard_before (insn_info *insn, rtx *ignore, > >> && find_reg_note (insn->rtl (), REG_EH_REGION, NULL_RTX)) > >> return insn->prev_nondebug_insn (); > >> > >> + if (!is_load_store > >> + && accesses_include_memory (insn->defs ())) > >> + return insn->prev_nondebug_insn (); > > > > This seems like it might be a little too restrictive. I agree that it's > > a nice and simple way of solving the problem, but wouldn't it be enough > > to prevent moving such accesses (stack deallocations) above the latest > > preceding def or use of mem? Certainly we don't want to start > > attempting alias analysis here, but is the above suggestion not a happy > > middle ground (between a simple solution and not overly restricting > > optimisation)? > > Would it help in practice though? Although it is possible to combine > a deallocation with preceding stores, that only happens for dead code, > in which case the better optimisation is to delete the stores. > If we're combining with loads, the loads would normally be restoring > registers for the caller, in which case the loads could be moved > forward to the deallocation (since nothing would use or clobber > the loaded values between the two points).
I see. I must admit that I don't immediately see why this can only occur with dead stores, but I agree with your reasoning for the load case, and it probably isn't worth spending much more time worrying about optimising for such edge-case opportunities, anyway. So all good :) Thanks, Alex > > But yeah, I certainly don't object to doing that if you prefer. > The patch is in now though (was asked to fix it while you were away), > so it might be easier if you change it to a version that you're > happier with (since you can self-approve it). > > Thanks, > Richard