On 17/02/2025 16:15, Richard Sandiford wrote:
> Alex Coplan <alex.cop...@arm.com> writes:
> > Hi Richard,
> >
> > On 29/01/2025 16:44, Richard Sandiford wrote:
> >> The PR showed two issues with pair-fusion.  The first is that the pass
> >> treated stack pointer deallocations as ordinary register updates, and so
> >> might move them earlier than another stack access (through a different
> >> base register) that doesn't alias the pair candidate.
> >> 
> >> The simplest fix for that seems to be to prevent the stack deallocation
> >> from being moved.  This part might (or might not) be a latent source of
> >> wrong code and so worth backporting in some form.  (The patch as-is
> >> won't work for GCC 14.)
> >> 
> >> The second issue only started with r15-6551, which added a memory
> >> write to stack allocations and deallocations.  We should use the
> >> existing tombstone mechanism to preserve the associated memory
> >> definition.  (Deleting definitions immediately would have quadratic
> >> complexity in the worst case.)
> >
> > Thanks a lot for taking care of this while I was away.  I just have a
> > couple of comments below.
> >
> >> 
> >> Tested on aarch64-linux-gnu.  OK to install?
> >> 
> >> Richard
> >> 
> >> 
> >> gcc/
> >>    PR rtl-optimization/118429
> >>    * pair-fusion.cc (latest_hazard_before): Add an extra parameter
> >>    to say whether the instruction is a load or a store.  If the
> >>    instruction is not a load or store and has memory side effects,
> >>    prevent it from being moved earlier.
> >>    (pair_fusion::find_trailing_add): Update call accordingly.
> >>    (pair_fusion_bb_info::fuse_pair): If the trailng addition had
> >>    a memory side-effect, use a tombstone to preserve it.
> >> 
> >> gcc/testsuite/
> >>    PR rtl-optimization/118429
> >>    * gcc.c-torture/compile/pr118429.c: New test.
> >> ---
> >>  gcc/pair-fusion.cc                            | 45 ++++++++++++++-----
> >>  .../gcc.c-torture/compile/pr118429.c          |  7 +++
> >>  2 files changed, 40 insertions(+), 12 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr118429.c
> >> 
> >> diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc
> >> index b6643ca4812..602e572ab6c 100644
> >> --- a/gcc/pair-fusion.cc
> >> +++ b/gcc/pair-fusion.cc
> >> @@ -573,11 +573,13 @@ pair_fusion_bb_info::track_access (insn_info *insn, 
> >> bool load_p, rtx mem)
> >>  // If IGNORE_INSN is non-NULL, we should further ignore any hazards 
> >> arising
> >>  // from that insn.
> >>  //
> >> -// N.B. we ignore any defs/uses of memory here as we deal with that 
> >> separately,
> >> -// making use of alias disambiguation.
> >> +// IS_LOAD_STORE is true if INSN is one of the loads or stores in the
> >> +// candidate pair.  We ignore any defs/uses of memory in such instructions
> >> +// as we deal with that separately, making use of alias disambiguation.
> >>  static insn_info *
> >>  latest_hazard_before (insn_info *insn, rtx *ignore,
> >> -                insn_info *ignore_insn = nullptr)
> >> +                insn_info *ignore_insn = nullptr,
> >> +                bool is_load_store = true)
> >>  {
> >>    insn_info *result = nullptr;
> >>  
> >> @@ -588,6 +590,10 @@ latest_hazard_before (insn_info *insn, rtx *ignore,
> >>        && find_reg_note (insn->rtl (), REG_EH_REGION, NULL_RTX))
> >>      return insn->prev_nondebug_insn ();
> >>  
> >> +  if (!is_load_store
> >> +      && accesses_include_memory (insn->defs ()))
> >> +    return insn->prev_nondebug_insn ();
> >
> > This seems like it might be a little too restrictive.  I agree that it's
> > a nice and simple way of solving the problem, but wouldn't it be enough
> > to prevent moving such accesses (stack deallocations) above the latest
> > preceding def or use of mem?  Certainly we don't want to start
> > attempting alias analysis here, but is the above suggestion not a happy
> > middle ground (between a simple solution and not overly restricting
> > optimisation)?
> 
> Would it help in practice though?  Although it is possible to combine
> a deallocation with preceding stores, that only happens for dead code,
> in which case the better optimisation is to delete the stores.
> If we're combining with loads, the loads would normally be restoring
> registers for the caller, in which case the loads could be moved
> forward to the deallocation (since nothing would use or clobber
> the loaded values between the two points).

I see.  I must admit that I don't immediately see why this can only
occur with dead stores, but I agree with your reasoning for the load
case, and it probably isn't worth spending much more time worrying about
optimising for such edge-case opportunities, anyway.  So all good :)

Thanks,
Alex

> 
> But yeah, I certainly don't object to doing that if you prefer.
> The patch is in now though (was asked to fix it while you were away),
> so it might be easier if you change it to a version that you're
> happier with (since you can self-approve it).
> 
> Thanks,
> Richard

Reply via email to