Hi Ajit,

On 17/05/2024 18:05, Ajit Agarwal wrote:
> Hello Alex:
> 
> On 16/05/24 10:21 pm, Alex Coplan wrote:
> > Hi Ajit,
> > 
> > Thanks a lot for working through the review feedback.
> > 
> 
> Thanks a lot for reviewing the code and approving the patch.

To be clear, I didn't approve the patch, because I can't; I just said
that it looks good to me.  You need an AArch64 maintainer (probably
Richard S) to approve it.

> 
> > The patch LGTM with the two minor suggested changes below.  I can't
> > approve the patch, though, so you'll need an OK from Richard S.
> > 
> > Also, I'm not sure if it makes sense to apply the patch in isolation; it
> > might make more sense to only apply it in series with follow-up patches to:
> >  - Finish renaming any bits of the generic code that need renaming (I
> >    guess we'll want to rename at least ldp_bb_info to something else,
> >    probably there are other bits too).
> >  - Move the generic parts out of gcc/config/aarch64 to a .cc file in the
> >    middle-end.
> >
> 
> Addressed in a separate patch I sent.

Hmm, that doesn't look right.  You sent a single patch here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652028.html
which appears to squash the work you've done in this patch together with
the move.

What I expect to see is a patch series, as follows:

[PATCH 1/3] aarch64: Split generic code from aarch64 code in ldp fusion
[PATCH 2/3] aarch64: Further renaming of generic code
[PATCH 3/3] aarch64, middle-end: Move pair_fusion pass from aarch64 to middle-end

where 1/3 is exactly the patch that I reviewed above with the two
(minor) requested changes (plus any changes requested by Richard), 2/3
(optionally) does further renaming to use generic terminology in the
generic code where needed/desired, and 3/3 does a straight cut/paste
move of code into pair-fusion.h and pair-fusion.cc, with no other
changes (save for perhaps a Makefile change and adding an include in
aarch64-ldp-fusion.cc).

Arguably you could split this even further and do the move of the
pair_fusion class to the new header in a separate patch prior to the
final move.
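For reference, one way to produce such a series is with git format-patch,
one commit per logical change.  This is just a sketch (the commit layout
and output directory are hypothetical; adjust to your tree):

```shell
# Assumes the work is split into three commits on the current branch,
# oldest first, matching the 1/3 .. 3/3 structure above.
# Generate numbered patches plus a cover letter into outgoing/:
git format-patch -3 --cover-letter -o outgoing/

# After filling in the cover letter, send the series as one thread:
git send-email --to=gcc-patches@gcc.gnu.org outgoing/*.patch
```

git format-patch numbers the mails [PATCH 1/3] .. [PATCH 3/3]
automatically, which keeps each piece reviewable on its own while making
the intended application order explicit.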

N.B. (IMO) the patches should be presented like this both for review and
(if approved) when committing.

Richard S may have further suggestions on how to split the patches /
make them more tractable to review; I think this is the bare minimum
that is needed, though.

Hope that makes sense.

Thanks,
Alex

>  
> > I'll let Richard S make the final judgement on that.  I don't really
> > mind either way.
> 
> Sure.
> 
> Thanks & Regards
> Ajit
> > 
> > On 15/05/2024 15:06, Ajit Agarwal wrote:
> >> Hello Alex/Richard:
> >>
> >> All review comments are addressed.
> >>
> >> The common infrastructure of the load/store pair fusion pass is divided
> >> into target independent and target dependent code.
> >>
> >> The target independent code is generic code that uses pure virtual
> >> functions to interface between the target independent and target
> >> dependent parts.
> >>
> >> The target dependent code implements those pure virtual functions for
> >> the aarch64 target and calls into the target independent code.
> >>
> >> Bootstrapped and regtested on aarch64-linux-gnu.
> >>
> >> Thanks & Regards
> >> Ajit
> >>
> >> aarch64: Preparatory patch to place target independent and
> >> dependent changed code in one file
> >>
> >> The common infrastructure of the load/store pair fusion pass is divided
> >> into target independent and target dependent code.
> >>
> >> The target independent code is generic code that uses pure virtual
> >> functions to interface between the target independent and target
> >> dependent parts.
> >>
> >> The target dependent code implements those pure virtual functions for
> >> the aarch64 target and calls into the target independent code.
> >>
> >> 2024-05-15  Ajit Kumar Agarwal  <aagar...@linux.ibm.com>
> >>
> >> gcc/ChangeLog:
> >>
> >>    * config/aarch64/aarch64-ldp-fusion.cc: Place target
> >>    independent and dependent changed code.
> >> ---
> >>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 533 +++++++++++++++--------
> >>  1 file changed, 357 insertions(+), 176 deletions(-)
> >>
> >> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >> index 1d9caeab05d..429e532ea3b 100644
> >> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >> @@ -138,6 +138,225 @@ struct alt_base
> >>    poly_int64 offset;
> >>  };
> >>  
> >> +// Virtual base class for load/store walkers used in alias analysis.
> >> +struct alias_walker
> >> +{
> >> +  virtual bool conflict_p (int &budget) const = 0;
> >> +  virtual insn_info *insn () const = 0;
> >> +  virtual bool valid () const = 0;
> >> +  virtual void advance () = 0;
> >> +};
> >> +
> >> +// When querying handle_writeback_opportunities, this enum is used to
> >> +// qualify which opportunities we are asking about.
> >> +enum class writeback {
> >> +  // Only those writeback opportunities that arise from existing
> >> +  // auto-increment accesses.
> >> +  EXISTING,
> > 
> > Very minor nit: I think an extra blank line here would be nice for readability
> > now that the enumerators have comments above.
> > 
> >> +  // All writeback opportunities including those that involve folding
> >> +  // base register updates into a non-writeback pair.
> >> +  ALL
> >> +};
> >> +
> > 
> > Can we have a block comment here which describes the purpose of the
> > class and how it fits together with the target?  Something like the
> > following would do:
> > 
> > // This class can be overridden by targets to give a pass that fuses
> > // adjacent loads and stores into load/store pair instructions.
> > //
> > // The target can override the various virtual functions to customize
> > // the behaviour of the pass as appropriate for the target.
> > 
> >> +struct pair_fusion {
> >> +  pair_fusion ()
> >> +  {
> >> +    calculate_dominance_info (CDI_DOMINATORS);
> >> +    df_analyze ();
> >> +    crtl->ssa = new rtl_ssa::function_info (cfun);
> >> +  };
> >> +
> >> +  // Given:
> >> +  // - an rtx REG_OP, the non-memory operand in a load/store insn,
> >> +  // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and
> >> +  // - a boolean LOAD_P (true iff the insn is a load), then:
> >> +  // return true if the access should be considered an FP/SIMD access.
> >> +  // Such accesses are segregated from GPR accesses, since we only want
> >> +  // to form pairs for accesses that use the same register file.
> >> +  virtual bool fpsimd_op_p (rtx, machine_mode, bool)
> >> +  {
> >> +    return false;
> >> +  }
> >> +
> >> +  // Return true if we should consider forming pairs from memory
> >> +  // accesses with operand mode MODE at this stage in compilation.
> >> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
> >> +
> >> +  // Return true iff REG_OP is a suitable register operand for a paired
> >> +  // memory access, where LOAD_P is true if we're asking about loads and
> >> +  // false for stores.  MODE gives the mode of the operand.
> >> +  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
> >> +                                machine_mode mode) = 0;
> >> +
> >> +  // Return alias check limit.
> >> +  // This is needed to avoid unbounded quadratic behaviour when
> >> +  // performing alias analysis.
> >> +  virtual int pair_mem_alias_check_limit () = 0;
> >> +
> >> +  // Returns true if we should try to handle writeback opportunities.
> >> +  // WHICH determines the kinds of writeback opportunities the caller
> >> +  // is asking about.
> >> +  virtual bool handle_writeback_opportunities (enum writeback which) = 0;
> >> +
> >> +  // Given BASE_MEM, the mem from the lower candidate access for a pair,
> >> +  // and LOAD_P (true if the access is a load), check if we should proceed
> >> +  // to form the pair given the target's code generation policy on
> >> +  // paired accesses.
> >> +  virtual bool pair_mem_ok_with_policy (rtx base_mem, bool load_p) = 0;
> >> +
> >> +  // Generate the pattern for a paired access.  PATS gives the patterns
> >> +  // for the individual memory accesses (which by this point must share a
> >> +  // common base register).  If WRITEBACK is non-NULL, then this rtx
> >> +  // describes the update to the base register that should be performed by
> >> +  // the resulting insn.  LOAD_P is true iff the accesses are loads.
> >> +  virtual rtx gen_pair (rtx *pats, rtx writeback, bool load_p) = 0;
> >> +
> >> +  // Return true if INSN is a paired memory access.  If so, set LOAD_P to
> >> +  // true iff INSN is a load pair.
> >> +  virtual bool pair_mem_insn_p (rtx_insn *insn, bool &load_p) = 0;
> >> +
> >> +  // Return true if we should track loads.
> >> +  virtual bool track_loads_p ()
> >> +  {
> >> +    return true;
> >> +  }
> >> +
> >> +  // Return true if we should track stores.
> >> +  virtual bool track_stores_p ()
> >> +  {
> >> +    return true;
> >> +  }
> >> +
> >> +  // Return true if OFFSET is in range for a paired memory access.
> >> +  virtual bool pair_mem_in_range_p (HOST_WIDE_INT offset) = 0;
> >> +
> >> +  // Given a load/store pair insn in PATTERN, unpack the insn, storing
> >> +  // the register operands in REGS, and returning the mem.  LOAD_P is
> >> +  // true for loads and false for stores.
> >> +  virtual rtx destructure_pair (rtx regs[2], rtx pattern, bool load_p) = 0;
> >> +
> >> +  // Given a pair mem in MEM, register operands in REGS, and an rtx
> >> +  // representing the effect of writeback on the base register in WB_EFFECT,
> >> +  // return an insn representing a writeback variant of this pair.
> >> +  // LOAD_P is true iff the pair is a load.
> >> +  // This is used when promoting existing non-writeback pairs to writeback
> >> +  // variants.
> >> +  virtual rtx gen_promote_writeback_pair (rtx wb_effect, rtx mem,
> >> +                                    rtx regs[2], bool load_p) = 0;
> >> +
> >> +  void ldp_fusion_bb (bb_info *bb);
> >> +  inline insn_info *find_trailing_add (insn_info *insns[2],
> >> +                                 const insn_range_info &pair_range,
> >> +                                 int initial_writeback,
> >> +                                 rtx *writeback_effect,
> >> +                                 def_info **add_def,
> >> +                                 def_info *base_def,
> >> +                                 poly_int64 initial_offset,
> >> +                                 unsigned access_size);
> >> +  inline int get_viable_bases (insn_info *insns[2],
> >> +                         vec<base_cand> &base_cands,
> >> +                         rtx cand_mems[2],
> >> +                         unsigned access_size,
> >> +                         bool reversed);
> >> +  inline void do_alias_analysis (insn_info *alias_hazards[4],
> >> +                           alias_walker *walkers[4],
> >> +                           bool load_p);
> >> +  inline void try_promote_writeback (insn_info *insn, bool load_p);
> >> +  inline void run ();
> >> +  ~pair_fusion ()
> >> +  {
> >> +    if (crtl->ssa->perform_pending_updates ())
> >> +      cleanup_cfg (0);
> >> +
> >> +    free_dominance_info (CDI_DOMINATORS);
> >> +
> >> +    delete crtl->ssa;
> >> +    crtl->ssa = nullptr;
> >> +  }
> >> +};
> >> +
> >> +// This is the main function to start the pass.
> >> +void
> >> +pair_fusion::run ()
> >> +{
> >> +  if (!track_loads_p () && !track_stores_p ())
> >> +    return;
> >> +
> >> +  for (auto bb : crtl->ssa->bbs ())
> >> +    ldp_fusion_bb (bb);
> >> +}
> >> +
> >> +struct aarch64_pair_fusion : public pair_fusion
> >> +{
> >> +  bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
> >> +              bool load_p) override final
> >> +  {
> >> +    // Before RA, we use the modes, noting that stores of constant zero
> >> +    // operands use GPRs (even in non-integer modes).  After RA, we use
> >> +    // the hard register numbers.
> >> +    return reload_completed
> >> +      ? (REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op)))
> >> +      : (GET_MODE_CLASS (mem_mode) != MODE_INT
> >> +   && (load_p || !aarch64_const_zero_rtx_p (reg_op)));
> >> +  }
> >> +
> >> +  bool pair_mem_insn_p (rtx_insn *rti, bool &load_p) override final;
> >> +
> >> +  bool pair_mem_ok_with_policy (rtx base_mem, bool load_p) override final
> >> +  {
> >> +    return aarch64_mem_ok_with_ldpstp_policy_model (base_mem,
> >> +                                              load_p,
> >> +                                              GET_MODE (base_mem));
> >> +  }
> >> +
> >> +  bool pair_operand_mode_ok_p (machine_mode mode) override final;
> >> +
> >> +  rtx gen_pair (rtx *pats, rtx writeback, bool load_p) override final;
> >> +
> >> +  bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
> >> +                        machine_mode mode) override final
> >> +  {
> >> +    return (load_p
> >> +      ? aarch64_ldp_reg_operand (reg_op, mode)
> >> +      : aarch64_stp_reg_operand (reg_op, mode));
> >> +  }
> >> +
> >> +  int pair_mem_alias_check_limit () override final
> >> +  {
> >> +    return aarch64_ldp_alias_check_limit;
> >> +  }
> >> +
> >> +  bool handle_writeback_opportunities (enum writeback which) override final
> >> +  {
> >> +    if (which == writeback::ALL)
> >> +      return aarch64_ldp_writeback > 1;
> >> +    else
> >> +      return aarch64_ldp_writeback;
> >> +  }
> >> +
> >> +  bool track_loads_p () override final
> >> +  {
> >> +    return aarch64_tune_params.ldp_policy_model
> >> +     != AARCH64_LDP_STP_POLICY_NEVER;
> >> +  }
> >> +
> >> +  bool track_stores_p () override final
> >> +  {
> >> +    return aarch64_tune_params.stp_policy_model
> >> +     != AARCH64_LDP_STP_POLICY_NEVER;
> >> +  }
> >> +
> >> +  bool pair_mem_in_range_p (HOST_WIDE_INT offset) override final
> >> +  {
> >> +    return (offset >= LDP_MIN_IMM && offset <= LDP_MAX_IMM);
> >> +  }
> >> +
> >> +  rtx gen_promote_writeback_pair (rtx wb_effect, rtx mem, rtx regs[2],
> >> +                            bool load_p) override final;
> >> +
> >> +  rtx destructure_pair (rtx regs[2], rtx pattern, bool load_p) override final;
> >> +};
> >> +
> >>  // State used by the pass for a given basic block.
> >>  struct ldp_bb_info
> >>  {
> >> @@ -159,9 +378,9 @@ struct ldp_bb_info
> >>    hash_map<def_hash, alt_base> canon_base_map;
> >>  
> >>    static const size_t obstack_alignment = sizeof (void *);
> >> -  bb_info *m_bb;
> >>  
> >> -  ldp_bb_info (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
> >> +  ldp_bb_info (bb_info *bb, pair_fusion *d)
> >> +         : m_bb (bb), m_pass (d), m_emitted_tombstone (false)
> >>    {
> >>      obstack_specify_allocation (&m_obstack, OBSTACK_CHUNK_SIZE,
> >>                            obstack_alignment, obstack_chunk_alloc,
> >> @@ -184,6 +403,8 @@ struct ldp_bb_info
> >>  
> >>  private:
> >>    obstack m_obstack;
> >> +  bb_info *m_bb;
> >> +  pair_fusion *m_pass;
> >>  
> >>    // State for keeping track of tombstone insns emitted for this BB.
> >>    bitmap_obstack m_bitmap_obstack;
> >> @@ -213,6 +434,45 @@ private:
> >>    inline bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs);
> >>  };
> >>  
> >> +bool
> >> +aarch64_pair_fusion::pair_mem_insn_p (rtx_insn *rti, bool &load_p)
> >> +{
> >> +  rtx pat = PATTERN (rti);
> >> +  if (GET_CODE (pat) == PARALLEL
> >> +      && XVECLEN (pat, 0) == 2)
> >> +    {
> >> +      const auto attr = get_attr_ldpstp (rti);
> >> +      if (attr == LDPSTP_NONE)
> >> +  return false;
> >> +
> >> +      load_p = (attr == LDPSTP_LDP);
> >> +      gcc_checking_assert (load_p || attr == LDPSTP_STP);
> >> +      return true;
> >> +    }
> >> +  return false;
> >> +}
> >> +
> >> +rtx
> >> +aarch64_pair_fusion::gen_pair (rtx *pats, rtx writeback, bool load_p)
> >> +{
> >> +  if (writeback)
> >> +    {
> >> +      auto patvec = gen_rtvec (3, writeback, pats[0], pats[1]);
> >> +      return gen_rtx_PARALLEL (VOIDmode, patvec);
> >> +    }
> >> +  else if (load_p)
> >> +    return aarch64_gen_load_pair (XEXP (pats[0], 0),
> >> +                            XEXP (pats[1], 0),
> >> +                            XEXP (pats[0], 1));
> >> +  else
> >> +    return aarch64_gen_store_pair (XEXP (pats[0], 0),
> >> +                             XEXP (pats[0], 1),
> >> +                             XEXP (pats[1], 1));
> >> +}
> >> +
> >>  splay_tree_node<access_record *> *
> >>  ldp_bb_info::node_alloc (access_record *access)
> >>  {
> >> @@ -312,8 +572,8 @@ any_post_modify_p (rtx x)
> >>  
> >>  // Return true if we should consider forming ldp/stp insns from memory
> >>  // accesses with operand mode MODE at this stage in compilation.
> >> -static bool
> >> -ldp_operand_mode_ok_p (machine_mode mode)
> >> +bool
> >> +aarch64_pair_fusion::pair_operand_mode_ok_p (machine_mode mode)
> >>  {
> >>    if (!aarch64_ldpstp_operand_mode_p (mode))
> >>      return false;
> >> @@ -404,9 +664,10 @@ ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
> >>    const machine_mode mem_mode = GET_MODE (mem);
> >>    const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant ();
> >>  
> >> -  // Punt on misaligned offsets.  LDP/STP instructions require offsets to be a
> >> -  // multiple of the access size, and we believe that misaligned offsets on
> >> -  // MEM_EXPR bases are likely to lead to misaligned offsets w.r.t. RTL bases.
> >> +  // Punt on misaligned offsets.  Paired memory access instructions require
> >> +  // offsets to be a multiple of the access size, and we believe that
> >> +  // misaligned offsets on MEM_EXPR bases are likely to lead to misaligned
> >> +  // offsets w.r.t. RTL bases.
> >>    if (!multiple_p (offset, mem_size))
> >>      return false;
> >>  
> >> @@ -430,10 +691,10 @@ ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
> >>  }
> >>  
> >>  // Main function to begin pair discovery.  Given a memory access INSN,
> >> -// determine whether it could be a candidate for fusing into an ldp/stp,
> >> -// and if so, track it in the appropriate data structure for this basic
> >> -// block.  LOAD_P is true if the access is a load, and MEM is the mem
> >> -// rtx that occurs in INSN.
> >> +// determine whether it could be a candidate for fusing into a paired
> >> +// access, and if so, track it in the appropriate data structure for
> >> +// this basic block.  LOAD_P is true if the access is a load, and MEM
> >> +// is the mem rtx that occurs in INSN.
> >>  void
> >>  ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
> >>  {
> >> @@ -441,35 +702,24 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
> >>    if (MEM_VOLATILE_P (mem))
> >>      return;
> >>  
> >> -  // Ignore writeback accesses if the param says to do so.
> >> -  if (!aarch64_ldp_writeback
> >> +  // Ignore writeback accesses if the hook says to do so.
> >> +  if (!m_pass->handle_writeback_opportunities (writeback::EXISTING)
> >>        && GET_RTX_CLASS (GET_CODE (XEXP (mem, 0))) == RTX_AUTOINC)
> >>      return;
> >>  
> >>    const machine_mode mem_mode = GET_MODE (mem);
> >> -  if (!ldp_operand_mode_ok_p (mem_mode))
> >> +  if (!m_pass->pair_operand_mode_ok_p (mem_mode))
> >>      return;
> >>  
> >>    rtx reg_op = XEXP (PATTERN (insn->rtl ()), !load_p);
> >>  
> >> -  // Ignore the access if the register operand isn't suitable for ldp/stp.
> >> -  if (load_p
> >> -      ? !aarch64_ldp_reg_operand (reg_op, mem_mode)
> >> -      : !aarch64_stp_reg_operand (reg_op, mem_mode))
> >> +  if (!m_pass->pair_reg_operand_ok_p (load_p, reg_op, mem_mode))
> >>      return;
> >>  
> >>    // We want to segregate FP/SIMD accesses from GPR accesses.
> >> -  //
> >> -  // Before RA, we use the modes, noting that stores of constant zero
> >> -  // operands use GPRs (even in non-integer modes).  After RA, we use
> >> -  // the hard register numbers.
> >> -  const bool fpsimd_op_p
> >> -    = reload_completed
> >> -    ? (REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op)))
> >> -    : (GET_MODE_CLASS (mem_mode) != MODE_INT
> >> -       && (load_p || !aarch64_const_zero_rtx_p (reg_op)));
> >> -
> >> -  // Note ldp_operand_mode_ok_p already rejected VL modes.
> >> +  const bool fpsimd_op_p = m_pass->fpsimd_op_p (reg_op, mem_mode, load_p);
> >> +
> >> +  // Note pair_operand_mode_ok_p already rejected VL modes.
> >>    const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant ();
> >>    const lfs_fields lfs = { load_p, fpsimd_op_p, mem_size };
> >>  
> >> @@ -498,8 +748,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
> >>    // elimination offset pre-RA, we should postpone forming pairs on such
> >>    // accesses until after RA.
> >>    //
> >> -  // As it stands, addresses with offsets in range for LDR but not
> >> -  // in range for LDP/STP are currently reloaded inefficiently,
> >> +  // As it stands, addresses in range for an individual load/store but not
> >> +  // for a paired access are currently reloaded inefficiently,
> >>    // ending up with a separate base register for each pair.
> >>    //
> >>    // In theory LRA should make use of
> >> @@ -510,11 +760,11 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
> >>    // constraint; this early return means we never get to the code
> >>    // that calls targetm.legitimize_address_displacement.
> >>    //
> >> -  // So for now, it's better to punt when we can't be sure that the
> >> -  // offset is in range for LDP/STP.  Out-of-range cases can then be
> >> -  // handled after RA by the out-of-range LDP/STP peepholes.  Eventually, it
> >> -  // would be nice to handle known out-of-range opportunities in the
> >> -  // pass itself (for stack accesses, this would be in the post-RA pass).
> >> +  // It's better to punt when we can't be sure that the
> >> +  // offset is in range for a paired access.  On aarch64, out-of-range cases
> >> +  // can then be handled after RA by the out-of-range LDP/STP peepholes.
> >> +  // Eventually, it would be nice to handle known out-of-range opportunities
> >> +  // in the pass itself (for stack accesses, this would be in the post-RA pass).
> >>    if (!reload_completed
> >>        && (REGNO (base) == FRAME_POINTER_REGNUM
> >>      || REGNO (base) == ARG_POINTER_REGNUM))
> >> @@ -565,8 +815,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
> >>    gcc_unreachable (); // Base defs should be unique.
> >>      }
> >>  
> >> -  // Punt on misaligned offsets.  LDP/STP require offsets to be a 
> >> multiple of
> >> -  // the access size.
> >> +  // Punt on misaligned offsets.  Paired memory accesses require offsets
> >> +  // to be a multiple of the access size.
> >>    if (!multiple_p (mem_off, mem_size))
> >>      return;
> >>  
> >> @@ -1199,8 +1449,8 @@ extract_writebacks (bool load_p, rtx pats[2], int changed)
> >>  // base register.  If there is one, we choose the first such update after
> >>  // PAIR_DST that is still in the same BB as our pair.  We return the new def in
> >>  // *ADD_DEF and the resulting writeback effect in *WRITEBACK_EFFECT.
> >> -static insn_info *
> >> -find_trailing_add (insn_info *insns[2],
> >> +insn_info *
> >> +pair_fusion::find_trailing_add (insn_info *insns[2],
> >>               const insn_range_info &pair_range,
> >>               int initial_writeback,
> >>               rtx *writeback_effect,
> >> @@ -1278,7 +1528,7 @@ find_trailing_add (insn_info *insns[2],
> >>  
> >>    off_hwi /= access_size;
> >>  
> >> -  if (off_hwi < LDP_MIN_IMM || off_hwi > LDP_MAX_IMM)
> >> +  if (!pair_mem_in_range_p (off_hwi))
> >>      return nullptr;
> >>  
> >>    auto dump_prefix = [&]()
> >> @@ -1792,7 +2042,7 @@ ldp_bb_info::fuse_pair (bool load_p,
> >>      {
> >>        if (dump_file)
> >>    fprintf (dump_file,
> >> -           "  ldp: i%d has wb but subsequent i%d has non-wb "
> >> +           "  load pair: i%d has wb but subsequent i%d has non-wb "
> >>             "update of base (r%d), dropping wb\n",
> >>             insns[0]->uid (), insns[1]->uid (), base_regno);
> >>        gcc_assert (writeback_effect);
> >> @@ -1815,9 +2065,9 @@ ldp_bb_info::fuse_pair (bool load_p,
> >>      }
> >>  
> >>    // If either of the original insns had writeback, but the resulting pair insn
> >> -  // does not (can happen e.g. in the ldp edge case above, or if the writeback
> >> -  // effects cancel out), then drop the def(s) of the base register as
> >> -  // appropriate.
> >> +  // does not (can happen e.g. in the load pair edge case above, or if the
> >> +  // writeback effects cancel out), then drop the def(s) of the base register
> >> +  // as appropriate.
> >>    //
> >>    // Also drop the first def in the case that both of the original insns had
> >>    // writeback.  The second def could well have uses, but the first def should
> >> @@ -1834,7 +2084,7 @@ ldp_bb_info::fuse_pair (bool load_p,
> >>    // update of the base register and try and fold it in to make this into a
> >>    // writeback pair.
> >>    insn_info *trailing_add = nullptr;
> >> -  if (aarch64_ldp_writeback > 1
> >> +  if (m_pass->handle_writeback_opportunities (writeback::ALL)
> >>        && !writeback_effect
> >>        && (!load_p || (!refers_to_regno_p (base_regno, base_regno + 1,
> >>                                     XEXP (pats[0], 0), nullptr)
> >> @@ -1842,10 +2092,10 @@ ldp_bb_info::fuse_pair (bool load_p,
> >>                                         XEXP (pats[1], 0), nullptr))))
> >>      {
> >>        def_info *add_def;
> >> -      trailing_add = find_trailing_add (insns, move_range, writeback,
> >> -                                  &writeback_effect,
> >> -                                  &add_def, base.def, offsets[0],
> >> -                                  access_size);
> >> +      trailing_add = m_pass->find_trailing_add (insns, move_range, writeback,
> >> +                                          &writeback_effect,
> >> +                                          &add_def, base.def, offsets[0],
> >> +                                          access_size);
> >>        if (trailing_add)
> >>    {
> >>      // The def of the base register from the trailing add should prevail.
> >> @@ -1855,35 +2105,20 @@ ldp_bb_info::fuse_pair (bool load_p,
> >>      }
> >>  
> >>    // Now that we know what base mem we're going to use, check if it's OK
> >> -  // with the ldp/stp policy.
> >> +  // with the pair mem policy.
> >>    rtx first_mem = XEXP (pats[0], load_p);
> >> -  if (!aarch64_mem_ok_with_ldpstp_policy_model (first_mem,
> >> -                                          load_p,
> >> -                                          GET_MODE (first_mem)))
> >> +  if (!m_pass->pair_mem_ok_with_policy (first_mem, load_p))
> >>      {
> >>        if (dump_file)
> >> -  fprintf (dump_file, "punting on pair (%d,%d), ldp/stp policy says no\n",
> >> +  fprintf (dump_file,
> >> +           "punting on pair (%d,%d), pair mem policy says no\n",
> >>             i1->uid (), i2->uid ());
> >>        return false;
> >>      }
> >>  
> >>    rtx reg_notes = combine_reg_notes (first, second, load_p);
> >>  
> >> -  rtx pair_pat;
> >> -  if (writeback_effect)
> >> -    {
> >> -      auto patvec = gen_rtvec (3, writeback_effect, pats[0], pats[1]);
> >> -      pair_pat = gen_rtx_PARALLEL (VOIDmode, patvec);
> >> -    }
> >> -  else if (load_p)
> >> -    pair_pat = aarch64_gen_load_pair (XEXP (pats[0], 0),
> >> -                                XEXP (pats[1], 0),
> >> -                                XEXP (pats[0], 1));
> >> -  else
> >> -    pair_pat = aarch64_gen_store_pair (XEXP (pats[0], 0),
> >> -                                 XEXP (pats[0], 1),
> >> -                                 XEXP (pats[1], 1));
> >> -
> >> +  rtx pair_pat = m_pass->gen_pair (pats, writeback_effect, load_p);
> >>    insn_change *pair_change = nullptr;
> >>    auto set_pair_pat = [pair_pat,reg_notes](insn_change *change) {
> >>      rtx_insn *rti = change->insn ()->rtl ();
> >> @@ -2125,15 +2360,6 @@ load_modified_by_store_p (insn_info *load,
> >>    return false;
> >>  }
> >>  
> >> -// Virtual base class for load/store walkers used in alias analysis.
> >> -struct alias_walker
> >> -{
> >> -  virtual bool conflict_p (int &budget) const = 0;
> >> -  virtual insn_info *insn () const = 0;
> >> -  virtual bool valid () const = 0;
> >> -  virtual void advance () = 0;
> >> -};
> >> -
> >>  // Implement some common functionality used by both store_walker
> >>  // and load_walker.
> >>  template<bool reverse>
> >> @@ -2251,13 +2477,13 @@ public:
> >>  //
> >>  // We try to maintain the invariant that if a walker becomes invalid, we
> >>  // set its pointer to null.
> >> -static void
> >> -do_alias_analysis (insn_info *alias_hazards[4],
> >> -             alias_walker *walkers[4],
> >> -             bool load_p)
> >> +void
> >> +pair_fusion::do_alias_analysis (insn_info *alias_hazards[4],
> >> +                          alias_walker *walkers[4],
> >> +                          bool load_p)
> >>  {
> >>    const int n_walkers = 2 + (2 * !load_p);
> >> -  int budget = aarch64_ldp_alias_check_limit;
> >> +  int budget = pair_mem_alias_check_limit ();
> >>  
> >>    auto next_walker = [walkers,n_walkers](int current) -> int {
> >>      for (int j = 1; j <= n_walkers; j++)
> >> @@ -2342,12 +2568,12 @@ do_alias_analysis (insn_info *alias_hazards[4],
> >>  //
> >>  // Returns an integer where bit (1 << i) is set if INSNS[i] uses writeback
> >>  // addressing.
> >> -static int
> >> -get_viable_bases (insn_info *insns[2],
> >> -            vec<base_cand> &base_cands,
> >> -            rtx cand_mems[2],
> >> -            unsigned access_size,
> >> -            bool reversed)
> >> +int
> >> +pair_fusion::get_viable_bases (insn_info *insns[2],
> >> +                         vec<base_cand> &base_cands,
> >> +                         rtx cand_mems[2],
> >> +                         unsigned access_size,
> >> +                         bool reversed)
> >>  {
> >>    // We discovered this pair through a common base.  Need to ensure that
> >>    // we have a common base register that is live at both locations.
> >> @@ -2389,7 +2615,7 @@ get_viable_bases (insn_info *insns[2],
> >>        if (!is_lower)
> >>    base_off--;
> >>  
> >> -      if (base_off < LDP_MIN_IMM || base_off > LDP_MAX_IMM)
> >> +      if (!pair_mem_in_range_p (base_off))
> >>    continue;
> >>  
> >>        use_info *use = find_access (insns[i]->uses (), REGNO (base));
> >> @@ -2446,7 +2672,7 @@ get_viable_bases (insn_info *insns[2],
> >>  }
> >>  
> >>  // Given two adjacent memory accesses of the same size, I1 and I2, try
> >> -// and see if we can merge them into a ldp or stp.
> >> +// and see if we can merge them into a paired access.
> >>  //
> >>  // ACCESS_SIZE gives the (common) size of a single access, LOAD_P is true
> >>  // if the accesses are both loads, otherwise they are both stores.
> >> @@ -2486,7 +2712,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
> >>      {
> >>        if (dump_file)
> >>    fprintf (dump_file,
> >> -           "punting on ldp due to reg conflcits (%d,%d)\n",
> >> +           "punting on load pair due to reg conflcits (%d,%d)\n",
> >>             insns[0]->uid (), insns[1]->uid ());
> >>        return false;
> >>      }
> >> @@ -2504,8 +2730,8 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
> >>  
> >>    auto_vec<base_cand, 2> base_cands (2);
> >>  
> >> -  int writeback = get_viable_bases (insns, base_cands, cand_mems,
> >> -                              access_size, reversed);
> >> +  int writeback = m_pass->get_viable_bases (insns, base_cands, cand_mems,
> >> +                                      access_size, reversed);
> >>    if (base_cands.is_empty ())
> >>      {
> >>        if (dump_file)
> >> @@ -2633,7 +2859,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
> >>      walkers[1] = &backward_store_walker;
> >>  
> >>    if (load_p && (mem_defs[0] || mem_defs[1]))
> >> -    do_alias_analysis (alias_hazards, walkers, load_p);
> >> +    m_pass->do_alias_analysis (alias_hazards, walkers, load_p);
> >>    else
> >>      {
> >>        // We want to find any loads hanging off the first store.
> >> @@ -2642,7 +2868,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
> >>        load_walker<true> backward_load_walker (mem_defs[1], insns[1], insns[0]);
> >>        walkers[2] = &forward_load_walker;
> >>        walkers[3] = &backward_load_walker;
> >> -      do_alias_analysis (alias_hazards, walkers, load_p);
> >> +      m_pass->do_alias_analysis (alias_hazards, walkers, load_p);
> >>        // Now consolidate hazards back down.
> >>        if (alias_hazards[2]
> >>      && (!alias_hazards[0] || (*alias_hazards[2] < *alias_hazards[0])))
> >> @@ -2956,26 +3182,6 @@ ldp_bb_info::transform ()
> >>    traverse_base_map (def_map);
> >>  }
> >>  
> >> -static void
> >> -ldp_fusion_init ()
> >> -{
> >> -  calculate_dominance_info (CDI_DOMINATORS);
> >> -  df_analyze ();
> >> -  crtl->ssa = new rtl_ssa::function_info (cfun);
> >> -}
> >> -
> >> -static void
> >> -ldp_fusion_destroy ()
> >> -{
> >> -  if (crtl->ssa->perform_pending_updates ())
> >> -    cleanup_cfg (0);
> >> -
> >> -  free_dominance_info (CDI_DOMINATORS);
> >> -
> >> -  delete crtl->ssa;
> >> -  crtl->ssa = nullptr;
> >> -}
> >> -
> >>  // Given a load pair insn in PATTERN, unpack the insn, storing
> >>  // the registers in REGS and returning the mem.
> >>  static rtx
> >> @@ -3015,16 +3221,19 @@ aarch64_destructure_store_pair (rtx regs[2], rtx pattern)
> >>    return mem;
> >>  }
> >>  
> >> -// Given a pair mem in PAIR_MEM, register operands in REGS, and an rtx
> >> -// representing the effect of writeback on the base register in WB_EFFECT,
> >> -// return an insn representing a writeback variant of this pair.
> >> -// LOAD_P is true iff the pair is a load.
> >> -//
> >> -// This is used when promoting existing non-writeback pairs to writeback
> >> -// variants.
> >> -static rtx
> >> -aarch64_gen_writeback_pair (rtx wb_effect, rtx pair_mem, rtx regs[2],
> >> -                      bool load_p)
> >> +rtx
> >> +aarch64_pair_fusion::destructure_pair (rtx regs[2], rtx pattern, bool load_p)
> >> +{
> >> +  if (load_p)
> >> +    return aarch64_destructure_load_pair (regs, pattern);
> >> +  else
> >> +    return aarch64_destructure_store_pair (regs, pattern);
> >> +}
> >> +
> >> +rtx
> >> +aarch64_pair_fusion::gen_promote_writeback_pair (rtx wb_effect, rtx pair_mem,
> >> +                                           rtx regs[2],
> >> +                                           bool load_p)
> >>  {
> >>    auto op_mode = aarch64_operand_mode_for_pair_mode (GET_MODE (pair_mem));
> >>  
> >> @@ -3059,23 +3268,12 @@ aarch64_gen_writeback_pair (rtx wb_effect, rtx pair_mem, rtx regs[2],
> >>  // Given an existing pair insn INSN, look for a trailing update of
> >>  // the base register which we can fold in to make this pair use
> >>  // a writeback addressing mode.
> >> -static void
> >> -try_promote_writeback (insn_info *insn)
> >> +void
> >> +pair_fusion::try_promote_writeback (insn_info *insn, bool load_p)
> >>  {
> >> -  auto rti = insn->rtl ();
> >> -  const auto attr = get_attr_ldpstp (rti);
> >> -  if (attr == LDPSTP_NONE)
> >> -    return;
> >> -
> >> -  bool load_p = (attr == LDPSTP_LDP);
> >> -  gcc_checking_assert (load_p || attr == LDPSTP_STP);
> >> -
> >>    rtx regs[2];
> >> -  rtx mem = NULL_RTX;
> >> -  if (load_p)
> >> -    mem = aarch64_destructure_load_pair (regs, PATTERN (rti));
> >> -  else
> >> -    mem = aarch64_destructure_store_pair (regs, PATTERN (rti));
> >> +
> >> +  rtx mem = destructure_pair (regs, PATTERN (insn->rtl ()), load_p);
> >>    gcc_checking_assert (MEM_P (mem));
> >>  
> >>    poly_int64 offset;
> >> @@ -3112,9 +3310,10 @@ try_promote_writeback (insn_info *insn)
> >>    def_info *add_def;
> >>    const insn_range_info pair_range (insn);
> >>    insn_info *insns[2] = { nullptr, insn };
> >> -  insn_info *trailing_add = find_trailing_add (insns, pair_range, 0, &wb_effect,
> >> -                                         &add_def, base_def, offset,
> >> -                                         access_size);
> >> +  insn_info *trailing_add
> >> +    = find_trailing_add (insns, pair_range, 0, &wb_effect,
> >> +                   &add_def, base_def, offset,
> >> +                   access_size);
> >>    if (!trailing_add)
> >>      return;
> >>  
> >> @@ -3124,8 +3323,9 @@ try_promote_writeback (insn_info *insn)
> >>    insn_change del_change (trailing_add, insn_change::DELETE);
> >>    insn_change *changes[] = { &pair_change, &del_change };
> >>  
> >> -  rtx pair_pat = aarch64_gen_writeback_pair (wb_effect, mem, regs, load_p);
> >> -  validate_unshare_change (rti, &PATTERN (rti), pair_pat, true);
> >> +  rtx pair_pat = gen_promote_writeback_pair (wb_effect, mem, regs, load_p);
> >> +  validate_unshare_change (insn->rtl (), &PATTERN (insn->rtl ()), pair_pat,
> >> +                     true);
> >>  
> >>    // The pair must gain the def of the base register from the add.
> >>    pair_change.new_defs = insert_access (attempt,
> >> @@ -3159,14 +3359,12 @@ try_promote_writeback (insn_info *insn)
> >>  // for load/store candidates.  If running after RA, also try and promote
> >>  // non-writeback pairs to use writeback addressing.  Then try to fuse
> >>  // candidates into pairs.
> >> -void ldp_fusion_bb (bb_info *bb)
> >> +void pair_fusion::ldp_fusion_bb (bb_info *bb)
> >>  {
> >> -  const bool track_loads
> >> -    = aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
> >> -  const bool track_stores
> >> -    = aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
> >> +  const bool track_loads = track_loads_p ();
> >> +  const bool track_stores = track_stores_p ();
> >>  
> >> -  ldp_bb_info bb_state (bb);
> >> +  ldp_bb_info bb_state (bb, this);
> >>  
> >>    for (auto insn : bb->nondebug_insns ())
> >>      {
> >> @@ -3176,11 +3374,11 @@ void ldp_fusion_bb (bb_info *bb)
> >>    continue;
> >>  
> >>        rtx pat = PATTERN (rti);
> >> +      bool load_p;
> >>        if (reload_completed
> >> -    && aarch64_ldp_writeback > 1
> >> -    && GET_CODE (pat) == PARALLEL
> >> -    && XVECLEN (pat, 0) == 2)
> >> -  try_promote_writeback (insn);
> >> +    && handle_writeback_opportunities (writeback::ALL)
> >> +    && pair_mem_insn_p (rti, load_p))
> >> +  try_promote_writeback (insn, load_p);
> >>  
> >>        if (GET_CODE (pat) != SET)
> >>    continue;
> >> @@ -3195,16 +3393,6 @@ void ldp_fusion_bb (bb_info *bb)
> >>    bb_state.cleanup_tombstones ();
> >>  }
> >>  
> >> -void ldp_fusion ()
> >> -{
> >> -  ldp_fusion_init ();
> >> -
> >> -  for (auto bb : crtl->ssa->bbs ())
> >> -    ldp_fusion_bb (bb);
> >> -
> >> -  ldp_fusion_destroy ();
> >> -}
> >> -
> >>  namespace {
> >>  
> >>  const pass_data pass_data_ldp_fusion =
> >> @@ -3234,14 +3422,6 @@ public:
> >>        if (!optimize || optimize_debug)
> >>    return false;
> >>  
> >> -      // If the tuning policy says never to form ldps or stps, don't run
> >> -      // the pass.
> >> -      if ((aarch64_tune_params.ldp_policy_model
> >> -     == AARCH64_LDP_STP_POLICY_NEVER)
> >> -    && (aarch64_tune_params.stp_policy_model
> >> -        == AARCH64_LDP_STP_POLICY_NEVER))
> >> -  return false;
> >> -
> >>        if (reload_completed)
> >>    return flag_aarch64_late_ldp_fusion;
> >>        else
> >> @@ -3250,7 +3430,8 @@ public:
> >>  
> >>    unsigned execute (function *) final override
> >>      {
> >> -      ldp_fusion ();
> >> +      aarch64_pair_fusion pass;
> >> +      pass.run ();
> >>        return 0;
> >>      }
> >>  };
> >> -- 
> >> 2.39.3
> >>
> > 
> > Thanks,
> > Alex
