On 05/04/24 10:03 pm, Alex Coplan wrote:
> On 05/04/2024 13:53, Ajit Agarwal wrote:
>> Hello Alex/Richard:
>>
>> All review comments are incorporated.
>
> Thanks, I was kind-of expecting you to also send the renaming patch as a
> preparatory patch as we discussed.
>
> Sorry for another meta comment, but: I think the reason that the Linaro
> CI isn't running tests on your patches is actually because you're
> sending 1/3 of a series but not sending the rest of the series.
>
> So please can you either send this as an individual preparatory patch
> (not marked as a series) or if you're going to send a series (e.g. with
> a preparatory rename patch as 1/2 and this as 2/2) then send the entire
> series when you make updates. That way the CI should test your patches,
> which would be helpful.
>
Addressed.
>>
>> Common infrastructure of the load/store pair fusion pass is divided into
>> target-independent and target-dependent code.
>>
>> The target-independent code is generic code, using pure virtual functions
>> as the interface between the target-independent and target-dependent parts.
>>
>> The target-dependent code implements these pure virtual functions for the
>> aarch64 target and calls into the target-independent code.
>>
>> Thanks & Regards
>> Ajit
>>
>>
>> aarch64: Place target independent and dependent changed code in one file
>>
>> Common infrastructure of the load/store pair fusion pass is divided into
>> target-independent and target-dependent code.
>>
>> The target-independent code is generic code, using pure virtual functions
>> as the interface between the target-independent and target-dependent parts.
>>
>> The target-dependent code implements these pure virtual functions for the
>> aarch64 target and calls into the target-independent code.
>>
>> 2024-04-06 Ajit Kumar Agarwal <aagar...@linux.ibm.com>
>>
>> gcc/ChangeLog:
>>
>> * config/aarch64/aarch64-ldp-fusion.cc: Place target
>> independent and dependent changed code.
>
> You're going to need a proper ChangeLog eventually, but I guess there's
> no need for that right now.
>
>> ---
>> gcc/config/aarch64/aarch64-ldp-fusion.cc | 371 +++++++++++++++--------
>> 1 file changed, 249 insertions(+), 122 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> index 22ed95eb743..cb21b514ef7 100644
>> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> @@ -138,8 +138,122 @@ struct alt_base
>> poly_int64 offset;
>> };
>>
>> +// Virtual base class for load/store walkers used in alias analysis.
>> +struct alias_walker
>> +{
>> + virtual bool conflict_p (int &budget) const = 0;
>> + virtual insn_info *insn () const = 0;
>> + virtual bool valid () const = 0;
>
> Heh, looking at this made me realise there is a whitespace bug here in
> the existing code (double space after const). Sorry about that! I'll
> push an obvious fix for that.
>
>> + virtual void advance () = 0;
>> +};
>> +
>> +struct pair_fusion {
>> +
>> + pair_fusion () {};
>
> This ctor looks pointless at the moment. Perhaps instead we could put
> the contents of ldp_fusion_init in here and then delete that function?
>
Addressed.
>> + virtual bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
>> + bool load_p) = 0;
>
> Please can we have comments above each of these virtual functions
> describing any parameters, what the purpose of the hook is, and the
> interpretation of the return value? This will serve as the
> documentation for other targets that want to make use of the pass.
>
> It might make sense to have a default-false implementation for
> fpsimd_op_p, especially if you don't want to make use of this bit for
> rs6000.
>
Addressed.
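To check my understanding of the suggestion, here is a minimal standalone sketch of a documented hook with a default-false implementation; the type and struct names below are stand-ins for the real GCC ones, not the actual pass code:

```cpp
#include <cassert>

// Stand-in for the real machine_mode; illustrative only.
enum machine_mode_t { MODE_INT_T, MODE_FLOAT_T };

struct pair_fusion_sketch
{
  // Return true if the access described by REG_OP/MEM_MODE should be
  // tracked as an FP/SIMD operation (LOAD_P says whether it is a load).
  // Default: targets that don't segregate FP/SIMD accesses (e.g. rs6000)
  // can simply inherit this and return false.
  virtual bool fpsimd_op_p (int /*reg_op*/, machine_mode_t /*mem_mode*/,
			    bool /*load_p*/)
  {
    return false;
  }
  virtual ~pair_fusion_sketch () {}
};

struct aarch64_like_sketch : pair_fusion_sketch
{
  bool fpsimd_op_p (int, machine_mode_t mem_mode, bool) override
  {
    // Placeholder for the real pre/post-RA register-class check.
    return mem_mode != MODE_INT_T;
  }
};
```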
>> +
>> + virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
>> + virtual bool pair_trailing_writeback_p () = 0;
>
> Sorry for the run-around, but: I think this and
> handle_writeback_opportunities () should be the same function, either
> returning an enum or taking an enum and returning a boolean.
>
> At a minimum they should have more similar sounding names.
>
Addressed.
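Here is a rough standalone sketch of the single-hook shape I understood from this; the enum name and its values are invented for illustration, and the two wrappers just show how the old queries map onto the one hook:

```cpp
#include <cassert>

// Hypothetical merged hook: one enum answer instead of two bool hooks.
enum class writeback_policy { NEVER, EXISTING_ONLY, ALL };

struct pair_fusion_wb_sketch
{
  virtual writeback_policy writeback_opportunities () = 0;
  virtual ~pair_fusion_wb_sketch () {}

  // The two previous queries become thin non-virtual wrappers.
  bool handle_writeback_opportunities ()
  { return writeback_opportunities () != writeback_policy::NEVER; }
  bool pair_trailing_writeback_p ()
  { return writeback_opportunities () == writeback_policy::ALL; }
};

struct aarch64_wb_sketch : pair_fusion_wb_sketch
{
  int ldp_writeback;  // stands in for the aarch64_ldp_writeback param
  aarch64_wb_sketch (int wb) : ldp_writeback (wb) {}

  writeback_policy writeback_opportunities () override
  {
    if (ldp_writeback > 1)
      return writeback_policy::ALL;
    return ldp_writeback ? writeback_policy::EXISTING_ONLY
			 : writeback_policy::NEVER;
  }
};
```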
>> + virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
>> + machine_mode mem_mode) = 0;
>> + virtual int pair_mem_alias_check_limit () = 0;
>> + virtual bool handle_writeback_opportunities () = 0 ;
>> + virtual bool pair_mem_ok_with_policy (rtx first_mem, bool load_p,
>> + machine_mode mode) = 0;
>> + virtual rtx gen_mem_pair (rtx *pats, rtx writeback,
>
> Nit: excess whitespace after pats,
>
>> + bool load_p) = 0;
>> + virtual bool pair_mem_promote_writeback_p (rtx pat) = 0;
>> + virtual bool track_load_p () = 0;
>> + virtual bool track_store_p () = 0;
>
> I think it would probably make more sense for these two to have
> default-true implementations rather than being pure virtual functions.
>
> Also, sorry for the bikeshedding, but please can we keep the plural
> names (so track_loads_p and track_stores_p).
>
>> + virtual bool cand_insns_empty_p (std::list<insn_info *> &insns) = 0;
>
> Why does this need to be virtualised? I would expect it to
> just be insns.empty () on all targets.
>
>> + virtual bool pair_mem_in_range_p (HOST_WIDE_INT off) = 0;
>> + void ldp_fusion_bb (bb_info *bb);
>
> Just to check, are you planning to rename this function in a separate
> patch?
>
Yes.
>> +
>> + ~pair_fusion() { }
>
> As with the ctor, perhaps we could put the contents of ldp_fusion_destroy
> in here and then delete that function?
>
Addressed.
>> +};
>> +
>> +struct aarch64_pair_fusion : public pair_fusion
>> +{
>> +public:
>
Addressed.
> Nit: there shouldn't be a need for this as public is already the default
> for structs.
>
>> + aarch64_pair_fusion () : pair_fusion () {};
>
> This ctor looks pointless, the default one should do, so can we remove
> this?
>
Addressed.
>> + bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
>> + bool load_p) override final
>> + {
>> + const bool fpsimd_op_p
>> + = reload_completed
>> + ? (REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op)))
>> + : (GET_MODE_CLASS (mem_mode) != MODE_INT
>> + && (load_p || !aarch64_const_zero_rtx_p (reg_op)));
>> + return fpsimd_op_p;
>
> Can this just directly return the boolean expression instead of storing
> it into a variable first?
>
Addressed.
>> + }
>> +
>> + bool pair_mem_promote_writeback_p (rtx pat)
>> + {
>> + if (reload_completed
>> + && aarch64_ldp_writeback > 1
>> + && GET_CODE (pat) == PARALLEL
>> + && XVECLEN (pat, 0) == 2)
>> + return true;
>> +
>> + return false;
>> + }
>> +
>> + bool pair_mem_ok_with_policy (rtx first_mem, bool load_p,
>> + machine_mode mode)
>> + {
>> + return aarch64_mem_ok_with_ldpstp_policy_model (first_mem,
>> + load_p,
>> + mode);
>> + }
>> + bool pair_operand_mode_ok_p (machine_mode mode);
>
> Why have you implemented this (one-line function) out-of-line but the
> others are implemented inline? Please be consistent.
>
Addressed.
>> +
>> + rtx gen_mem_pair (rtx *pats, rtx writeback, bool load_p);
>> +
>> + bool pair_trailing_writeback_p ()
>> + {
>> + return aarch64_ldp_writeback > 1;
>> + }
>> + bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
>> + machine_mode mem_mode)
>> + {
>> + return (load_p
>> + ? aarch64_ldp_reg_operand (reg_op, mem_mode)
>> + : aarch64_stp_reg_operand (reg_op, mem_mode));
>> + }
>> + int pair_mem_alias_check_limit ()
>> + {
>> + return aarch64_ldp_alias_check_limit;
>> + }
>> + bool handle_writeback_opportunities ()
>> + {
>> + return aarch64_ldp_writeback;
>> + }
>> + bool track_load_p ()
>> + {
>> + const bool track_loads
>> = aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
>> + return track_loads;
>> + }
>
Addressed.
> As above, can this just return the boolean expression directly without
> the temporary?
>
>> + bool track_store_p ()
>> + {
>> + const bool track_stores
>> = aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
>> + return track_stores;
>> + }
>
Addressed.
> Likewise.
>
>> + bool cand_insns_empty_p (std::list<insn_info *> &insns)
>> + {
>> + return insns.empty();
>> + }
>
> As mentioned on the declaration, I don't see why this needs to be
> virtualised.
>
>> + bool pair_mem_in_range_p (HOST_WIDE_INT off)
>> + {
>> + return (off < LDP_MIN_IMM || off > LDP_MAX_IMM);
>> + }
>
> Hmm, this looks to test exactly the opposite condition from what the
> name suggests, i.e. the implementation looks like what I would expect
> for a function called pair_mem_out_of_range_p.
>
> Please can you invert the condition and fix up callers appropriately?
>
> This suggests you're not really thinking about the code you're writing,
> please try to think a bit more carefully before posting patches.
>
Sorry, that was my mistake; I overlooked it.
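For the record, a standalone sketch of the inverted predicate; the bounds below are what I understand LDP_MIN_IMM/LDP_MAX_IMM to be in the pass (-64 and 63, the signed 7-bit scaled-immediate range), so please correct me if I have misread them:

```cpp
#include <cassert>

// Signed 7-bit scaled-immediate range used by the ldp/stp pass
// (values assumed from the LDP_MIN_IMM/LDP_MAX_IMM definitions).
const long LDP_MIN_IMM = -64;
const long LDP_MAX_IMM = 63;

// Now true exactly when OFF *is* representable, matching the name.
bool pair_mem_in_range_p (long off)
{
  return off >= LDP_MIN_IMM && off <= LDP_MAX_IMM;
}
```

Callers such as find_trailing_add would then punt on `!pair_mem_in_range_p (off_hwi)` rather than on the current positive test.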
>> +};
>> +
>> +
>> // State used by the pass for a given basic block.
>> -struct ldp_bb_info
>> +struct pair_fusion_bb_info
>> {
>> using def_hash = nofree_ptr_hash<def_info>;
>> using expr_key_t = pair_hash<tree_operand_hash, int_hash<int, -1, -2>>;
>> @@ -160,14 +274,17 @@ struct ldp_bb_info
>>
>> static const size_t obstack_alignment = sizeof (void *);
>> bb_info *m_bb;
>> + pair_fusion *bb_state;
>
Addressed.
> Can we call this m_pass instead of bb_state? The pair_fusion class is
> explicitly at the pass level rather than the bb level.
>
> Also I think this (and indeed m_bb, although that's pre-existing) could
> be private members. Not sure there's a good reason for making them
> public.
>
Addressed.
> It might be slightly nicer to use a reference for m_pass unless there's
> a good reason to use a pointer, although I don't feel strongly about
> this.
>
>>
>> - ldp_bb_info (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
>> + pair_fusion_bb_info (bb_info *bb,
>> + aarch64_pair_fusion *d) : m_bb (bb),
>
> This generic class needs to take the generic pass type, i.e. just a
> pair_fusion * rather than an aarch64_pair_fusion *.
>
>> + bb_state (d), m_emitted_tombstone (false)
>> {
>> obstack_specify_allocation (&m_obstack, OBSTACK_CHUNK_SIZE,
>> obstack_alignment, obstack_chunk_alloc,
>> obstack_chunk_free);
>> }
>> - ~ldp_bb_info ()
>> + ~pair_fusion_bb_info ()
>> {
>> obstack_free (&m_obstack, nullptr);
>>
>> @@ -177,10 +294,32 @@ struct ldp_bb_info
>> bitmap_obstack_release (&m_bitmap_obstack);
>> }
>> }
>> + void track_access (insn_info *, bool load, rtx mem);
>> + void transform ();
>> + void cleanup_tombstones ();
>
> I think most (or indeed all?) of the changes here shouldn't be needed.
> AFAICS, there shouldn't be a need to:
> - drop inline from these functions.
> - change the accessibility of any members.
>
Addressed.
> Please can you drop these changes or explain why they're needed (from
> here until the end of the class).
>
>> + void merge_pairs (insn_list_t &, insn_list_t &,
>> + bool load_p, unsigned access_size);
>> + void transform_for_base (int load_size, access_group &group);
>> +
>> + bool try_fuse_pair (bool load_p, unsigned access_size,
>> + insn_info *i1, insn_info *i2);
>> +
>> + bool fuse_pair (bool load_p, unsigned access_size,
>> + int writeback,
>> + insn_info *i1, insn_info *i2,
>> + base_cand &base,
>> + const insn_range_info &move_range);
>>
>> - inline void track_access (insn_info *, bool load, rtx mem);
>> - inline void transform ();
>> - inline void cleanup_tombstones ();
>> + void do_alias_analysis (insn_info *alias_hazards[4],
>> + alias_walker *walkers[4],
>> + bool load_p);
>> +
>> + void track_tombstone (int uid);
>> +
>> + bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs);
>> +
>> + template<typename Map>
>> + void traverse_base_map (Map &map);
>>
>> private:
>> obstack m_obstack;
>> @@ -191,30 +330,32 @@ private:
>> bool m_emitted_tombstone;
>>
>> inline splay_tree_node<access_record *> *node_alloc (access_record *);
>> +};
>>
>> - template<typename Map>
>> - inline void traverse_base_map (Map &map);
>> - inline void transform_for_base (int load_size, access_group &group);
>> -
>> - inline void merge_pairs (insn_list_t &, insn_list_t &,
>> - bool load_p, unsigned access_size);
>> -
>> - inline bool try_fuse_pair (bool load_p, unsigned access_size,
>> - insn_info *i1, insn_info *i2);
>> -
>> - inline bool fuse_pair (bool load_p, unsigned access_size,
>> - int writeback,
>> - insn_info *i1, insn_info *i2,
>> - base_cand &base,
>> - const insn_range_info &move_range);
>> -
>> - inline void track_tombstone (int uid);
>> +rtx aarch64_pair_fusion::gen_mem_pair (rtx *pats,
>
> Style nit: newline after rtx.
>
Addressed.
>> + rtx writeback,
>> + bool load_p)
>> + {
>> + rtx pair_pat;
>>
>> - inline bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs);
>> -};
>> + if (writeback)
>> + {
>> + auto patvec = gen_rtvec (3, writeback, pats[0], pats[1]);
>> + pair_pat = gen_rtx_PARALLEL (VOIDmode, patvec);
>> + }
>> + else if (load_p)
>> + pair_pat = aarch64_gen_load_pair (XEXP (pats[0], 0),
>> + XEXP (pats[1], 0),
>> + XEXP (pats[0], 1));
>> + else
>> + pair_pat = aarch64_gen_store_pair (XEXP (pats[0], 0),
>> + XEXP (pats[0], 1),
>> + XEXP (pats[1], 1));
>> + return pair_pat;
>
> Can the assignments just be direct returns here, please? Then
> get rid of the temporary.
>
Addressed.
>> + }
>>
>> splay_tree_node<access_record *> *
>> -ldp_bb_info::node_alloc (access_record *access)
>> +pair_fusion_bb_info::node_alloc (access_record *access)
>> {
>> using T = splay_tree_node<access_record *>;
>> void *addr = obstack_alloc (&m_obstack, sizeof (T));
>> @@ -262,7 +403,7 @@ drop_writeback (rtx mem)
>> // RTX_AUTOINC addresses. The interface is like strip_offset except we take a
>> // MEM so that we know the mode of the access.
>> static rtx
>> -ldp_strip_offset (rtx mem, poly_int64 *offset)
>> +pair_mem_strip_offset (rtx mem, poly_int64 *offset)
>
> I thought we discussed splitting the renaming into a separate patch?
>
>> {
>> rtx addr = XEXP (mem, 0);
>>
>> @@ -332,6 +473,12 @@ ldp_operand_mode_ok_p (machine_mode mode)
>> return reload_completed || mode != TImode;
>> }
>>
>> +bool
>> +aarch64_pair_fusion::pair_operand_mode_ok_p (machine_mode mode)
>> +{
>> + return ldp_operand_mode_ok_p (mode);
>> +}
>> +
>> // Given LFS (load_p, fpsimd_p, size) fields in FIELDS, encode these
>> // into an integer for use as a hash table key.
>> static int
>> @@ -396,7 +543,7 @@ access_group::track (Alloc alloc_node, poly_int64 offset, insn_info *insn)
>> // MEM_EXPR base (i.e. a tree decl) relative to which we can track the access.
>> // LFS is used as part of the key to the hash table, see track_access.
>> bool
>> -ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
>> +pair_fusion_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
>
> Again, please do any renaming in a separate patch.
>
Addressed.
>> {
>> if (!MEM_EXPR (mem) || !MEM_OFFSET_KNOWN_P (mem))
>> return false;
>> @@ -412,9 +559,10 @@ ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
>> const machine_mode mem_mode = GET_MODE (mem);
>> const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant ();
>>
>> - // Punt on misaligned offsets. LDP/STP instructions require offsets to be a
>> - // multiple of the access size, and we believe that misaligned offsets on
>> - // MEM_EXPR bases are likely to lead to misaligned offsets w.r.t. RTL bases.
>> + // Punt on misaligned offsets. Paired memory access instructions require
>> + // offsets to be a multiple of the access size, and we believe that
>> + // misaligned offsets on MEM_EXPR bases are likely to lead to misaligned
>> + // offsets w.r.t. RTL bases.
>> if (!multiple_p (offset, mem_size))
>> return false;
>>
>> @@ -438,46 +586,38 @@ ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
>> }
>>
>> // Main function to begin pair discovery. Given a memory access INSN,
>> -// determine whether it could be a candidate for fusing into an ldp/stp,
>> +// determine whether it could be a candidate for fusing into an pair mem,
>
> s/an pair mem/a paired access/
>
>> // and if so, track it in the appropriate data structure for this basic
>> // block. LOAD_P is true if the access is a load, and MEM is the mem
>> // rtx that occurs in INSN.
>> void
>> -ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>> +pair_fusion_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>> {
>> // We can't combine volatile MEMs, so punt on these.
>> if (MEM_VOLATILE_P (mem))
>> return;
>>
>> - // Ignore writeback accesses if the param says to do so.
>> - if (!aarch64_ldp_writeback
>> + // Ignore writeback accesses if the param says to do so
>
Addressed.
> This comment needs updating, I guess something like "if the hook says to
> do so". But also please don't delete the full stop.
>
>> + if (!bb_state->handle_writeback_opportunities ()
>> && GET_RTX_CLASS (GET_CODE (XEXP (mem, 0))) == RTX_AUTOINC)
>> return;
>>
>> const machine_mode mem_mode = GET_MODE (mem);
>> - if (!ldp_operand_mode_ok_p (mem_mode))
>> +
>
> I don't think we need the extra whitespace here.
>
>> + if (!bb_state->pair_operand_mode_ok_p (mem_mode))
>> return;
>>
>> rtx reg_op = XEXP (PATTERN (insn->rtl ()), !load_p);
>>
>> - // Ignore the access if the register operand isn't suitable for ldp/stp.
>> - if (load_p
>> - ? !aarch64_ldp_reg_operand (reg_op, mem_mode)
>> - : !aarch64_stp_reg_operand (reg_op, mem_mode))
>> + if (!bb_state->pair_reg_operand_ok_p (load_p, reg_op, mem_mode))
>> return;
>> -
>> // We want to segregate FP/SIMD accesses from GPR accesses.
>> //
>> // Before RA, we use the modes, noting that stores of constant zero
>> // operands use GPRs (even in non-integer modes). After RA, we use
>> // the hard register numbers.
>> - const bool fpsimd_op_p
>> - = reload_completed
>> - ? (REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op)))
>> - : (GET_MODE_CLASS (mem_mode) != MODE_INT
>> - && (load_p || !aarch64_const_zero_rtx_p (reg_op)));
>> -
>> - // Note ldp_operand_mode_ok_p already rejected VL modes.
>> + const bool fpsimd_op_p = bb_state->fpsimd_op_p (reg_op, mem_mode, load_p);
>> + // Note pair_operand_mode_ok_p already rejected VL modes.
>> const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant ();
>> const lfs_fields lfs = { load_p, fpsimd_op_p, mem_size };
>>
>> @@ -487,7 +627,7 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>> poly_int64 mem_off;
>> rtx addr = XEXP (mem, 0);
>> const bool autoinc_p = GET_RTX_CLASS (GET_CODE (addr)) == RTX_AUTOINC;
>> - rtx base = ldp_strip_offset (mem, &mem_off);
>> + rtx base = pair_mem_strip_offset (mem, &mem_off);
>
> Again, let's split the renaming off into a separate patch.
>
Addressed.
>> if (!REG_P (base))
>> return;
>>
>> @@ -506,8 +646,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>> // elimination offset pre-RA, we should postpone forming pairs on such
>> // accesses until after RA.
>> //
>> - // As it stands, addresses with offsets in range for LDR but not
>> - // in range for LDP/STP are currently reloaded inefficiently,
>> + // As it stands, addresses in range for an individual load/store but not
>> + // for a paired access are currently reloaded inefficiently,
>> // ending up with a separate base register for each pair.
>> //
>> // In theory LRA should make use of
>> @@ -519,8 +659,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>> // that calls targetm.legitimize_address_displacement.
>> //
>> // So for now, it's better to punt when we can't be sure that the
>> - // offset is in range for LDP/STP. Out-of-range cases can then be
>> - // handled after RA by the out-of-range LDP/STP peepholes. Eventually, it
>> + // offset is in range for paired access. Out-of-range cases can then be
>> + // handled after RA by the out-of-range PAIR MEM peepholes. Eventually, it
>> // would be nice to handle known out-of-range opportunities in the
>> // pass itself (for stack accesses, this would be in the post-RA pass).
>> if (!reload_completed
>> @@ -573,8 +713,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
>> gcc_unreachable (); // Base defs should be unique.
>> }
>>
>> - // Punt on misaligned offsets. LDP/STP require offsets to be a multiple
>> of
>> - // the access size.
>> + // Punt on misaligned offsets. Paired memory accesses require offsets
>> + // to be a multiple of the access size.
>> if (!multiple_p (mem_off, mem_size))
>> return;
>>
>> @@ -614,7 +754,7 @@ static bool no_ignore (insn_info *) { return false; }
>> // making use of alias disambiguation.
>> static insn_info *
>> latest_hazard_before (insn_info *insn, rtx *ignore,
>> - insn_info *ignore_insn = nullptr)
>> + insn_info *ignore_insn = 0)
>
> Is this change needed?
>
Addressed.
>> {
>> insn_info *result = nullptr;
>>
>> @@ -1150,7 +1290,7 @@ extract_writebacks (bool load_p, rtx pats[2], int changed)
>> const bool autoinc_p = GET_RTX_CLASS (GET_CODE (addr)) == RTX_AUTOINC;
>>
>> poly_int64 offset;
>> - rtx this_base = ldp_strip_offset (mem, &offset);
>> + rtx this_base = pair_mem_strip_offset (mem, &offset);
>> gcc_assert (REG_P (this_base));
>> if (base_reg)
>> gcc_assert (rtx_equal_p (base_reg, this_base));
>> @@ -1286,7 +1426,11 @@ find_trailing_add (insn_info *insns[2],
>>
>> off_hwi /= access_size;
>>
>> - if (off_hwi < LDP_MIN_IMM || off_hwi > LDP_MAX_IMM)
>> + pair_fusion *pfuse;
>> + aarch64_pair_fusion derived;
>> + pfuse = &derived;
>
> Huh? We'll want to use the same instance of the pass structure (pair_fusion)
> that gets created at the top level, so that should get passed around
> everywhere.
>
> In the case of find_trailing_add we probably don't want to add any more
> parameters (as Segher noted there are already too many), so perhaps the
> easiest thing would be to make find_trailing_add itself a (non-virtual)
> member function of pair_fusion.
>
> Then here we can just query pair_mem_in_range_p directly.
>
> For try_promote_writeback itself you can either make it a non-virtual
> member function of pair_fusion or just pass a reference to the pass
> object in when you call it.
>
> Note also that you don't have to create a pointer of the parent type in
> order to call a virtual function on a derived object, you can just call
> it directly.
>
Addressed.
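Here is a minimal sketch of the shape I understood: the helper becomes a non-virtual member of pair_fusion that queries the virtual hook on whatever instance the pass created, instead of instantiating a fresh derived object locally (all names and the scaling logic are illustrative, not the real find_trailing_add):

```cpp
#include <cassert>

struct pair_fusion_sk
{
  virtual bool pair_mem_in_range_p (long off) = 0;
  virtual ~pair_fusion_sk () {}

  // Non-virtual shared logic: scale the byte offset and ask the target
  // hook on *this*, so no extra parameters and no ad-hoc derived objects.
  bool trailing_add_offset_ok_p (long off_hwi, long access_size)
  {
    return pair_mem_in_range_p (off_hwi / access_size);
  }
};

struct aarch64_sk : pair_fusion_sk
{
  // Assumed ldp/stp signed 7-bit scaled range.
  bool pair_mem_in_range_p (long off) override
  { return off >= -64 && off <= 63; }
};
```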
>> +
>> + if (pfuse->pair_mem_in_range_p (off_hwi))
>> return nullptr;
>
> Note of course this condition needs inverting when you invert the
> implementation.
>
Addressed.
>>
>> auto dump_prefix = [&]()
>> @@ -1328,7 +1472,7 @@ find_trailing_add (insn_info *insns[2],
>> // We just emitted a tombstone with uid UID, track it in a bitmap for
>> // this BB so we can easily identify it later when cleaning up tombstones.
>> void
>> -ldp_bb_info::track_tombstone (int uid)
>> +pair_fusion_bb_info::track_tombstone (int uid)
>
> Again, please split the renaming off. Same with other instances below.
>
>> {
>> if (!m_emitted_tombstone)
>> {
>> @@ -1528,7 +1672,7 @@ fixup_debug_uses (obstack_watermark &attempt,
>> gcc_checking_assert (GET_RTX_CLASS (GET_CODE (XEXP (mem, 0)))
>> == RTX_AUTOINC);
>>
>> - base = ldp_strip_offset (mem, &offset);
>> + base = pair_mem_strip_offset (mem, &offset);
>> gcc_checking_assert (REG_P (base) && REGNO (base) == base_regno);
>> }
>> fixup_debug_use (attempt, use, def, base, offset);
>> @@ -1664,7 +1808,7 @@ fixup_debug_uses (obstack_watermark &attempt,
>> // BASE gives the chosen base candidate for the pair and MOVE_RANGE is
>> // a singleton range which says where to place the pair.
>> bool
>> -ldp_bb_info::fuse_pair (bool load_p,
>> +pair_fusion_bb_info::fuse_pair (bool load_p,
>> unsigned access_size,
>> int writeback,
>> insn_info *i1, insn_info *i2,
>> @@ -1800,7 +1944,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>> {
>> if (dump_file)
>> fprintf (dump_file,
>> - " ldp: i%d has wb but subsequent i%d has non-wb "
>> + " load pair: i%d has wb but subsequent i%d has non-wb "
>> "update of base (r%d), dropping wb\n",
>> insns[0]->uid (), insns[1]->uid (), base_regno);
>> gcc_assert (writeback_effect);
>> @@ -1823,7 +1967,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>> }
>>
>> // If either of the original insns had writeback, but the resulting pair insn
>> - // does not (can happen e.g. in the ldp edge case above, or if the writeback
>> + // does not (can happen e.g. in the load pair edge case above, or if the writeback
>> // effects cancel out), then drop the def(s) of the base register as
>> // appropriate.
>> //
>> @@ -1842,7 +1986,7 @@ ldp_bb_info::fuse_pair (bool load_p,
>> // update of the base register and try and fold it in to make this into a
>> // writeback pair.
>> insn_info *trailing_add = nullptr;
>> - if (aarch64_ldp_writeback > 1
>> + if (bb_state->pair_trailing_writeback_p ()
>> && !writeback_effect
>> && (!load_p || (!refers_to_regno_p (base_regno, base_regno + 1,
>> XEXP (pats[0], 0), nullptr)
>> @@ -1863,14 +2007,14 @@ ldp_bb_info::fuse_pair (bool load_p,
>> }
>>
>> // Now that we know what base mem we're going to use, check if it's OK
>> - // with the ldp/stp policy.
>> + // with the pair mem policy.
>> rtx first_mem = XEXP (pats[0], load_p);
>> - if (!aarch64_mem_ok_with_ldpstp_policy_model (first_mem,
>> - load_p,
>> - GET_MODE (first_mem)))
>> + if (!bb_state->pair_mem_ok_with_policy (first_mem,
>> + load_p,
>> + GET_MODE (first_mem)))
>> {
>> if (dump_file)
>> - fprintf (dump_file, "punting on pair (%d,%d), ldp/stp policy says no\n",
>> + fprintf (dump_file, "punting on pair (%d,%d), pair mem policy says no\n",
>> i1->uid (), i2->uid ());
>> return false;
>> }
>> @@ -1878,21 +2022,10 @@ ldp_bb_info::fuse_pair (bool load_p,
>> rtx reg_notes = combine_reg_notes (first, second, load_p);
>>
>> rtx pair_pat;
>> - if (writeback_effect)
>> - {
>> - auto patvec = gen_rtvec (3, writeback_effect, pats[0], pats[1]);
>> - pair_pat = gen_rtx_PARALLEL (VOIDmode, patvec);
>> - }
>> - else if (load_p)
>> - pair_pat = aarch64_gen_load_pair (XEXP (pats[0], 0),
>> - XEXP (pats[1], 0),
>> - XEXP (pats[0], 1));
>> - else
>> - pair_pat = aarch64_gen_store_pair (XEXP (pats[0], 0),
>> - XEXP (pats[0], 1),
>> - XEXP (pats[1], 1));
>>
>> + pair_pat = bb_state->gen_mem_pair (pats, writeback_effect, load_p);
>> insn_change *pair_change = nullptr;
>> +
>> auto set_pair_pat = [pair_pat,reg_notes](insn_change *change) {
>> rtx_insn *rti = change->insn ()->rtl ();
>> validate_unshare_change (rti, &PATTERN (rti), pair_pat, true);
>> @@ -2133,15 +2266,6 @@ load_modified_by_store_p (insn_info *load,
>> return false;
>> }
>>
>> -// Virtual base class for load/store walkers used in alias analysis.
>> -struct alias_walker
>> -{
>> - virtual bool conflict_p (int &budget) const = 0;
>> - virtual insn_info *insn () const = 0;
>> - virtual bool valid () const = 0;
>> - virtual void advance () = 0;
>> -};
>> -
>> // Implement some common functionality used by both store_walker
>> // and load_walker.
>> template<bool reverse>
>> @@ -2259,13 +2383,13 @@ public:
>> //
>> // We try to maintain the invariant that if a walker becomes invalid, we
>> // set its pointer to null.
>> -static void
>> -do_alias_analysis (insn_info *alias_hazards[4],
>> +void
>> +pair_fusion_bb_info::do_alias_analysis (insn_info *alias_hazards[4],
>
> It would probably make more sense for this to be a (non-virtual) member
> function of the pair_fusion class, since it doesn't need to access the
> BB-specific state.
>
Addressed.
>> alias_walker *walkers[4],
>> bool load_p)
>> {
>> const int n_walkers = 2 + (2 * !load_p);
>> - int budget = aarch64_ldp_alias_check_limit;
>> + int budget = bb_state->pair_mem_alias_check_limit ();
>>
>> auto next_walker = [walkers,n_walkers](int current) -> int {
>> for (int j = 1; j <= n_walkers; j++)
>> @@ -2365,7 +2489,7 @@ get_viable_bases (insn_info *insns[2],
>> {
>> const bool is_lower = (i == reversed);
>> poly_int64 poly_off;
>> - rtx base = ldp_strip_offset (cand_mems[i], &poly_off);
>> + rtx base = pair_mem_strip_offset (cand_mems[i], &poly_off);
>> if (GET_RTX_CLASS (GET_CODE (XEXP (cand_mems[i], 0))) == RTX_AUTOINC)
>> writeback |= (1 << i);
>>
>> @@ -2373,7 +2497,7 @@ get_viable_bases (insn_info *insns[2],
>> continue;
>>
>> // Punt on accesses relative to eliminable regs. See the comment in
>> - // ldp_bb_info::track_access for a detailed explanation of this.
>> + // pair_fusion_bb_info::track_access for a detailed explanation of this.
>> if (!reload_completed
>> && (REGNO (base) == FRAME_POINTER_REGNUM
>> || REGNO (base) == ARG_POINTER_REGNUM))
>> @@ -2397,7 +2521,11 @@ get_viable_bases (insn_info *insns[2],
>> if (!is_lower)
>> base_off--;
>>
>> - if (base_off < LDP_MIN_IMM || base_off > LDP_MAX_IMM)
>> + pair_fusion *pfuse;
>> + aarch64_pair_fusion derived;
>> + pfuse = &derived;
>
> Again, please don't do this but instead use the same instance
> everywhere.
>
Addressed.
>> +
>> + if (pfuse->pair_mem_in_range_p (base_off))
>> continue;
>>
>> use_info *use = find_access (insns[i]->uses (), REGNO (base));
>> @@ -2454,12 +2582,12 @@ get_viable_bases (insn_info *insns[2],
>> }
>>
>> // Given two adjacent memory accesses of the same size, I1 and I2, try
>> -// and see if we can merge them into a ldp or stp.
>> +// and see if we can merge them into a paired accesses load and store.
>
> "into a load pair or store pair" or "into a paired access" if you
> prefer.
>
Addressed.
>> //
>> // ACCESS_SIZE gives the (common) size of a single access, LOAD_P is true
>> // if the accesses are both loads, otherwise they are both stores.
>> bool
>> -ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
>> +pair_fusion_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
>> insn_info *i1, insn_info *i2)
>> {
>> if (dump_file)
>> @@ -2494,7 +2622,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
>> {
>> if (dump_file)
>> fprintf (dump_file,
>> - "punting on ldp due to reg conflcits (%d,%d)\n",
>> + "punting on pair mem load due to reg conflcits (%d,%d)\n",
>> insns[0]->uid (), insns[1]->uid ());
>> return false;
>> }
>> @@ -2843,7 +2971,7 @@ debug (const insn_list_t &l)
>> // we can't re-order them anyway, so provided earlier passes have cleaned up
>> // redundant loads, we shouldn't miss opportunities by doing this.
>> void
>> -ldp_bb_info::merge_pairs (insn_list_t &left_list,
>> +pair_fusion_bb_info::merge_pairs (insn_list_t &left_list,
>> insn_list_t &right_list,
>> bool load_p,
>> unsigned access_size)
>> @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t &left_list,
>> // of accesses. If we find two sets of adjacent accesses, call
>> // merge_pairs.
>> void
>> -ldp_bb_info::transform_for_base (int encoded_lfs,
>> - access_group &group)
>> +pair_fusion_bb_info::transform_for_base (int encoded_lfs,
>> + access_group &group)
>> {
>> const auto lfs = decode_lfs (encoded_lfs);
>> const unsigned access_size = lfs.size;
>> @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int encoded_lfs,
>> access.cand_insns,
>> lfs.load_p,
>> access_size);
>> - skip_next = access.cand_insns.empty ();
>> + skip_next = bb_state->cand_insns_empty_p (access.cand_insns);
>
> As above, why is this needed?
For rs6000 we want this to always return true: on rs6000, candidate
load/store pairs to be merged occur with access sizes of 8/16, 16/32 and
32/64, but we only want to form pairs for the 8/16 and 32/64 cases.
Always returning true for rs6000 lets us skip such pairs as above.
>
>> }
>> prev_access = &access;
>> }
>> @@ -2919,7 +3047,7 @@ ldp_bb_info::transform_for_base (int encoded_lfs,
>> // and remove all the tombstone insns, being sure to reparent any uses
>> // of mem to previous defs when we do this.
>> void
>> -ldp_bb_info::cleanup_tombstones ()
>> +pair_fusion_bb_info::cleanup_tombstones ()
>> {
>> // No need to do anything if we didn't emit a tombstone insn for this BB.
>> if (!m_emitted_tombstone)
>> @@ -2947,7 +3075,7 @@ ldp_bb_info::cleanup_tombstones ()
>>
>> template<typename Map>
>> void
>> -ldp_bb_info::traverse_base_map (Map &map)
>> +pair_fusion_bb_info::traverse_base_map (Map &map)
>> {
>> for (auto kv : map)
>> {
>> @@ -2958,7 +3086,7 @@ ldp_bb_info::traverse_base_map (Map &map)
>> }
>>
>> void
>> -ldp_bb_info::transform ()
>> +pair_fusion_bb_info::transform ()
>> {
>> traverse_base_map (expr_map);
>> traverse_base_map (def_map);
>> @@ -3167,14 +3295,13 @@ try_promote_writeback (insn_info *insn)
>> // for load/store candidates. If running after RA, also try and promote
>> // non-writeback pairs to use writeback addressing. Then try to fuse
>> // candidates into pairs.
>> -void ldp_fusion_bb (bb_info *bb)
>> +void pair_fusion::ldp_fusion_bb (bb_info *bb)
>> {
>> - const bool track_loads
>> - = aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
>> - const bool track_stores
>> - = aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER;
>> + const bool track_loads = track_load_p ();
>> + const bool track_stores = track_store_p ();
>>
>> - ldp_bb_info bb_state (bb);
>
> This:
>
>> + aarch64_pair_fusion derived;
>
> can be deleted and then:
>
>> + pair_fusion_bb_info bb_info (bb, &derived);
>
> can just be:
>
> pair_fusion_bb_info bb_info (bb, this);
>
> (or you can pass *this if you make bb_info take a reference).
>
> I don't think there's a particular need to change the variable name
> (bb_state -> bb_info). I chose the former because it doesn't clash
> with the RTL-SSA structure of the same name as the latter.
>
Addressed.
>>
>> for (auto insn : bb->nondebug_insns ())
>> {
>> @@ -3184,31 +3311,31 @@ void ldp_fusion_bb (bb_info *bb)
>> continue;
>>
>> rtx pat = PATTERN (rti);
>> - if (reload_completed
>> - && aarch64_ldp_writeback > 1
>> - && GET_CODE (pat) == PARALLEL
>> - && XVECLEN (pat, 0) == 2)
>> + if (pair_mem_promote_writeback_p (pat))
>> try_promote_writeback (insn);
>
> It looks like try_promote_writeback itself will need some further work
> to make it target-independent. I suppose this check:
>
> auto rti = insn->rtl ();
> const auto attr = get_attr_ldpstp (rti);
> if (attr == LDPSTP_NONE)
> return;
>
> bool load_p = (attr == LDPSTP_LDP);
> gcc_checking_assert (load_p || attr == LDPSTP_STP);
>
> will need to become part of the pair_mem_promote_writeback_p hook that you
> added, potentially changing it to return a boolean for load_p.
>
> Then I guess we will need hooks for destructuring the pair insn and
> another hook to wrap aarch64_gen_writeback_pair.
>
Addressed.
>>
>> if (GET_CODE (pat) != SET)
>> continue;
>>
>> if (track_stores && MEM_P (XEXP (pat, 0)))
>> - bb_state.track_access (insn, false, XEXP (pat, 0));
>> + bb_info.track_access (insn, false, XEXP (pat, 0));
>> else if (track_loads && MEM_P (XEXP (pat, 1)))
>> - bb_state.track_access (insn, true, XEXP (pat, 1));
>> + bb_info.track_access (insn, true, XEXP (pat, 1));
>> }
>>
>> - bb_state.transform ();
>> - bb_state.cleanup_tombstones ();
>> + bb_info.transform ();
>> + bb_info.cleanup_tombstones ();
>> }
>>
>> void ldp_fusion ()
>> {
>> ldp_fusion_init ();
>> + pair_fusion *pfuse;
>> + aarch64_pair_fusion derived;
>> + pfuse = &derived;
>
> This is indeed the one place where I think it is acceptable to
> instantiate aarch64_pair_fusion. But again there's no need to create a
> pointer to the parent class, just call any function you like directly.
>
Addressed.
>>
>> for (auto bb : crtl->ssa->bbs ())
>> - ldp_fusion_bb (bb);
>> + pfuse->ldp_fusion_bb (bb);
>
> I think even the code to iterate over bbs should itself be a member
> function of pair_fusion (say "run") and then that becomes part of the
> generic code.
>
> So this function would just become:
>
> aarch64_pair_fusion pass;
> pass.run ();
>
> and could be inlined into the caller.
>
Addressed.
> Perhaps you could also add an early return like:
>
> if (!track_loads_p () && !track_stores_p ())
> return;
>
> in pair_fusion::run () and then remove the corresponding code from
> pass_ldp_fusion::gate?
>
Addressed.
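Putting the run () and early-return suggestions together, here is a self-contained sketch of the generic driver; basic blocks are reduced to a plain count so the example stands alone, and the names mirror (but are not) the real pass:

```cpp
#include <cassert>

struct pair_fusion_run_sk
{
  virtual bool track_loads_p () { return true; }
  virtual bool track_stores_p () { return true; }
  virtual ~pair_fusion_run_sk () {}

  int bbs_seen = 0;          // stands in for per-BB fusion work
  void fuse_bb (int /*bb*/) { ++bbs_seen; }

  // Generic entry point: bail out early if the target tracks neither
  // loads nor stores, then walk every basic block (here just 0..n-1,
  // instead of crtl->ssa->bbs ()).
  void run (int n_bbs)
  {
    if (!track_loads_p () && !track_stores_p ())
      return;
    for (int bb = 0; bb < n_bbs; ++bb)
      fuse_bb (bb);
  }
};

// A target that opts out entirely exercises the early return.
struct never_track_sk : pair_fusion_run_sk
{
  bool track_loads_p () override { return false; }
  bool track_stores_p () override { return false; }
};
```

The caller then reduces to constructing the derived pass object and calling run () on it directly.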
>>
>> ldp_fusion_destroy ();
>> }
>> --
>> 2.39.3
>>
>
> Thanks,
> Alex
Thanks & Regards
Ajit