Hello Alex: On 22/05/24 3:30 pm, Alex Coplan wrote: > Hi Ajit, > > You need to remove the header dependencies that are no longer required > for aarch64-ldp-fusion.o in t-aarch64 (not forgetting to update the > ChangeLog). A few other minor nits below. > > LGTM with those changes, but you'll need Richard S to approve. > > Thanks a lot for doing this. > > On 22/05/2024 00:16, Ajit Agarwal wrote: >> Hello Alex/Richard: >> >> All comments are addressed. >> >> Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end >> to support multiple targets. >> >> Common infrastructure of load store pair fusion is divided into >> target independent and target dependent code. >> >> Target independent code is structured in the following files. >> gcc/pair-fusion.h >> gcc/pair-fusion.cc >> >> Target independent code is the Generic code with pure virtual >> function to interface betwwen target independent and dependent >> code. >> >> Bootstrapped and regtested on aarch64-linux-gnu. >> >> Thanks & Regards >> Ajit >> >> >> >> aarch64, middle-end: Move pair_fusion pass from aarch64 to middle-end >> >> Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end >> to support multiple targets. >> >> Common infrastructure of load store pair fusion is divided into >> target independent and target dependent code. >> >> Target independent code is structured in the following files. >> gcc/pair-fusion.h >> gcc/pair-fusion.cc >> >> Target independent code is the Generic code with pure virtual >> function to interface betwwen target independent and dependent >> code. >> >> 2024-05-22 Ajit Kumar Agarwal <aagar...@linux.ibm.com> >> >> gcc/ChangeLog: >> >> * pair-fusion.h: Generic header code for load store pair fusion >> that can be shared across different architectures. >> * pair-fusion.cc: Generic source code implementation for >> load store pair fusion that can be shared across different >> architectures. >> * Makefile.in: Add new object file pair-fusion.o. >> * config/aarch64/aarch64-ldp-fusion.cc: Delete generic code and move it >> to pair-fusion.cc in the middle-end. >> * config/aarch64/t-aarch64: Add header file dependency on pair-fusion.h. >> --- >> gcc/Makefile.in | 1 + >> gcc/config/aarch64/aarch64-ldp-fusion.cc | 3298 +--------------------- >> gcc/config/aarch64/t-aarch64 | 2 +- >> gcc/pair-fusion.cc | 3013 ++++++++++++++++++++ >> gcc/pair-fusion.h | 193 ++ >> 5 files changed, 3286 insertions(+), 3221 deletions(-) >> create mode 100644 gcc/pair-fusion.cc >> create mode 100644 gcc/pair-fusion.h >> >> diff --git a/gcc/Makefile.in b/gcc/Makefile.in >> index a7f15694c34..643342f623d 100644 >> --- a/gcc/Makefile.in >> +++ b/gcc/Makefile.in >> @@ -1563,6 +1563,7 @@ OBJS = \ >> ipa-strub.o \ >> ipa.o \ >> ira.o \ >> + pair-fusion.o \ >> ira-build.o \ >> ira-costs.o \ >> ira-conflicts.o \ >> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc >> b/gcc/config/aarch64/aarch64-ldp-fusion.cc >> index 085366cdf68..0af927231d3 100644 >> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc >> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc > <snip> >> diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64 >> index 78713558e7d..bdada08be70 100644 >> --- a/gcc/config/aarch64/t-aarch64 >> +++ b/gcc/config/aarch64/t-aarch64 >> @@ -203,7 +203,7 @@ aarch64-early-ra.o: >> $(srcdir)/config/aarch64/aarch64-early-ra.cc \ >> aarch64-ldp-fusion.o: $(srcdir)/config/aarch64/aarch64-ldp-fusion.cc \ >> $(CONFIG_H) $(SYSTEM_H) $(CORETYPES_H) $(BACKEND_H) $(RTL_H) $(DF_H) \ >> $(RTL_SSA_H) cfgcleanup.h tree-pass.h ordered-hash-map.h tree-dfa.h \ >> - fold-const.h tree-hash-traits.h print-tree.h >> + fold-const.h tree-hash-traits.h print-tree.h pair-fusion.h > > So now you also need to remove the deps on the includes removed in the latest > version of the patch. >
Addressed in v4 of the patch. >> $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \ >> $(srcdir)/config/aarch64/aarch64-ldp-fusion.cc >> >> diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc >> new file mode 100644 >> index 00000000000..827b88cf2fc >> --- /dev/null >> +++ b/gcc/pair-fusion.cc >> @@ -0,0 +1,3013 @@ >> +// Pass to fuse adjacent loads/stores into paired memory accesses. >> +// Copyright (C) 2024 Free Software Foundation, Inc. >> +// >> +// This file is part of GCC. >> +// >> +// GCC is free software; you can redistribute it and/or modify it >> +// under the terms of the GNU General Public License as published by >> +// the Free Software Foundation; either version 3, or (at your option) >> +// any later version. >> +// >> +// GCC is distributed in the hope that it will be useful, but >> +// WITHOUT ANY WARRANTY; without even the implied warranty of >> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >> +// General Public License for more details. >> +// >> +// You should have received a copy of the GNU General Public License >> +// along with GCC; see the file COPYING3. If not see >> +// <http://www.gnu.org/licenses/>. >> + >> +#define INCLUDE_ALGORITHM >> +#define INCLUDE_FUNCTIONAL >> +#define INCLUDE_LIST >> +#define INCLUDE_TYPE_TRAITS >> +#include "config.h" >> +#include "system.h" >> +#include "coretypes.h" >> +#include "backend.h" >> +#include "rtl.h" >> +#include "df.h" >> +#include "rtl-iter.h" >> +#include "rtl-ssa.h" >> +#include "cfgcleanup.h" >> +#include "tree-pass.h" >> +#include "ordered-hash-map.h" >> +#include "tree-dfa.h" >> +#include "fold-const.h" >> +#include "tree-hash-traits.h" >> +#include "print-tree.h" >> + >> +using namespace rtl_ssa; >> + >> +#include "pair-fusion.h" > > Now that you've fixed up the header file, I think it would be more natural to > keep the includes together and have the using namespace come after all the > includes. > Addressed in v4 of the patch. >> + >> +// We pack these fields (load_p, fpsimd_p, and size) into an integer >> +// (LFS) which we use as part of the key into the main hash tables. >> +// >> +// The idea is that we group candidates together only if they agree on >> +// the fields below. Candidates that disagree on any of these >> +// properties shouldn't be merged together. >> +struct lfs_fields >> +{ >> + bool load_p; >> + bool fpsimd_p; >> + unsigned size; >> +}; >> + > <snip> >> diff --git a/gcc/pair-fusion.h b/gcc/pair-fusion.h >> new file mode 100644 >> index 00000000000..846672959a9 >> --- /dev/null >> +++ b/gcc/pair-fusion.h >> @@ -0,0 +1,193 @@ >> +// Pass to fuse adjacent loads/stores into paired memory accesses. >> +// >> +// This file contains the definition of the virtual base class which is >> +// overriden by targets that make use of the pass. > > Nit: probably want a blank line here. Addressed in v4 of the patch. > >> +// Copyright (C) 2024 Free Software Foundation, Inc. >> +// >> +// This file is part of GCC. >> +// >> +// GCC is free software; you can redistribute it and/or modify it >> +// under the terms of the GNU General Public License as published by >> +// the Free Software Foundation; either version 3, or (at your option) >> +// any later version. >> +// >> +// GCC is distributed in the hope that it will be useful, but >> +// WITHOUT ANY WARRANTY; without even the implied warranty of >> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >> +// General Public License for more details. >> +// >> +// You should have received a copy of the GNU General Public License >> +// along with GCC; see the file COPYING3. If not see >> +// <http://www.gnu.org/licenses/>. >> + >> +namespace rtl_ssa { >> + class def_info; >> + class insn_info; >> + class insn_range_info; >> + class bb_info; >> +} >> + >> +// Information about a potential base candidate, used in try_fuse_pair. >> +// There may be zero, one, or two viable RTL bases for a given pair. >> +struct base_cand >> +{ >> + // DEF is the def of the base register to be used by the pair. >> + rtl_ssa::def_info *def; >> + >> + // FROM_INSN is -1 if the base candidate is already shared by both >> + // candidate insns. Otherwise it holds the index of the insn from >> + // which the base originated. >> + // >> + // In the case that the base is shared, either DEF is already used >> + // by both candidate accesses, or both accesses see different versions >> + // of the same regno, in which case DEF is the def consumed by the >> + // first candidate access. >> + int from_insn; >> + >> + // To form a pair, we do so by moving the first access down and the second >> + // access up. To determine where to form the pair, and whether or not >> + // it is safe to form the pair, we track instructions which cannot be >> + // re-ordered past due to either dataflow or alias hazards. >> + // >> + // Since we allow changing the base used by an access, the choice of >> + // base can change which instructions act as re-ordering hazards for >> + // this pair (due to different dataflow). We store the initial >> + // dataflow hazards for this choice of base candidate in HAZARDS. >> + // >> + // These hazards act as re-ordering barriers to each candidate insn >> + // respectively, in program order. >> + // >> + // Later on, when we take alias analysis into account, we narrow >> + // HAZARDS accordingly. >> + rtl_ssa::insn_info *hazards[2]; >> + >> + base_cand (rtl_ssa::def_info *def, int insn) >> + : def (def), from_insn (insn), hazards {nullptr, nullptr} {} >> + >> + base_cand (rtl_ssa::def_info *def) : base_cand (def, -1) {} >> + >> + // Test if this base candidate is viable according to HAZARDS. >> + inline bool viable () const; >> +}; >> + >> +struct alias_walker; >> + >> +// When querying should_handle_writeback, this enum is used to >> +// qualify which opportunities we are asking about. >> +enum class writeback { >> + // Only those writeback opportunities that arise from existing >> + // auto-increment accesses. >> + EXISTING, >> + >> + // All writeback opportunities, including those that involve folding >> + // base register updates into a non-writeback pair. >> + ALL >> +}; >> + >> +// This class can be overriden by targets to give a pass that fuses >> +// adjacent loads and stores into load/store pair instructions. >> +// >> +// The target can override the various virtual functions to customize >> +// the behaviour of the pass as appropriate for the target. >> +struct pair_fusion { >> + pair_fusion (); >> + >> + // Given: >> + // - an rtx REG_OP, the non-memory operand in a load/store insn, >> + // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and >> + // - a boolean LOAD_P (true iff the insn is a load), then: >> + // return true if the access should be considered an FP/SIMD access. >> + // Such accesses are segregated from GPR accesses, since we only want >> + // to form pairs for accesses that use the same register file. >> + virtual bool fpsimd_op_p (rtx, machine_mode, bool) >> + { >> + return false; >> + } >> + >> + // Return true if we should consider forming pairs from memory >> + // accesses with operand mode MODE at this stage in compilation. >> + virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0; >> + >> + // Return true iff REG_OP is a suitable register operand for a paired >> + // memory access, where LOAD_P is true if we're asking about loads and >> + // false for stores. MODE gives the mode of the operand. >> + virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op, >> + machine_mode mode) = 0; >> + >> + // Return alias check limit. >> + // This is needed to avoid unbounded quadratic behaviour when >> + // performing alias analysis. >> + virtual int pair_mem_alias_check_limit () = 0; >> + >> + // Return true if we should try to handle writeback opportunities. >> + // WHICH determines the kinds of writeback opportunities the caller >> + // is asking about. >> + virtual bool should_handle_writeback (enum writeback which) = 0; >> + >> + // Given BASE_MEM, the mem from the lower candidate access for a pair, >> + // and LOAD_P (true if the access is a load), check if we should proceed >> + // to form the pair given the target's code generation policy on >> + // paired accesses. >> + virtual bool pair_mem_ok_with_policy (rtx base_mem, bool load_p) = 0; >> + >> + // Generate the pattern for a paired access. PATS gives the patterns >> + // for the individual memory accesses (which by this point must share a >> + // common base register). If WRITEBACK is non-NULL, then this rtx >> + // describes the update to the base register that should be performed by >> + // the resulting insn. LOAD_P is true iff the accesses are loads. >> + virtual rtx gen_pair (rtx *pats, rtx writeback, bool load_p) = 0; >> + >> + // Return true if INSN is a paired memory access. If so, set LOAD_P to >> + // true iff INSN is a load pair. >> + virtual bool pair_mem_insn_p (rtx_insn *insn, bool &load_p) = 0; >> + >> + // Return true if we should track loads. >> + virtual bool track_loads_p () >> + { >> + return true; >> + } >> + >> + // Return true if we should track stores. >> + virtual bool track_stores_p () >> + { >> + return true; >> + } >> + >> + // Return true if OFFSET is in range for a paired memory access. >> + virtual bool pair_mem_in_range_p (HOST_WIDE_INT offset) = 0; >> + >> + // Given a load/store pair insn in PATTERN, unpack the insn, storing >> + // the register operands in REGS, and returning the mem. LOAD_P is >> + // true for loads and false for stores. >> + virtual rtx destructure_pair (rtx regs[2], rtx pattern, bool load_p) = 0; >> + >> + // Given a pair mem in MEM, register operands in REGS, and an rtx >> + // representing the effect of writeback on the base register in WB_EFFECT, >> + // return an insn representing a writeback variant of this pair. >> + // LOAD_P is true iff the pair is a load. >> + // This is used when promoting existing non-writeback pairs to writeback >> + // variants. >> + virtual rtx gen_promote_writeback_pair (rtx wb_effect, rtx mem, >> + rtx regs[2], bool load_p) = 0; >> + >> + inline void process_block (rtl_ssa::bb_info *bb); >> + inline rtl_ssa::insn_info *find_trailing_add (rtl_ssa::insn_info >> *insns[2], >> + const rtl_ssa::insn_range_info >> &pair_range, > > Nit: long line here. You could either add a line break after the return type > of the function or immediately before &pair_range. I'd guess the latter is > preferred, but I'm not too sure in this case. > Addressed in v4 of the patch. >> + int initial_writeback, >> + rtx *writeback_effect, >> + rtl_ssa::def_info **add_def, >> + rtl_ssa::def_info *base_def, >> + poly_int64 initial_offset, >> + unsigned access_size); >> + inline int get_viable_bases (rtl_ssa::insn_info *insns[2], >> + vec<base_cand> &base_cands, >> + rtx cand_mems[2], >> + unsigned access_size, >> + bool reversed); >> + inline void do_alias_analysis (rtl_ssa::insn_info *alias_hazards[4], >> + alias_walker *walkers[4], >> + bool load_p); >> + inline void try_promote_writeback (rtl_ssa::insn_info *insn, bool load_p); >> + void run (); >> + ~pair_fusion (); >> +}; >> -- >> 2.39.3 >> > > Thanks, > Alex Thanks & Regards Ajit