Alex Coplan <alex.cop...@arm.com> writes: > This patch overhauls the load/store pair patterns with two main goals: > > 1. Fixing a correctness issue (the current patterns are not RA-friendly). > 2. Allowing more flexibility in which operand modes are supported, and which > combinations of modes are allowed in the two arms of the load/store pair, > while reducing the number of patterns required both in the source and in > the generated code. > > The correctness issue (1) is due to the fact that the current patterns have > two independent memory operands tied together only by a predicate on the > insns. > Since LRA only looks at the constraints, one of the memory operands can get > reloaded without the other one being changed, leading to the insn becoming > unrecognizable after reload. > > We fix this issue by changing the patterns such that they only ever have one > memory operand representing the entire pair. For the store case, we use an > unspec to logically concatenate the register operands before storing them. > For the load case, we use unspecs to extract the "lanes" from the pair mem, > with the second occurrence of the mem matched using a match_dup (such that > there > is still really only one memory operand as far as the RA is concerned). > > In terms of the modes used for the pair memory operands, we canonicalize > these to V2x4QImode, V2x8QImode, and V2x16QImode. These modes have not > only the correct size but also correct alignment requirement for a > memory operand representing an entire load/store pair. Unlike the other > two, V2x4QImode didn't previously exist, so had to be added with the > patch. > > As with the previous patch generalizing the writeback patterns, this > patch aims to be flexible in the combinations of modes supported by the > patterns without requiring a large number of generated patterns by using > distinct mode iterators. 
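For illustration, the shape described above — one pair mem per insn, with unspecs concatenating or extracting the register halves — comes out roughly like this for an 8-byte GPR pair (names as introduced later in the patch; registers and address are schematic):

```lisp
;; Store pair: a single V2x8QI mem, with the two source registers
;; logically concatenated by an unspec.
(set (mem:V2x8QI addr)
     (unspec:V2x8QI [(reg:DI x0) (reg:DI x1)] UNSPEC_STP))

;; Load pair: each lane extracted from the pair mem by an unspec; the
;; second occurrence of the mem is a match_dup of the first, so the RA
;; only ever sees one memory operand.
(parallel
  [(set (reg:DI x0) (unspec:DI [(mem:V2x8QI addr)] UNSPEC_LDP_FST))
   (set (reg:DI x1) (unspec:DI [(mem:V2x8QI addr)] UNSPEC_LDP_SND))])
```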
> > The new scheme means we only need a single (generated) pattern for each > load/store operation of a given operand size. For the 4-byte and 8-byte > operand cases, we use the GPI iterator to synthesize the two patterns. > The 16-byte case is implemented as a separate pattern in the source (due > to only having a single possible alternative). > > Since the UNSPEC patterns can't be interpreted by the dwarf2cfi code, > we add REG_CFA_OFFSET notes to the store pair insns emitted by > aarch64_save_callee_saves, so that correct CFI information can still be > generated. Furthermore, we now unconditionally generate these CFA > notes on frame-related insns emitted by aarch64_save_callee_saves. > This is done in case that the load/store pair pass forms these into > pairs, in which case the CFA notes would be needed. > > We also adjust the ldp/stp peepholes to generate the new form. This is > done by switching the generation to use the > aarch64_gen_{load,store}_pair interface, making it easier to change the > form in the future if needed. (Likewise, the upcoming aarch64 > load/store pair pass also makes use of this interface). > > This patch also adds an "ldpstp" attribute to the non-writeback > load/store pair patterns, which is used by the post-RA load/store pair > pass to identify existing patterns and see if they can be promoted to > writeback variants. > > One potential concern with using unspecs for the patterns is that it can block > optimization by the generic RTL passes. This patch series tries to mitigate > this in two ways: > 1. The pre-RA load/store pair pass runs very late in the pre-RA pipeline. > 2. A later patch in the series adjusts the aarch64 mem{cpy,set} expansion to > emit individual loads/stores instead of ldp/stp. These should then be > formed back into load/store pairs much later in the RTL pipeline by the > new load/store pair pass. > > Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk? 
> > Thanks, > Alex > > gcc/ChangeLog: > > * config/aarch64/aarch64-ldpstp.md: Abstract ldp/stp > representation from peepholes, allowing use of new form. > * config/aarch64/aarch64-modes.def (V2x4QImode): Define. > * config/aarch64/aarch64-protos.h > (aarch64_finish_ldpstp_peephole): Declare. > (aarch64_swap_ldrstr_operands): Delete declaration. > (aarch64_gen_load_pair): Declare. > (aarch64_gen_store_pair): Declare. > * config/aarch64/aarch64-simd.md (load_pair<DREG:mode><DREG2:mode>): > Delete. > (vec_store_pair<DREG:mode><DREG2:mode>): Delete. > (load_pair<VQ:mode><VQ2:mode>): Delete. > (vec_store_pair<VQ:mode><VQ2:mode>): Delete. > * config/aarch64/aarch64.cc (aarch64_pair_mode_for_mode): New. > (aarch64_gen_store_pair): Adjust to use new unspec form of stp. > Drop second mem from parameters. > (aarch64_gen_load_pair): Likewise. > (aarch64_pair_mem_from_base): New. > (aarch64_save_callee_saves): Emit REG_CFA_OFFSET notes for > frame-related saves. Adjust call to aarch64_gen_store_pair > (aarch64_restore_callee_saves): Adjust calls to > aarch64_gen_load_pair to account for change in interface. > (aarch64_process_components): Likewise. > (aarch64_classify_address): Handle 32-byte pair mems in > LDP_STP_N case. > (aarch64_print_operand): Likewise. > (aarch64_copy_one_block_and_progress_pointers): Adjust calls to > account for change in aarch64_gen_{load,store}_pair interface. > (aarch64_set_one_block_and_progress_pointer): Likewise. > (aarch64_finish_ldpstp_peephole): New. > (aarch64_gen_adjusted_ldpstp): Adjust to use generation helper. > * config/aarch64/aarch64.md (ldpstp): New attribute. > (load_pair_sw_<SX:mode><SX2:mode>): Delete. > (load_pair_dw_<DX:mode><DX2:mode>): Delete. > (load_pair_dw_<TX:mode><TX2:mode>): Delete. > (*load_pair_<ldst_sz>): New. > (*load_pair_16): New. > (store_pair_sw_<SX:mode><SX2:mode>): Delete. > (store_pair_dw_<DX:mode><DX2:mode>): Delete. > (store_pair_dw_<TX:mode><TX2:mode>): Delete. > (*store_pair_<ldst_sz>): New. 
> (*store_pair_16): New. > (*load_pair_extendsidi2_aarch64): Adjust to use new form. > (*zero_extendsidi2_aarch64): Likewise. > * config/aarch64/iterators.md (VPAIR): New. > * config/aarch64/predicates.md (aarch64_mem_pair_operand): Change to > a special predicate derived from aarch64_mem_pair_operator. > --- > gcc/config/aarch64/aarch64-ldpstp.md | 66 +++---- > gcc/config/aarch64/aarch64-modes.def | 6 +- > gcc/config/aarch64/aarch64-protos.h | 5 +- > gcc/config/aarch64/aarch64-simd.md | 60 ------- > gcc/config/aarch64/aarch64.cc | 257 +++++++++++++++------------ > gcc/config/aarch64/aarch64.md | 188 +++++++++----------- > gcc/config/aarch64/iterators.md | 3 + > gcc/config/aarch64/predicates.md | 10 +- > 8 files changed, 270 insertions(+), 325 deletions(-) > > diff --git a/gcc/config/aarch64/aarch64-ldpstp.md > b/gcc/config/aarch64/aarch64-ldpstp.md > index 1ee7c73ff0c..dc39af85254 100644 > --- a/gcc/config/aarch64/aarch64-ldpstp.md > +++ b/gcc/config/aarch64/aarch64-ldpstp.md > @@ -24,10 +24,10 @@ (define_peephole2 > (set (match_operand:GPI 2 "register_operand" "") > (match_operand:GPI 3 "memory_operand" ""))] > "aarch64_operands_ok_for_ldpstp (operands, true, <MODE>mode)" > - [(parallel [(set (match_dup 0) (match_dup 1)) > - (set (match_dup 2) (match_dup 3))])] > + [(const_int 0)] > { > - aarch64_swap_ldrstr_operands (operands, true); > + aarch64_finish_ldpstp_peephole (operands, true); > + DONE; > }) > > (define_peephole2 > @@ -36,10 +36,10 @@ (define_peephole2 > (set (match_operand:GPI 2 "memory_operand" "") > (match_operand:GPI 3 "aarch64_reg_or_zero" ""))] > "aarch64_operands_ok_for_ldpstp (operands, false, <MODE>mode)" > - [(parallel [(set (match_dup 0) (match_dup 1)) > - (set (match_dup 2) (match_dup 3))])] > + [(const_int 0)] > { > - aarch64_swap_ldrstr_operands (operands, false); > + aarch64_finish_ldpstp_peephole (operands, false); > + DONE; > }) > > (define_peephole2 > @@ -48,10 +48,10 @@ (define_peephole2 > (set (match_operand:GPF 2 "register_operand" 
"") > (match_operand:GPF 3 "memory_operand" ""))] > "aarch64_operands_ok_for_ldpstp (operands, true, <MODE>mode)" > - [(parallel [(set (match_dup 0) (match_dup 1)) > - (set (match_dup 2) (match_dup 3))])] > + [(const_int 0)] > { > - aarch64_swap_ldrstr_operands (operands, true); > + aarch64_finish_ldpstp_peephole (operands, true); > + DONE; > }) > > (define_peephole2 > @@ -60,10 +60,10 @@ (define_peephole2 > (set (match_operand:GPF 2 "memory_operand" "") > (match_operand:GPF 3 "aarch64_reg_or_fp_zero" ""))] > "aarch64_operands_ok_for_ldpstp (operands, false, <MODE>mode)" > - [(parallel [(set (match_dup 0) (match_dup 1)) > - (set (match_dup 2) (match_dup 3))])] > + [(const_int 0)] > { > - aarch64_swap_ldrstr_operands (operands, false); > + aarch64_finish_ldpstp_peephole (operands, false); > + DONE; > }) > > (define_peephole2 > @@ -72,10 +72,10 @@ (define_peephole2 > (set (match_operand:DREG2 2 "register_operand" "") > (match_operand:DREG2 3 "memory_operand" ""))] > "aarch64_operands_ok_for_ldpstp (operands, true, <DREG:MODE>mode)" > - [(parallel [(set (match_dup 0) (match_dup 1)) > - (set (match_dup 2) (match_dup 3))])] > + [(const_int 0)] > { > - aarch64_swap_ldrstr_operands (operands, true); > + aarch64_finish_ldpstp_peephole (operands, true); > + DONE; > }) > > (define_peephole2 > @@ -84,10 +84,10 @@ (define_peephole2 > (set (match_operand:DREG2 2 "memory_operand" "") > (match_operand:DREG2 3 "register_operand" ""))] > "aarch64_operands_ok_for_ldpstp (operands, false, <DREG:MODE>mode)" > - [(parallel [(set (match_dup 0) (match_dup 1)) > - (set (match_dup 2) (match_dup 3))])] > + [(const_int 0)] > { > - aarch64_swap_ldrstr_operands (operands, false); > + aarch64_finish_ldpstp_peephole (operands, false); > + DONE; > }) > > (define_peephole2 > @@ -99,10 +99,10 @@ (define_peephole2 > && aarch64_operands_ok_for_ldpstp (operands, true, <VQ:MODE>mode) > && (aarch64_tune_params.extra_tuning_flags > & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS) == 0" > - [(parallel [(set 
(match_dup 0) (match_dup 1)) > - (set (match_dup 2) (match_dup 3))])] > + [(const_int 0)] > { > - aarch64_swap_ldrstr_operands (operands, true); > + aarch64_finish_ldpstp_peephole (operands, true); > + DONE; > }) > > (define_peephole2 > @@ -114,10 +114,10 @@ (define_peephole2 > && aarch64_operands_ok_for_ldpstp (operands, false, <VQ:MODE>mode) > && (aarch64_tune_params.extra_tuning_flags > & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS) == 0" > - [(parallel [(set (match_dup 0) (match_dup 1)) > - (set (match_dup 2) (match_dup 3))])] > + [(const_int 0)] > { > - aarch64_swap_ldrstr_operands (operands, false); > + aarch64_finish_ldpstp_peephole (operands, false); > + DONE; > }) > > > @@ -129,10 +129,10 @@ (define_peephole2 > (set (match_operand:DI 2 "register_operand" "") > (sign_extend:DI (match_operand:SI 3 "memory_operand" "")))] > "aarch64_operands_ok_for_ldpstp (operands, true, SImode)" > - [(parallel [(set (match_dup 0) (sign_extend:DI (match_dup 1))) > - (set (match_dup 2) (sign_extend:DI (match_dup 3)))])] > + [(const_int 0)] > { > - aarch64_swap_ldrstr_operands (operands, true); > + aarch64_finish_ldpstp_peephole (operands, true, SIGN_EXTEND); > + DONE; > }) > > (define_peephole2 > @@ -141,10 +141,10 @@ (define_peephole2 > (set (match_operand:DI 2 "register_operand" "") > (zero_extend:DI (match_operand:SI 3 "memory_operand" "")))] > "aarch64_operands_ok_for_ldpstp (operands, true, SImode)" > - [(parallel [(set (match_dup 0) (zero_extend:DI (match_dup 1))) > - (set (match_dup 2) (zero_extend:DI (match_dup 3)))])] > + [(const_int 0)] > { > - aarch64_swap_ldrstr_operands (operands, true); > + aarch64_finish_ldpstp_peephole (operands, true, ZERO_EXTEND); > + DONE; > }) > > ;; Handle storing of a floating point zero with integer data. 
> @@ -163,10 +163,10 @@ (define_peephole2 > (set (match_operand:<FCVT_TARGET> 2 "memory_operand" "") > (match_operand:<FCVT_TARGET> 3 "aarch64_reg_zero_or_fp_zero" ""))] > "aarch64_operands_ok_for_ldpstp (operands, false, <V_INT_EQUIV>mode)" > - [(parallel [(set (match_dup 0) (match_dup 1)) > - (set (match_dup 2) (match_dup 3))])] > + [(const_int 0)] > { > - aarch64_swap_ldrstr_operands (operands, false); > + aarch64_finish_ldpstp_peephole (operands, false); > + DONE; > }) > > ;; Handle consecutive load/store whose offset is out of the range > diff --git a/gcc/config/aarch64/aarch64-modes.def > b/gcc/config/aarch64/aarch64-modes.def > index 6b4f4e17dd5..1e0d770f72f 100644 > --- a/gcc/config/aarch64/aarch64-modes.def > +++ b/gcc/config/aarch64/aarch64-modes.def > @@ -93,9 +93,13 @@ INT_MODE (XI, 64); > > /* V8DI mode. */ > VECTOR_MODE_WITH_PREFIX (V, INT, DI, 8, 5); > - > ADJUST_ALIGNMENT (V8DI, 8); > > +/* V2x4QImode. Used in load/store pair patterns. */ > +VECTOR_MODE_WITH_PREFIX (V2x, INT, QI, 4, 5); > +ADJUST_NUNITS (V2x4QI, 8); > +ADJUST_ALIGNMENT (V2x4QI, 4); > + > /* Define Advanced SIMD modes for structures of 2, 3 and 4 d-registers. 
*/ > #define ADV_SIMD_D_REG_STRUCT_MODES(NVECS, VB, VH, VS, VD) \ > VECTOR_MODES_WITH_PREFIX (V##NVECS##x, INT, 8, 3); \ > diff --git a/gcc/config/aarch64/aarch64-protos.h > b/gcc/config/aarch64/aarch64-protos.h > index e463fd5c817..2ab54f244a7 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -967,6 +967,8 @@ void aarch64_split_compare_and_swap (rtx op[]); > void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx); > > bool aarch64_gen_adjusted_ldpstp (rtx *, bool, machine_mode, RTX_CODE); > +void aarch64_finish_ldpstp_peephole (rtx *, bool, > + enum rtx_code = (enum rtx_code)0); > > void aarch64_expand_sve_vec_cmp_int (rtx, rtx_code, rtx, rtx); > bool aarch64_expand_sve_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool); > @@ -1022,8 +1024,9 @@ bool aarch64_mergeable_load_pair_p (machine_mode, rtx, > rtx); > bool aarch64_operands_ok_for_ldpstp (rtx *, bool, machine_mode); > bool aarch64_operands_adjust_ok_for_ldpstp (rtx *, bool, machine_mode); > bool aarch64_mem_ok_with_ldpstp_policy_model (rtx, bool, machine_mode); > -void aarch64_swap_ldrstr_operands (rtx *, bool); > bool aarch64_ldpstp_operand_mode_p (machine_mode); > +rtx aarch64_gen_load_pair (rtx, rtx, rtx, enum rtx_code = (enum rtx_code)0); > +rtx aarch64_gen_store_pair (rtx, rtx, rtx); > > extern void aarch64_asm_output_pool_epilogue (FILE *, const char *, > tree, HOST_WIDE_INT); > diff --git a/gcc/config/aarch64/aarch64-simd.md > b/gcc/config/aarch64/aarch64-simd.md > index c6f2d582837..6f5080ab030 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -231,38 +231,6 @@ (define_insn "aarch64_store_lane0<mode>" > [(set_attr "type" "neon_store1_1reg<q>")] > ) > > -(define_insn "load_pair<DREG:mode><DREG2:mode>" > - [(set (match_operand:DREG 0 "register_operand") > - (match_operand:DREG 1 "aarch64_mem_pair_operand")) > - (set (match_operand:DREG2 2 "register_operand") > - (match_operand:DREG2 3 
"memory_operand"))] > - "TARGET_FLOAT > - && rtx_equal_p (XEXP (operands[3], 0), > - plus_constant (Pmode, > - XEXP (operands[1], 0), > - GET_MODE_SIZE (<DREG:MODE>mode)))" > - {@ [ cons: =0 , 1 , =2 , 3 ; attrs: type ] > - [ w , Ump , w , m ; neon_ldp ] ldp\t%d0, %d2, %z1 > - [ r , Ump , r , m ; load_16 ] ldp\t%x0, %x2, %z1 > - } > -) > - > -(define_insn "vec_store_pair<DREG:mode><DREG2:mode>" > - [(set (match_operand:DREG 0 "aarch64_mem_pair_operand") > - (match_operand:DREG 1 "register_operand")) > - (set (match_operand:DREG2 2 "memory_operand") > - (match_operand:DREG2 3 "register_operand"))] > - "TARGET_FLOAT > - && rtx_equal_p (XEXP (operands[2], 0), > - plus_constant (Pmode, > - XEXP (operands[0], 0), > - GET_MODE_SIZE (<DREG:MODE>mode)))" > - {@ [ cons: =0 , 1 , =2 , 3 ; attrs: type ] > - [ Ump , w , m , w ; neon_stp ] stp\t%d1, %d3, %z0 > - [ Ump , r , m , r ; store_16 ] stp\t%x1, %x3, %z0 > - } > -) > - > (define_insn "aarch64_simd_stp<mode>" > [(set (match_operand:VP_2E 0 "aarch64_mem_pair_lanes_operand") > (vec_duplicate:VP_2E (match_operand:<VEL> 1 "register_operand")))] > @@ -273,34 +241,6 @@ (define_insn "aarch64_simd_stp<mode>" > } > ) > > -(define_insn "load_pair<VQ:mode><VQ2:mode>" > - [(set (match_operand:VQ 0 "register_operand" "=w") > - (match_operand:VQ 1 "aarch64_mem_pair_operand" "Ump")) > - (set (match_operand:VQ2 2 "register_operand" "=w") > - (match_operand:VQ2 3 "memory_operand" "m"))] > - "TARGET_FLOAT > - && rtx_equal_p (XEXP (operands[3], 0), > - plus_constant (Pmode, > - XEXP (operands[1], 0), > - GET_MODE_SIZE (<VQ:MODE>mode)))" > - "ldp\\t%q0, %q2, %z1" > - [(set_attr "type" "neon_ldp_q")] > -) > - > -(define_insn "vec_store_pair<VQ:mode><VQ2:mode>" > - [(set (match_operand:VQ 0 "aarch64_mem_pair_operand" "=Ump") > - (match_operand:VQ 1 "register_operand" "w")) > - (set (match_operand:VQ2 2 "memory_operand" "=m") > - (match_operand:VQ2 3 "register_operand" "w"))] > - "TARGET_FLOAT > - && rtx_equal_p (XEXP (operands[2], 0), > - 
plus_constant (Pmode, > - XEXP (operands[0], 0), > - GET_MODE_SIZE (<VQ:MODE>mode)))" > - "stp\\t%q1, %q3, %z0" > - [(set_attr "type" "neon_stp_q")] > -) > - > (define_expand "@aarch64_split_simd_mov<mode>" > [(set (match_operand:VQMOV 0) > (match_operand:VQMOV 1))] > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index ccf081d2a16..1f6094bf1bc 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -9056,59 +9056,81 @@ aarch64_pop_regs (unsigned regno1, unsigned regno2, > HOST_WIDE_INT adjustment, > } > } > > -/* Generate and return a store pair instruction of mode MODE to store > - register REG1 to MEM1 and register REG2 to MEM2. */ > +static machine_mode > +aarch64_pair_mode_for_mode (machine_mode mode) > +{ > + if (known_eq (GET_MODE_SIZE (mode), 4)) > + return E_V2x4QImode; > + else if (known_eq (GET_MODE_SIZE (mode), 8)) > + return E_V2x8QImode; > + else if (known_eq (GET_MODE_SIZE (mode), 16)) > + return E_V2x16QImode; > + else > + gcc_unreachable (); > +}
Missing function comment. There should be no need to use E_ outside switches. > > static rtx > -aarch64_gen_store_pair (machine_mode mode, rtx mem1, rtx reg1, rtx mem2, > - rtx reg2) > +aarch64_pair_mem_from_base (rtx mem) > { > - switch (mode) > - { > - case E_DImode: > - return gen_store_pair_dw_didi (mem1, reg1, mem2, reg2); > - > - case E_DFmode: > - return gen_store_pair_dw_dfdf (mem1, reg1, mem2, reg2); > - > - case E_TFmode: > - return gen_store_pair_dw_tftf (mem1, reg1, mem2, reg2); > + auto pair_mode = aarch64_pair_mode_for_mode (GET_MODE (mem)); > + mem = adjust_bitfield_address_nv (mem, pair_mode, 0); > + gcc_assert (aarch64_mem_pair_lanes_operand (mem, pair_mode)); > + return mem; > +} > > - case E_V4SImode: > - return gen_vec_store_pairv4siv4si (mem1, reg1, mem2, reg2); > +/* Generate and return a store pair instruction to store REG1 and REG2 > + into memory starting at BASE_MEM. All three rtxes should have modes of > the > + same size. */ > > - case E_V16QImode: > - return gen_vec_store_pairv16qiv16qi (mem1, reg1, mem2, reg2); > +rtx > +aarch64_gen_store_pair (rtx base_mem, rtx reg1, rtx reg2) > +{ > + rtx pair_mem = aarch64_pair_mem_from_base (base_mem); > > - default: > - gcc_unreachable (); > - } > + return gen_rtx_SET (pair_mem, > + gen_rtx_UNSPEC (GET_MODE (pair_mem), > + gen_rtvec (2, reg1, reg2), > + UNSPEC_STP)); > } > > -/* Generate and regurn a load pair isntruction of mode MODE to load register > - REG1 from MEM1 and register REG2 from MEM2. */ > +/* Generate and return a load pair instruction to load a pair of > + registers starting at BASE_MEM into REG1 and REG2. If CODE is > + UNKNOWN, all three rtxes should have modes of the same size. > + Otherwise, CODE is {SIGN,ZERO}_EXTEND, base_mem should be in SImode, > + and REG{1,2} should be in DImode. 
*/ > > -static rtx > -aarch64_gen_load_pair (machine_mode mode, rtx reg1, rtx mem1, rtx reg2, > - rtx mem2) > +rtx > +aarch64_gen_load_pair (rtx reg1, rtx reg2, rtx base_mem, enum rtx_code code) > { > - switch (mode) > - { > - case E_DImode: > - return gen_load_pair_dw_didi (reg1, mem1, reg2, mem2); > + rtx pair_mem = aarch64_pair_mem_from_base (base_mem); > > - case E_DFmode: > - return gen_load_pair_dw_dfdf (reg1, mem1, reg2, mem2); > - > - case E_TFmode: > - return gen_load_pair_dw_tftf (reg1, mem1, reg2, mem2); > + const bool any_extend_p = (code == ZERO_EXTEND || code == SIGN_EXTEND); > + if (any_extend_p) > + { > + gcc_checking_assert (GET_MODE (base_mem) == SImode); > + gcc_checking_assert (GET_MODE (reg1) == DImode); > + gcc_checking_assert (GET_MODE (reg2) == DImode); Not a strong preference, but I think single asserts combined with && are preferred. > + } > + else > + gcc_assert (code == UNKNOWN); > + > + rtx unspecs[2] = { > + gen_rtx_UNSPEC (any_extend_p ? SImode : GET_MODE (reg1), > + gen_rtvec (1, pair_mem), > + UNSPEC_LDP_FST), > + gen_rtx_UNSPEC (any_extend_p ? 
SImode : GET_MODE (reg2), IIUC, the unspec modes could both be GET_MODE (base_mem) > + gen_rtvec (1, copy_rtx (pair_mem)), > + UNSPEC_LDP_SND) > + }; > > - case E_V4SImode: > - return gen_load_pairv4siv4si (reg1, mem1, reg2, mem2); > + if (any_extend_p) > + for (int i = 0; i < 2; i++) > + unspecs[i] = gen_rtx_fmt_e (code, DImode, unspecs[i]); > > - default: > - gcc_unreachable (); > - } > + return gen_rtx_PARALLEL (VOIDmode, > + gen_rtvec (2, > + gen_rtx_SET (reg1, unspecs[0]), > + gen_rtx_SET (reg2, unspecs[1]))); > } > > /* Return TRUE if return address signing should be enabled for the current > @@ -9321,8 +9343,19 @@ aarch64_save_callee_saves (poly_int64 bytes_below_sp, > offset -= fp_offset; > } > rtx mem = gen_frame_mem (mode, plus_constant (Pmode, base_rtx, > offset)); > - bool need_cfa_note_p = (base_rtx != stack_pointer_rtx); > > + rtx cfa_base = stack_pointer_rtx; > + poly_int64 cfa_offset = sp_offset; I don't think we need both cfa_offset and sp_offset. sp_offset in the current code only exists for CFI purposes. > + > + if (hard_fp_valid_p && frame_pointer_needed) > + { > + cfa_base = hard_frame_pointer_rtx; > + cfa_offset += (bytes_below_sp - frame.bytes_below_hard_fp); > + } > + > + rtx cfa_mem = gen_frame_mem (mode, > + plus_constant (Pmode, > + cfa_base, cfa_offset)); > unsigned int regno2; > if (!aarch64_sve_mode_p (mode) > && i + 1 < regs.size () > @@ -9331,45 +9364,37 @@ aarch64_save_callee_saves (poly_int64 bytes_below_sp, > frame.reg_offset[regno2] - frame.reg_offset[regno])) > { > rtx reg2 = gen_rtx_REG (mode, regno2); > - rtx mem2; > > offset += GET_MODE_SIZE (mode); > - mem2 = gen_frame_mem (mode, plus_constant (Pmode, base_rtx, offset)); > - insn = emit_insn (aarch64_gen_store_pair (mode, mem, reg, mem2, > - reg2)); > - > - /* The first part of a frame-related parallel insn is > - always assumed to be relevant to the frame > - calculations; subsequent parts, are only > - frame-related if explicitly marked. 
*/ > + insn = emit_insn (aarch64_gen_store_pair (mem, reg, reg2)); > + > if (aarch64_emit_cfi_for_reg_p (regno2)) > { > - if (need_cfa_note_p) > - aarch64_add_cfa_expression (insn, reg2, stack_pointer_rtx, > - sp_offset + GET_MODE_SIZE (mode)); > - else > - RTX_FRAME_RELATED_P (XVECEXP (PATTERN (insn), 0, 1)) = 1; > + rtx cfa_mem2 = adjust_address_nv (cfa_mem, > + Pmode, > + GET_MODE_SIZE (mode)); Think this should use gen_frame_mem directly, rather than moving beyond the bounds of the original mem. > + add_reg_note (insn, REG_CFA_OFFSET, > + gen_rtx_SET (cfa_mem2, reg2)); > } > > regno = regno2; > ++i; > } > else if (mode == VNx2DImode && BYTES_BIG_ENDIAN) > - { > - insn = emit_insn (gen_aarch64_pred_mov (mode, mem, ptrue, reg)); > - need_cfa_note_p = true; > - } > + insn = emit_insn (gen_aarch64_pred_mov (mode, mem, ptrue, reg)); > else if (aarch64_sve_mode_p (mode)) > insn = emit_insn (gen_rtx_SET (mem, reg)); > else > insn = emit_move_insn (mem, reg); > > RTX_FRAME_RELATED_P (insn) = frame_related_p; > - if (frame_related_p && need_cfa_note_p) > - aarch64_add_cfa_expression (insn, reg, stack_pointer_rtx, sp_offset); > + > + if (frame_related_p) > + add_reg_note (insn, REG_CFA_OFFSET, gen_rtx_SET (cfa_mem, reg)); For the record, I might need to add back some CFA_EXPRESSIONs for locally-streaming SME functions, to ensure that the CFI code doesn't aggregate SVE saves across a change in the VG DWARF register. But it's probably easier to do that once the patch is in, since having a note on all insns will help to ensure consistency. > } > } > > + Stray extra whitespace. > /* Emit code to restore the callee registers in REGS, ignoring pop candidates > and any other registers that are handled separately. Write the > appropriate > REG_CFA_RESTORE notes into CFI_OPS. 
> @@ -9425,12 +9450,7 @@ aarch64_restore_callee_saves (poly_int64 > bytes_below_sp, > frame.reg_offset[regno2] - frame.reg_offset[regno])) > { > rtx reg2 = gen_rtx_REG (mode, regno2); > - rtx mem2; > - > - offset += GET_MODE_SIZE (mode); > - mem2 = gen_frame_mem (mode, plus_constant (Pmode, base_rtx, offset)); > - emit_insn (aarch64_gen_load_pair (mode, reg, mem, reg2, mem2)); > - > + emit_insn (aarch64_gen_load_pair (reg, reg2, mem)); > *cfi_ops = alloc_reg_note (REG_CFA_RESTORE, reg2, *cfi_ops); > regno = regno2; > ++i; > @@ -9762,9 +9782,9 @@ aarch64_process_components (sbitmap components, bool > prologue_p) > : gen_rtx_SET (reg2, mem2); > > if (prologue_p) > - insn = emit_insn (aarch64_gen_store_pair (mode, mem, reg, mem2, reg2)); > + insn = emit_insn (aarch64_gen_store_pair (mem, reg, reg2)); > else > - insn = emit_insn (aarch64_gen_load_pair (mode, reg, mem, reg2, mem2)); > + insn = emit_insn (aarch64_gen_load_pair (reg, reg2, mem)); > > if (frame_related_p || frame_related2_p) > { > @@ -10983,12 +11003,18 @@ aarch64_classify_address (struct > aarch64_address_info *info, > mode of the corresponding addressing mode is half of that. */ > if (type == ADDR_QUERY_LDP_STP_N) > { > - if (known_eq (GET_MODE_SIZE (mode), 16)) > + if (known_eq (GET_MODE_SIZE (mode), 32)) > + mode = V16QImode; > + else if (known_eq (GET_MODE_SIZE (mode), 16)) > mode = DFmode; > else if (known_eq (GET_MODE_SIZE (mode), 8)) > mode = SFmode; > else > return false; > + > + /* This isn't really an Advanced SIMD struct mode, but a mode > + used to represent the complete mem in a load/store pair. 
*/ > + advsimd_struct_p = false; > } > > bool allow_reg_index_p = (!load_store_pair_p > @@ -12609,7 +12635,8 @@ aarch64_print_operand (FILE *f, rtx x, int code) > if (!MEM_P (x) > || (code == 'y' > && maybe_ne (GET_MODE_SIZE (mode), 8) > - && maybe_ne (GET_MODE_SIZE (mode), 16))) > + && maybe_ne (GET_MODE_SIZE (mode), 16) > + && maybe_ne (GET_MODE_SIZE (mode), 32))) > { > output_operand_lossage ("invalid operand for '%%%c'", code); > return; > @@ -25431,10 +25458,8 @@ aarch64_copy_one_block_and_progress_pointers (rtx > *src, rtx *dst, > *src = adjust_address (*src, mode, 0); > *dst = adjust_address (*dst, mode, 0); > /* Emit the memcpy. */ > - emit_insn (aarch64_gen_load_pair (mode, reg1, *src, reg2, > - aarch64_progress_pointer (*src))); > - emit_insn (aarch64_gen_store_pair (mode, *dst, reg1, > - aarch64_progress_pointer (*dst), > reg2)); > + emit_insn (aarch64_gen_load_pair (reg1, reg2, *src)); > + emit_insn (aarch64_gen_store_pair (*dst, reg1, reg2)); > /* Move the pointers forward. */ > *src = aarch64_move_pointer (*src, 32); > *dst = aarch64_move_pointer (*dst, 32); > @@ -25613,8 +25638,7 @@ aarch64_set_one_block_and_progress_pointer (rtx src, > rtx *dst, > /* "Cast" the *dst to the correct mode. */ > *dst = adjust_address (*dst, mode, 0); > /* Emit the memset. */ > - emit_insn (aarch64_gen_store_pair (mode, *dst, src, > - aarch64_progress_pointer (*dst), src)); > + emit_insn (aarch64_gen_store_pair (*dst, src, src)); > > /* Move the pointers forward. */ > *dst = aarch64_move_pointer (*dst, 32); > @@ -26812,6 +26836,22 @@ aarch64_swap_ldrstr_operands (rtx* operands, bool > load) > } > } > > +void > +aarch64_finish_ldpstp_peephole (rtx *operands, bool load_p, enum rtx_code > code) Missing function comment. 
> +{ > + aarch64_swap_ldrstr_operands (operands, load_p); > + > + if (load_p) > + emit_insn (aarch64_gen_load_pair (operands[0], operands[2], > + operands[1], code)); > + else > + { > + gcc_assert (code == UNKNOWN); > + emit_insn (aarch64_gen_store_pair (operands[0], operands[1], > + operands[3])); > + } > +} > + > /* Taking X and Y to be HOST_WIDE_INT pointers, return the result of a > comparison between the two. */ > int > @@ -26993,8 +27033,8 @@ bool > aarch64_gen_adjusted_ldpstp (rtx *operands, bool load, > machine_mode mode, RTX_CODE code) > { > - rtx base, offset_1, offset_3, t1, t2; > - rtx mem_1, mem_2, mem_3, mem_4; > + rtx base, offset_1, offset_3; > + rtx mem_1, mem_2; > rtx temp_operands[8]; > HOST_WIDE_INT off_val_1, off_val_3, base_off, new_off_1, new_off_3, > stp_off_upper_limit, stp_off_lower_limit, msize; > @@ -27019,21 +27059,17 @@ aarch64_gen_adjusted_ldpstp (rtx *operands, bool > load, > if (load) > { > mem_1 = copy_rtx (temp_operands[1]); > - mem_2 = copy_rtx (temp_operands[3]); > - mem_3 = copy_rtx (temp_operands[5]); > - mem_4 = copy_rtx (temp_operands[7]); > + mem_2 = copy_rtx (temp_operands[5]); > } > else > { > mem_1 = copy_rtx (temp_operands[0]); > - mem_2 = copy_rtx (temp_operands[2]); > - mem_3 = copy_rtx (temp_operands[4]); > - mem_4 = copy_rtx (temp_operands[6]); > + mem_2 = copy_rtx (temp_operands[4]); > gcc_assert (code == UNKNOWN); > } > > extract_base_offset_in_addr (mem_1, &base, &offset_1); > - extract_base_offset_in_addr (mem_3, &base, &offset_3); > + extract_base_offset_in_addr (mem_2, &base, &offset_3); mem_2 with offset_3 feels a bit awkward. Might be worth using mem_3 instead, so that the memory and register numbers are in sync. I suppose we still need Ump for the extending loads, is that right? Are there any other uses left? 
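As an aside on the V2x*QI canonicalisation earlier in the patch: the size-to-pair-mode mapping can be sketched in standalone form (plain C++ stand-in, not the GCC internals; the function name and string results are purely illustrative):

```cpp
#include <cassert>
#include <cstring>

// Stand-in for aarch64_pair_mode_for_mode: the canonical pair mode is
// chosen purely by the size of one operand of the pair.
static const char *pair_mode_for_size (unsigned size)
{
  switch (size)
    {
    case 4:  return "V2x4QI";   /* 32-bit operands -> 8-byte pair mem.  */
    case 8:  return "V2x8QI";   /* 64-bit operands -> 16-byte pair mem.  */
    case 16: return "V2x16QI";  /* 128-bit operands -> 32-byte pair mem.  */
    default: return nullptr;    /* No pair mode for other sizes.  */
    }
}
```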
Thanks, Richard > gcc_assert (base != NULL_RTX && offset_1 != NULL_RTX > && offset_3 != NULL_RTX); > > @@ -27097,63 +27133,48 @@ aarch64_gen_adjusted_ldpstp (rtx *operands, bool > load, > replace_equiv_address_nv (mem_1, plus_constant (Pmode, operands[8], > new_off_1), true); > replace_equiv_address_nv (mem_2, plus_constant (Pmode, operands[8], > - new_off_1 + msize), true); > - replace_equiv_address_nv (mem_3, plus_constant (Pmode, operands[8], > new_off_3), true); > - replace_equiv_address_nv (mem_4, plus_constant (Pmode, operands[8], > - new_off_3 + msize), true); > > if (!aarch64_mem_pair_operand (mem_1, mode) > - || !aarch64_mem_pair_operand (mem_3, mode)) > + || !aarch64_mem_pair_operand (mem_2, mode)) > return false; > > - if (code == ZERO_EXTEND) > - { > - mem_1 = gen_rtx_ZERO_EXTEND (DImode, mem_1); > - mem_2 = gen_rtx_ZERO_EXTEND (DImode, mem_2); > - mem_3 = gen_rtx_ZERO_EXTEND (DImode, mem_3); > - mem_4 = gen_rtx_ZERO_EXTEND (DImode, mem_4); > - } > - else if (code == SIGN_EXTEND) > - { > - mem_1 = gen_rtx_SIGN_EXTEND (DImode, mem_1); > - mem_2 = gen_rtx_SIGN_EXTEND (DImode, mem_2); > - mem_3 = gen_rtx_SIGN_EXTEND (DImode, mem_3); > - mem_4 = gen_rtx_SIGN_EXTEND (DImode, mem_4); > - } > - > if (load) > { > operands[0] = temp_operands[0]; > operands[1] = mem_1; > operands[2] = temp_operands[2]; > - operands[3] = mem_2; > operands[4] = temp_operands[4]; > - operands[5] = mem_3; > + operands[5] = mem_2; > operands[6] = temp_operands[6]; > - operands[7] = mem_4; > } > else > { > operands[0] = mem_1; > operands[1] = temp_operands[1]; > - operands[2] = mem_2; > operands[3] = temp_operands[3]; > - operands[4] = mem_3; > + operands[4] = mem_2; > operands[5] = temp_operands[5]; > - operands[6] = mem_4; > operands[7] = temp_operands[7]; > } > > /* Emit adjusting instruction. */ > emit_insn (gen_rtx_SET (operands[8], plus_constant (DImode, base, > base_off))); > /* Emit ldp/stp instructions. 
*/ > - t1 = gen_rtx_SET (operands[0], operands[1]); > - t2 = gen_rtx_SET (operands[2], operands[3]); > - emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, t1, t2))); > - t1 = gen_rtx_SET (operands[4], operands[5]); > - t2 = gen_rtx_SET (operands[6], operands[7]); > - emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, t1, t2))); > + if (load) > + { > + emit_insn (aarch64_gen_load_pair (operands[0], operands[2], > + operands[1], code)); > + emit_insn (aarch64_gen_load_pair (operands[4], operands[6], > + operands[5], code)); > + } > + else > + { > + emit_insn (aarch64_gen_store_pair (operands[0], operands[1], > + operands[3])); > + emit_insn (aarch64_gen_store_pair (operands[4], operands[5], > + operands[7])); > + } > return true; > } > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index c92a51690c5..ffb6b0ba749 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -175,6 +175,9 @@ (define_c_enum "unspec" [ > UNSPEC_GOTSMALLTLS > UNSPEC_GOTTINYPIC > UNSPEC_GOTTINYTLS > + UNSPEC_STP > + UNSPEC_LDP_FST > + UNSPEC_LDP_SND > UNSPEC_LD1 > UNSPEC_LD2 > UNSPEC_LD2_DREG > @@ -453,6 +456,11 @@ (define_attr "predicated" "yes,no" (const_string "no")) > ;; may chose to hold the tracking state encoded in SP. > (define_attr "speculation_barrier" "true,false" (const_string "false")) > > +;; Attribute use to identify load pair and store pair instructions. > +;; Currently the attribute is only applied to the non-writeback ldp/stp > +;; patterns. > +(define_attr "ldpstp" "ldp,stp,none" (const_string "none")) > + > ;; ------------------------------------------------------------------- > ;; Pipeline descriptions and scheduling > ;; ------------------------------------------------------------------- > @@ -1735,100 +1743,62 @@ (define_expand "setmemdi" > FAIL; > }) > > -;; Operands 1 and 3 are tied together by the final condition; so we allow > -;; fairly lax checking on the second memory operation. 
> -(define_insn "load_pair_sw_<SX:mode><SX2:mode>"
> -  [(set (match_operand:SX 0 "register_operand")
> -	(match_operand:SX 1 "aarch64_mem_pair_operand"))
> -   (set (match_operand:SX2 2 "register_operand")
> -	(match_operand:SX2 3 "memory_operand"))]
> -  "rtx_equal_p (XEXP (operands[3], 0),
> -		plus_constant (Pmode,
> -			       XEXP (operands[1], 0),
> -			       GET_MODE_SIZE (<SX:MODE>mode)))"
> -  {@ [ cons: =0 , 1 , =2 , 3 ; attrs: type , arch ]
> -     [ r , Ump , r , m ; load_8 , * ] ldp\t%w0, %w2, %z1
> -     [ w , Ump , w , m ; neon_load1_2reg , fp ] ldp\t%s0, %s2, %z1
> -  }
> -)
> -
> -;; Storing different modes that can still be merged
> -(define_insn "load_pair_dw_<DX:mode><DX2:mode>"
> -  [(set (match_operand:DX 0 "register_operand")
> -	(match_operand:DX 1 "aarch64_mem_pair_operand"))
> -   (set (match_operand:DX2 2 "register_operand")
> -	(match_operand:DX2 3 "memory_operand"))]
> -  "rtx_equal_p (XEXP (operands[3], 0),
> -		plus_constant (Pmode,
> -			       XEXP (operands[1], 0),
> -			       GET_MODE_SIZE (<DX:MODE>mode)))"
> -  {@ [ cons: =0 , 1 , =2 , 3 ; attrs: type , arch ]
> -     [ r , Ump , r , m ; load_16 , * ] ldp\t%x0, %x2, %z1
> -     [ w , Ump , w , m ; neon_load1_2reg , fp ] ldp\t%d0, %d2, %z1
> -  }
> -)
> -
> -(define_insn "load_pair_dw_<TX:mode><TX2:mode>"
> -  [(set (match_operand:TX 0 "register_operand" "=w")
> -	(match_operand:TX 1 "aarch64_mem_pair_operand" "Ump"))
> -   (set (match_operand:TX2 2 "register_operand" "=w")
> -	(match_operand:TX2 3 "memory_operand" "m"))]
> -  "TARGET_SIMD
> -   && rtx_equal_p (XEXP (operands[3], 0),
> -		   plus_constant (Pmode,
> -				  XEXP (operands[1], 0),
> -				  GET_MODE_SIZE (<TX:MODE>mode)))"
> -  "ldp\\t%q0, %q2, %z1"
> +(define_insn "*load_pair_<ldst_sz>"
> +  [(set (match_operand:GPI 0 "aarch64_ldp_reg_operand")
> +	(unspec [
> +	  (match_operand:<VPAIR> 1 "aarch64_mem_pair_lanes_operand")
> +	] UNSPEC_LDP_FST))
> +   (set (match_operand:GPI 2 "aarch64_ldp_reg_operand")
> +	(unspec [
> +	  (match_dup 1)
> +	] UNSPEC_LDP_SND))]
> +  ""
> +  {@ [cons: =0, 1, =2; attrs: type, arch]
> +     [ r, Umn, r; load_<ldpstp_sz>, * ] ldp\t%<w>0, %<w>2, %y1
> +     [ w, Umn, w; neon_load1_2reg, fp ] ldp\t%<v>0, %<v>2, %y1
> +  }
> +  [(set_attr "ldpstp" "ldp")]
> +)
> +
> +(define_insn "*load_pair_16"
> +  [(set (match_operand:TI 0 "aarch64_ldp_reg_operand" "=w")
> +	(unspec [
> +	  (match_operand:V2x16QI 1 "aarch64_mem_pair_lanes_operand" "Umn")
> +	] UNSPEC_LDP_FST))
> +   (set (match_operand:TI 2 "aarch64_ldp_reg_operand" "=w")
> +	(unspec [
> +	  (match_dup 1)
> +	] UNSPEC_LDP_SND))]
> +  "TARGET_FLOAT"
> +  "ldp\\t%q0, %q2, %y1"
>    [(set_attr "type" "neon_ldp_q")
> -   (set_attr "fp" "yes")]
> -)
> -
> -;; Operands 0 and 2 are tied together by the final condition; so we allow
> -;; fairly lax checking on the second memory operation.
> -(define_insn "store_pair_sw_<SX:mode><SX2:mode>"
> -  [(set (match_operand:SX 0 "aarch64_mem_pair_operand")
> -	(match_operand:SX 1 "aarch64_reg_zero_or_fp_zero"))
> -   (set (match_operand:SX2 2 "memory_operand")
> -	(match_operand:SX2 3 "aarch64_reg_zero_or_fp_zero"))]
> -  "rtx_equal_p (XEXP (operands[2], 0),
> -		plus_constant (Pmode,
> -			       XEXP (operands[0], 0),
> -			       GET_MODE_SIZE (<SX:MODE>mode)))"
> -  {@ [ cons: =0 , 1 , =2 , 3 ; attrs: type , arch ]
> -     [ Ump , rYZ , m , rYZ ; store_8 , * ] stp\t%w1, %w3, %z0
> -     [ Ump , w , m , w ; neon_store1_2reg , fp ] stp\t%s1, %s3, %z0
> -  }
> -)
> -
> -;; Storing different modes that can still be merged
> -(define_insn "store_pair_dw_<DX:mode><DX2:mode>"
> -  [(set (match_operand:DX 0 "aarch64_mem_pair_operand")
> -	(match_operand:DX 1 "aarch64_reg_zero_or_fp_zero"))
> -   (set (match_operand:DX2 2 "memory_operand")
> -	(match_operand:DX2 3 "aarch64_reg_zero_or_fp_zero"))]
> -  "rtx_equal_p (XEXP (operands[2], 0),
> -		plus_constant (Pmode,
> -			       XEXP (operands[0], 0),
> -			       GET_MODE_SIZE (<DX:MODE>mode)))"
> -  {@ [ cons: =0 , 1 , =2 , 3 ; attrs: type , arch ]
> -     [ Ump , rYZ , m , rYZ ; store_16 , * ] stp\t%x1, %x3, %z0
> -     [ Ump , w , m , w ; neon_store1_2reg , fp ] stp\t%d1, %d3, %z0
> -  }
> -)
> -
> -(define_insn "store_pair_dw_<TX:mode><TX2:mode>"
> -  [(set (match_operand:TX 0 "aarch64_mem_pair_operand" "=Ump")
> -	(match_operand:TX 1 "register_operand" "w"))
> -   (set (match_operand:TX2 2 "memory_operand" "=m")
> -	(match_operand:TX2 3 "register_operand" "w"))]
> -  "TARGET_SIMD &&
> -   rtx_equal_p (XEXP (operands[2], 0),
> -		plus_constant (Pmode,
> -			       XEXP (operands[0], 0),
> -			       GET_MODE_SIZE (TFmode)))"
> -  "stp\\t%q1, %q3, %z0"
> +   (set_attr "fp" "yes")
> +   (set_attr "ldpstp" "ldp")]
> +)
> +
> +(define_insn "*store_pair_<ldst_sz>"
> +  [(set (match_operand:<VPAIR> 0 "aarch64_mem_pair_lanes_operand")
> +	(unspec:<VPAIR>
> +	  [(match_operand:GPI 1 "aarch64_stp_reg_operand")
> +	   (match_operand:GPI 2 "aarch64_stp_reg_operand")] UNSPEC_STP))]
> +  ""
> +  {@ [cons: =0, 1, 2; attrs: type, arch]
> +     [ Umn, rYZ, rYZ; store_<ldpstp_sz>, * ] stp\t%<w>1, %<w>2, %y0
> +     [ Umn, w, w; neon_store1_2reg, fp ] stp\t%<v>1, %<v>2, %y0
> +  }
> +  [(set_attr "ldpstp" "stp")]
> +)
> +
> +(define_insn "*store_pair_16"
> +  [(set (match_operand:V2x16QI 0 "aarch64_mem_pair_lanes_operand" "=Umn")
> +	(unspec:V2x16QI
> +	  [(match_operand:TI 1 "aarch64_ldp_reg_operand" "w")
> +	   (match_operand:TI 2 "aarch64_ldp_reg_operand" "w")] UNSPEC_STP))]
> +  "TARGET_FLOAT"
> +  "stp\t%q1, %q2, %y0"
>    [(set_attr "type" "neon_stp_q")
> -   (set_attr "fp" "yes")]
> +   (set_attr "fp" "yes")
> +   (set_attr "ldpstp" "stp")]
>  )
>
>  ;; Writeback load/store pair patterns.
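The new patterns above replace two independent memory operands with a single pair-sized one: `UNSPEC_STP` logically concatenates the two register operands into one `2*size` memory value, while `UNSPEC_LDP_FST`/`UNSPEC_LDP_SND` extract the two "lanes" of that same memory operand. A small standalone model of those semantics (not GCC code; the names mirror the unspecs, everything else is mine):

```cpp
#include <cstdint>
#include <cstring>

// Stand-in for a V2x8QImode pair memory operand: 16 bytes holding
// both 8-byte lanes of an X-register ldp/stp.
struct PairMem { uint8_t bytes[16]; };

// UNSPEC_STP: concatenate two registers into the single pair value.
PairMem stp (uint64_t reg1, uint64_t reg2)
{
  PairMem m;
  std::memcpy (m.bytes, &reg1, 8);	// first lane at offset 0
  std::memcpy (m.bytes + 8, &reg2, 8);	// second lane at offset 8
  return m;
}

// UNSPEC_LDP_FST: extract the first lane of the pair value.
uint64_t ldp_fst (const PairMem &m)
{ uint64_t r; std::memcpy (&r, m.bytes, 8); return r; }

// UNSPEC_LDP_SND: extract the second lane of the same pair value.
uint64_t ldp_snd (const PairMem &m)
{ uint64_t r; std::memcpy (&r, m.bytes + 8, 8); return r; }
```

Because both lanes come from the one `PairMem` value (in the patterns, a `match_dup` of the single memory operand), the register allocator can never reload one half of the pair without the other, which is exactly the correctness fix described in the cover letter.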
> @@ -2074,14 +2044,15 @@ (define_insn "*extendsidi2_aarch64"
>
>  (define_insn "*load_pair_extendsidi2_aarch64"
>    [(set (match_operand:DI 0 "register_operand" "=r")
> -	(sign_extend:DI (match_operand:SI 1 "aarch64_mem_pair_operand" "Ump")))
> +	(sign_extend:DI (unspec:SI [
> +	  (match_operand:V2x4QI 1 "aarch64_mem_pair_lanes_operand" "Umn")
> +	] UNSPEC_LDP_FST)))
>     (set (match_operand:DI 2 "register_operand" "=r")
> -	(sign_extend:DI (match_operand:SI 3 "memory_operand" "m")))]
> -  "rtx_equal_p (XEXP (operands[3], 0),
> -		plus_constant (Pmode,
> -			       XEXP (operands[1], 0),
> -			       GET_MODE_SIZE (SImode)))"
> -  "ldpsw\\t%0, %2, %z1"
> +	(sign_extend:DI (unspec:SI [
> +	  (match_dup 1)
> +	] UNSPEC_LDP_SND)))]
> +  ""
> +  "ldpsw\\t%0, %2, %y1"
>    [(set_attr "type" "load_8")]
>  )
>
> @@ -2101,16 +2072,17 @@ (define_insn "*zero_extendsidi2_aarch64"
>
>  (define_insn "*load_pair_zero_extendsidi2_aarch64"
>    [(set (match_operand:DI 0 "register_operand")
> -	(zero_extend:DI (match_operand:SI 1 "aarch64_mem_pair_operand")))
> +	(zero_extend:DI (unspec:SI [
> +	  (match_operand:V2x4QI 1 "aarch64_mem_pair_lanes_operand")
> +	] UNSPEC_LDP_FST)))
>     (set (match_operand:DI 2 "register_operand")
> -	(zero_extend:DI (match_operand:SI 3 "memory_operand")))]
> -  "rtx_equal_p (XEXP (operands[3], 0),
> -		plus_constant (Pmode,
> -			       XEXP (operands[1], 0),
> -			       GET_MODE_SIZE (SImode)))"
> -  {@ [ cons: =0 , 1 , =2 , 3 ; attrs: type , arch ]
> -     [ r , Ump , r , m ; load_8 , * ] ldp\t%w0, %w2, %z1
> -     [ w , Ump , w , m ; neon_load1_2reg , fp ] ldp\t%s0, %s2, %z1
> +	(zero_extend:DI (unspec:SI [
> +	  (match_dup 1)
> +	] UNSPEC_LDP_SND)))]
> +  ""
> +  {@ [ cons: =0 , 1 , =2; attrs: type , arch]
> +     [ r , Umn , r ; load_8 , * ] ldp\t%w0, %w2, %y1
> +     [ w , Umn , w ; neon_load1_2reg, fp ] ldp\t%s0, %s2, %y1
>    }
>  )
>
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index a920de99ffc..fd8dd6db349 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -1435,6 +1435,9 @@ (define_mode_attr VDBL [(V8QI "V16QI") (V4HI "V8HI")
>  			(SI "V2SI") (SF "V2SF")
>  			(DI "V2DI") (DF "V2DF")])
>
> +;; Load/store pair mode.
> +(define_mode_attr VPAIR [(SI "V2x4QI") (DI "V2x8QI")])
> +
>  ;; Register suffix for double-length mode.
>  (define_mode_attr Vdtype [(V4HF "8h") (V2SF "4s")])
>
> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
> index b647e5af7c6..80f2e03d8de 100644
> --- a/gcc/config/aarch64/predicates.md
> +++ b/gcc/config/aarch64/predicates.md
> @@ -266,10 +266,12 @@ (define_special_predicate "aarch64_mem_pair_operator"
>  	    (match_test "known_eq (GET_MODE_SIZE (mode),
>  				   GET_MODE_SIZE (GET_MODE (op)))"))))
>
> -(define_predicate "aarch64_mem_pair_operand"
> -  (and (match_code "mem")
> -       (match_test "aarch64_legitimate_address_p (mode, XEXP (op, 0), false,
> -						  ADDR_QUERY_LDP_STP)")))
> +;; Like aarch64_mem_pair_operator, but additionally check the
> +;; address is suitable.
> +(define_special_predicate "aarch64_mem_pair_operand"
> +  (and (match_operand 0 "aarch64_mem_pair_operator")
> +       (match_test "aarch64_legitimate_address_p (GET_MODE (op), XEXP (op, 0),
> +						  false, ADDR_QUERY_LDP_STP)")))
>
>  (define_predicate "pmode_plus_operator"
>    (and (match_code "plus")
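The `VPAIR` mode_attr above maps each element mode to a byte-vector mode sized (and aligned) for the whole pair: `SI` (4 bytes) to `V2x4QI` (8 bytes), `DI` (8 bytes) to `V2x8QI` (16 bytes), with `TI`/`V2x16QI` handled by the separate `*load_pair_16`/`*store_pair_16` patterns. A tiny standalone table mirroring that invariant (illustrative only, not GCC code):

```cpp
#include <map>
#include <string>

// Byte sizes of the modes involved in the pair patterns.
int mode_size (const std::string &mode)
{
  static const std::map<std::string, int> sizes = {
    {"SI", 4}, {"DI", 8}, {"TI", 16},
    {"V2x4QI", 8}, {"V2x8QI", 16}, {"V2x16QI", 32},
  };
  return sizes.at (mode);
}

// Mirror of the VPAIR mapping (plus the TI case handled by the
// dedicated 16-byte patterns): element mode -> pair memory mode.
std::string vpair (const std::string &mode)
{
  static const std::map<std::string, std::string> pair = {
    {"SI", "V2x4QI"}, {"DI", "V2x8QI"}, {"TI", "V2x16QI"},
  };
  return pair.at (mode);
}
```

The invariant the patterns rely on is that the pair mode is exactly twice the size of the element mode, so a single `aarch64_mem_pair_lanes_operand` covers both halves of the access.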