No problem! I'll send a follow-up with the requested changes.
Thanks for the input!

Manos.

On Thu, Sep 28, 2023 at 4:42 PM Richard Sandiford <richard.sandif...@arm.com> wrote:

> Manos Anagnostakis <manos.anagnosta...@vrull.eu> writes:
> > Hey Richard,
> >
> > Thanks for taking the time to review this, but it has been committed since
> > yesterday after getting reviewed by Kyrill and Tamar.
> >
> > Discussions:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631285.html
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631300.html
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631389.html
> >
> > Committed version:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631484.html
>
> Sorry about that. I had v3 being filtered differently and so it went
> into a different inbox.
>
> Richard
>
> >
> > Manos.
> >
> > On Thu, Sep 28, 2023 at 4:17 PM Richard Sandiford <richard.sandif...@arm.com>
> > wrote:
> >
> >> Thanks for the patch and sorry for the slow review.
> >>
> >> Manos Anagnostakis <manos.anagnosta...@vrull.eu> writes:
> >> > This patch implements the following TODO in gcc/config/aarch64/aarch64.cc
> >> > to provide the requested behaviour for handling ldp and stp:
> >> >
> >> > /* Allow the tuning structure to disable LDP instruction formation
> >> > from combining instructions (e.g., in peephole2).
> >> > TODO: Implement fine-grained tuning control for LDP and STP:
> >> > 1. control policies for load and store separately;
> >> > 2. support the following policies:
> >> > - default (use what is in the tuning structure)
> >> > - always
> >> > - never
> >> > - aligned (only if the compiler can prove that the
> >> > load will be aligned to 2 * element_size) */
> >> >
> >> > It provides two new and concrete command-line options -mldp-policy and -mstp-policy
> >> > to give the ability to control load and store policies separately as
> >> > stated in part 1 of the TODO.
> >> >
> >> > The accepted values for both options are:
> >> > - default: Use the ldp/stp policy defined in the corresponding tuning
> >> > structure.
> >> > - always: Emit ldp/stp regardless of alignment.
> >> > - never: Do not emit ldp/stp.
> >> > - aligned: In order to emit ldp/stp, first check if the load/store will
> >> > be aligned to 2 * element_size.
> >> >
> >> > gcc/ChangeLog:
> >> > * config/aarch64/aarch64-protos.h (struct tune_params): Add
> >> > appropriate enums for the policies.
> >> > * config/aarch64/aarch64-tuning-flags.def
> >> > (AARCH64_EXTRA_TUNING_OPTION): Remove superseded tuning
> >> > options.
> >> > * config/aarch64/aarch64.cc (aarch64_parse_ldp_policy): New
> >> > function to parse ldp-policy option.
> >> > (aarch64_parse_stp_policy): New function to parse stp-policy option.
> >> > (aarch64_override_options_internal): Call parsing functions.
> >> > (aarch64_operands_ok_for_ldpstp): Add option-value check and
> >> > alignment check and remove superseded ones.
> >> > (aarch64_operands_adjust_ok_for_ldpstp): Add option-value check and
> >> > alignment check and remove superseded ones.
> >> > * config/aarch64/aarch64.opt: Add options.
> >> >
> >> > gcc/testsuite/ChangeLog:
> >> > * gcc.target/aarch64/ldp_aligned.c: New test.
> >> > * gcc.target/aarch64/ldp_always.c: New test.
> >> > * gcc.target/aarch64/ldp_never.c: New test.
> >> > * gcc.target/aarch64/stp_aligned.c: New test.
> >> > * gcc.target/aarch64/stp_always.c: New test.
> >> > * gcc.target/aarch64/stp_never.c: New test.
> >> >
> >> > Signed-off-by: Manos Anagnostakis <manos.anagnosta...@vrull.eu>
> >> > ---
> >> > Changes in v2:
> >> > - Fixed committed ldp tests to correctly trigger
> >> > and test aarch64_operands_adjust_ok_for_ldpstp in aarch64.cc.
> >> > - Added "-mcpu=generic" to committed tests to guarantee generic target code
> >> > generation and not cause the regressions of v1.
> >> > > >> > gcc/config/aarch64/aarch64-protos.h | 24 ++ > >> > gcc/config/aarch64/aarch64-tuning-flags.def | 8 - > >> > gcc/config/aarch64/aarch64.cc | 229 > ++++++++++++++---- > >> > gcc/config/aarch64/aarch64.opt | 8 + > >> > .../gcc.target/aarch64/ldp_aligned.c | 66 +++++ > >> > gcc/testsuite/gcc.target/aarch64/ldp_always.c | 66 +++++ > >> > gcc/testsuite/gcc.target/aarch64/ldp_never.c | 66 +++++ > >> > .../gcc.target/aarch64/stp_aligned.c | 60 +++++ > >> > gcc/testsuite/gcc.target/aarch64/stp_always.c | 60 +++++ > >> > gcc/testsuite/gcc.target/aarch64/stp_never.c | 60 +++++ > >> > 10 files changed, 586 insertions(+), 61 deletions(-) > >> > create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_aligned.c > >> > create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_always.c > >> > create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_never.c > >> > create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_aligned.c > >> > create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_always.c > >> > create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_never.c > >> > > >> > diff --git a/gcc/config/aarch64/aarch64-protos.h > >> b/gcc/config/aarch64/aarch64-protos.h > >> > index 70303d6fd95..be1d73490ed 100644 > >> > --- a/gcc/config/aarch64/aarch64-protos.h > >> > +++ b/gcc/config/aarch64/aarch64-protos.h > >> > @@ -568,6 +568,30 @@ struct tune_params > >> > /* Place prefetch struct pointer at the end to enable type checking > >> > errors when tune_params misses elements (e.g., from erroneous > >> merges). */ > >> > const struct cpu_prefetch_tune *prefetch; > >> > +/* An enum specifying how to handle load pairs using a fine-grained > >> policy: > >> > + - LDP_POLICY_ALIGNED: Emit ldp if the source pointer is aligned > >> > + to at least double the alignment of the type. > >> > + - LDP_POLICY_ALWAYS: Emit ldp regardless of alignment. > >> > + - LDP_POLICY_NEVER: Do not emit ldp. 
*/
> >> > +
> >> > + enum aarch64_ldp_policy_model
> >> > + {
> >> > + LDP_POLICY_ALIGNED,
> >> > + LDP_POLICY_ALWAYS,
> >> > + LDP_POLICY_NEVER
> >> > + } ldp_policy_model;
> >> > +/* An enum specifying how to handle store pairs using a fine-grained policy:
> >> > + - STP_POLICY_ALIGNED: Emit stp if the source pointer is aligned
> >> > + to at least double the alignment of the type.
> >> > + - STP_POLICY_ALWAYS: Emit stp regardless of alignment.
> >> > + - STP_POLICY_NEVER: Do not emit stp. */
> >> > +
> >> > + enum aarch64_stp_policy_model
> >> > + {
> >> > + STP_POLICY_ALIGNED,
> >> > + STP_POLICY_ALWAYS,
> >> > + STP_POLICY_NEVER
> >> > + } stp_policy_model;
> >> > };
> >>
> >> Generally the patch looks really good. But I think we can use a single
> >> enum type for both LDP and STP, with the values having the prefix
> >> AARCH64_LDP_STP_POLICY. That means that we only need one parser,
> >> and that:
> >>
> >> > /* Classifies an address.
> >> > diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
> >> > index 52112ba7c48..774568e9106 100644
> >> > --- a/gcc/config/aarch64/aarch64-tuning-flags.def
> >> > +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
> >> > @@ -30,11 +30,6 @@
> >> >
> >> > AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
> >> >
> >> > -/* Don't create non-8 byte aligned load/store pair. That is if the
> >> > -two load/stores are not at least 8 byte aligned don't create load/store
> >> > -pairs. */
> >> > -AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", SLOW_UNALIGNED_LDPW)
> >> > -
> >> > /* Some of the optional shift to some arthematic instructions are
> >> > considered cheap. Logical shift left <=4 with or without a
> >> > zero extend are considered cheap. Sign extend; non logical shift left
> >> > @@ -44,9 +39,6 @@ AARCH64_EXTRA_TUNING_OPTION ("cheap_shift_extend", CHEAP_SHIFT_EXTEND)
> >> > /* Disallow load/store pair instructions on Q-registers.
*/ > >> > AARCH64_EXTRA_TUNING_OPTION ("no_ldp_stp_qregs", NO_LDP_STP_QREGS) > >> > > >> > -/* Disallow load-pair instructions to be formed in > combine/peephole. */ > >> > -AARCH64_EXTRA_TUNING_OPTION ("no_ldp_combine", NO_LDP_COMBINE) > >> > - > >> > AARCH64_EXTRA_TUNING_OPTION ("rename_load_regs", RENAME_LOAD_REGS) > >> > > >> > AARCH64_EXTRA_TUNING_OPTION ("cse_sve_vl_constants", > >> CSE_SVE_VL_CONSTANTS) > >> > diff --git a/gcc/config/aarch64/aarch64.cc > >> b/gcc/config/aarch64/aarch64.cc > >> > index eba5d4a7e04..43d88c68647 100644 > >> > --- a/gcc/config/aarch64/aarch64.cc > >> > +++ b/gcc/config/aarch64/aarch64.cc > >> > @@ -1356,7 +1356,9 @@ static const struct tune_params generic_tunings > = > >> > Neoverse V1. It does not have a noticeable effect on A64FX and > >> should > >> > have at most a very minor effect on SVE2 cores. */ > >> > (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS), /* tune_flags. */ > >> > - &generic_prefetch_tune > >> > + &generic_prefetch_tune, > >> > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > >> > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > >> > }; > >> > > >> > static const struct tune_params cortexa35_tunings = > >> > @@ -1390,7 +1392,9 @@ static const struct tune_params > cortexa35_tunings = > >> > 0, /* max_case_values. */ > >> > tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ > >> > (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ > >> > - &generic_prefetch_tune > >> > + &generic_prefetch_tune, > >> > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > >> > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > >> > }; > >> > > >> > static const struct tune_params cortexa53_tunings = > >> > @@ -1424,7 +1428,9 @@ static const struct tune_params > cortexa53_tunings = > >> > 0, /* max_case_values. */ > >> > tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ > >> > (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. 
*/ > >> > - &generic_prefetch_tune > >> > + &generic_prefetch_tune, > >> > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > >> > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > >> > }; > >> > > >> > static const struct tune_params cortexa57_tunings = > >> > @@ -1458,7 +1464,9 @@ static const struct tune_params > cortexa57_tunings = > >> > 0, /* max_case_values. */ > >> > tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ > >> > (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS), /* tune_flags. */ > >> > - &generic_prefetch_tune > >> > + &generic_prefetch_tune, > >> > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > >> > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > >> > }; > >> > > >> > static const struct tune_params cortexa72_tunings = > >> > @@ -1492,7 +1500,9 @@ static const struct tune_params > cortexa72_tunings = > >> > 0, /* max_case_values. */ > >> > tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ > >> > (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ > >> > - &generic_prefetch_tune > >> > + &generic_prefetch_tune, > >> > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > >> > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > >> > }; > >> > > >> > static const struct tune_params cortexa73_tunings = > >> > @@ -1526,7 +1536,9 @@ static const struct tune_params > cortexa73_tunings = > >> > 0, /* max_case_values. */ > >> > tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ > >> > (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ > >> > - &generic_prefetch_tune > >> > + &generic_prefetch_tune, > >> > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > >> > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > >> > }; > >> > > >> > > >> > @@ -1561,7 +1573,9 @@ static const struct tune_params > exynosm1_tunings = > >> > 48, /* max_case_values. */ > >> > tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ > >> > (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. 
*/ > >> > - &exynosm1_prefetch_tune > >> > + &exynosm1_prefetch_tune, > >> > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > >> > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > >> > }; > >> > > >> > static const struct tune_params thunderxt88_tunings = > >> > @@ -1593,8 +1607,10 @@ static const struct tune_params > >> thunderxt88_tunings = > >> > 2, /* min_div_recip_mul_df. */ > >> > 0, /* max_case_values. */ > >> > tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ > >> > - (AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW), /* tune_flags. */ > >> > - &thunderxt88_prefetch_tune > >> > + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ > >> > + &thunderxt88_prefetch_tune, > >> > + tune_params::LDP_POLICY_ALIGNED, /* ldp_policy_model. */ > >> > + tune_params::STP_POLICY_ALIGNED /* stp_policy_model. */ > >> > }; > >> > > >> > static const struct tune_params thunderx_tunings = > >> > @@ -1626,9 +1642,10 @@ static const struct tune_params > thunderx_tunings = > >> > 2, /* min_div_recip_mul_df. */ > >> > 0, /* max_case_values. */ > >> > tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ > >> > - (AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW > >> > - | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. */ > >> > - &thunderx_prefetch_tune > >> > + (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. */ > >> > + &thunderx_prefetch_tune, > >> > + tune_params::LDP_POLICY_ALIGNED, /* ldp_policy_model. */ > >> > + tune_params::STP_POLICY_ALIGNED /* stp_policy_model. */ > >> > }; > >> > > >> > static const struct tune_params tsv110_tunings = > >> > @@ -1662,7 +1679,9 @@ static const struct tune_params tsv110_tunings = > >> > 0, /* max_case_values. */ > >> > tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ > >> > (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ > >> > - &tsv110_prefetch_tune > >> > + &tsv110_prefetch_tune, > >> > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. 
*/ > >> > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > >> > }; > >> > > >> > static const struct tune_params xgene1_tunings = > >> > @@ -1695,7 +1714,9 @@ static const struct tune_params xgene1_tunings = > >> > 17, /* max_case_values. */ > >> > tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ > >> > (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS), /* tune_flags. */ > >> > - &xgene1_prefetch_tune > >> > + &xgene1_prefetch_tune, > >> > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > >> > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > >> > }; > >> > > >> > static const struct tune_params emag_tunings = > >> > @@ -1728,7 +1749,9 @@ static const struct tune_params emag_tunings = > >> > 17, /* max_case_values. */ > >> > tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ > >> > (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS), /* tune_flags. */ > >> > - &xgene1_prefetch_tune > >> > + &xgene1_prefetch_tune, > >> > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > >> > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > >> > }; > >> > > >> > static const struct tune_params qdf24xx_tunings = > >> > @@ -1762,7 +1785,9 @@ static const struct tune_params qdf24xx_tunings > = > >> > 0, /* max_case_values. */ > >> > tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ > >> > AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS, /* tune_flags. */ > >> > - &qdf24xx_prefetch_tune > >> > + &qdf24xx_prefetch_tune, > >> > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > >> > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > >> > }; > >> > > >> > /* Tuning structure for the Qualcomm Saphira core. Default to falkor > >> values > >> > @@ -1798,7 +1823,9 @@ static const struct tune_params saphira_tunings > = > >> > 0, /* max_case_values. */ > >> > tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ > >> > (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. 
*/ > >> > - &generic_prefetch_tune > >> > + &generic_prefetch_tune, > >> > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > >> > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > >> > }; > >> > > >> > static const struct tune_params thunderx2t99_tunings = > >> > @@ -1832,7 +1859,9 @@ static const struct tune_params > >> thunderx2t99_tunings = > >> > 0, /* max_case_values. */ > >> > tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ > >> > (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ > >> > - &thunderx2t99_prefetch_tune > >> > + &thunderx2t99_prefetch_tune, > >> > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > >> > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > >> > }; > >> > > >> > static const struct tune_params thunderx3t110_tunings = > >> > @@ -1866,7 +1895,9 @@ static const struct tune_params > >> thunderx3t110_tunings = > >> > 0, /* max_case_values. */ > >> > tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ > >> > (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ > >> > - &thunderx3t110_prefetch_tune > >> > + &thunderx3t110_prefetch_tune, > >> > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > >> > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > >> > }; > >> > > >> > static const struct tune_params neoversen1_tunings = > >> > @@ -1899,7 +1930,9 @@ static const struct tune_params > neoversen1_tunings > >> = > >> > 0, /* max_case_values. */ > >> > tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ > >> > (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. */ > >> > - &generic_prefetch_tune > >> > + &generic_prefetch_tune, > >> > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > >> > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > >> > }; > >> > > >> > static const struct tune_params ampere1_tunings = > >> > @@ -1935,8 +1968,10 @@ static const struct tune_params > ampere1_tunings = > >> > 2, /* min_div_recip_mul_df. 
*/
> >> > 0, /* max_case_values. */
> >> > tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
> >> > - (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE), /* tune_flags. */
> >> > - &ampere1_prefetch_tune
> >> > + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */
> >> > + &ampere1_prefetch_tune,
> >> > + tune_params::LDP_POLICY_ALIGNED, /* ldp_policy_model. */
> >> > + tune_params::STP_POLICY_ALIGNED /* stp_policy_model. */
> >> > };
> >> >
> >> > static const struct tune_params ampere1a_tunings =
> >> > @@ -1973,8 +2008,10 @@ static const struct tune_params ampere1a_tunings =
> >> > 2, /* min_div_recip_mul_df. */
> >> > 0, /* max_case_values. */
> >> > tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
> >> > - (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE), /* tune_flags. */
> >> > - &ampere1_prefetch_tune
> >> > + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */
> >> > + &ampere1_prefetch_tune,
> >> > + tune_params::LDP_POLICY_ALIGNED, /* ldp_policy_model. */
> >> > + tune_params::STP_POLICY_ALIGNED /* stp_policy_model. */
> >> > };
> >> >
> >> > static const advsimd_vec_cost neoversev1_advsimd_vector_cost =
> >> > @@ -2155,7 +2192,9 @@ static const struct tune_params neoversev1_tunings =
> >> > | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> >> > | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
> >> > | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. */
> >> > - &generic_prefetch_tune
> >> > + &generic_prefetch_tune,
> >> > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */
> >> > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */
> >> > };
> >> >
> >> > static const sve_vec_cost neoverse512tvb_sve_vector_cost =
> >> > @@ -2292,7 +2331,9 @@ static const struct tune_params neoverse512tvb_tunings =
> >> > (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> >> > | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> >> > | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags.
> */ > >> > - &generic_prefetch_tune > >> > + &generic_prefetch_tune, > >> > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > >> > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > >> > }; > >> > > >> > static const advsimd_vec_cost neoversen2_advsimd_vector_cost = > >> > @@ -2482,7 +2523,9 @@ static const struct tune_params > neoversen2_tunings > >> = > >> > | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS > >> > | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS > >> > | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. > */ > >> > - &generic_prefetch_tune > >> > + &generic_prefetch_tune, > >> > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > >> > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > >> > }; > >> > > >> > static const advsimd_vec_cost neoversev2_advsimd_vector_cost = > >> > @@ -2672,7 +2715,9 @@ static const struct tune_params > neoversev2_tunings > >> = > >> > | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS > >> > | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS > >> > | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. > */ > >> > - &generic_prefetch_tune > >> > + &generic_prefetch_tune, > >> > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > >> > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > >> > }; > >> > > >> > static const struct tune_params a64fx_tunings = > >> > @@ -2705,7 +2750,9 @@ static const struct tune_params a64fx_tunings = > >> > 0, /* max_case_values. */ > >> > tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ > >> > (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ > >> > - &a64fx_prefetch_tune > >> > + &a64fx_prefetch_tune, > >> > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > >> > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > >> > }; > >> > > >> > /* Support for fine-grained override of the tuning structures. 
*/ > >> > @@ -17726,6 +17773,50 @@ aarch64_parse_tune (const char *to_parse, > const > >> struct processor **res) > >> > return AARCH_PARSE_INVALID_ARG; > >> > } > >> > > >> > +/* Validate a command-line -mldp-policy option. Parse the policy > >> > + specified in STR and throw errors if appropriate. */ > >> > + > >> > +static bool > >> > +aarch64_parse_ldp_policy (const char *str, struct tune_params* tune) > >> > +{ > >> > + /* Check the value of the option to be one of the accepted. */ > >> > + if (strcmp (str, "always") == 0) > >> > + tune->ldp_policy_model = tune_params::LDP_POLICY_ALWAYS; > >> > + else if (strcmp (str, "never") == 0) > >> > + tune->ldp_policy_model = tune_params::LDP_POLICY_NEVER; > >> > + else if (strcmp (str, "aligned") == 0) > >> > + tune->ldp_policy_model = tune_params::LDP_POLICY_ALIGNED; > >> > + else if (strcmp (str, "default") != 0) > >> > + { > >> > + error ("unknown value %qs for %<-mldp-policy%>", str); > >> > + return false; > >> > + } > >> > + > >> > + return true; > >> > +} > >> > + > >> > +/* Validate a command-line -mstp-policy option. Parse the policy > >> > + specified in STR and throw errors if appropriate. */ > >> > + > >> > +static bool > >> > +aarch64_parse_stp_policy (const char *str, struct tune_params* tune) > >> > +{ > >> > + /* Check the value of the option to be one of the accepted. */ > >> > + if (strcmp (str, "always") == 0) > >> > + tune->stp_policy_model = tune_params::STP_POLICY_ALWAYS; > >> > + else if (strcmp (str, "never") == 0) > >> > + tune->stp_policy_model = tune_params::STP_POLICY_NEVER; > >> > + else if (strcmp (str, "aligned") == 0) > >> > + tune->stp_policy_model = tune_params::STP_POLICY_ALIGNED; > >> > + else if (strcmp (str, "default") != 0) > >> > + { > >> > + error ("unknown value %qs for %<-mstp-policy%>", str); > >> > + return false; > >> > + } > >> > + > >> > + return true; > >> > +} > >> > + > >> > /* Parse TOKEN, which has length LENGTH to see if it is an option > >> > described in FLAG. 
If it is, return the index bit for that fusion > >> type. > >> > If not, error (printing OPTION_NAME) and return zero. */ > >> > @@ -18074,6 +18165,14 @@ aarch64_override_options_internal (struct > >> gcc_options *opts) > >> > aarch64_parse_override_string > (opts->x_aarch64_override_tune_string, > >> > &aarch64_tune_params); > >> > > >> > + if (opts->x_aarch64_ldp_policy_string) > >> > + aarch64_parse_ldp_policy (opts->x_aarch64_ldp_policy_string, > >> > + &aarch64_tune_params); > >> > + > >> > + if (opts->x_aarch64_stp_policy_string) > >> > + aarch64_parse_stp_policy (opts->x_aarch64_stp_policy_string, > >> > + &aarch64_tune_params); > >> > + > >> > /* This target defaults to strict volatile bitfields. */ > >> > if (opts->x_flag_strict_volatile_bitfields < 0 && > >> abi_version_at_least (2)) > >> > opts->x_flag_strict_volatile_bitfields = 1; > >> > @@ -26382,18 +26481,14 @@ aarch64_operands_ok_for_ldpstp (rtx > *operands, > >> bool load, > >> > enum reg_class rclass_1, rclass_2; > >> > rtx mem_1, mem_2, reg_1, reg_2; > >> > > >> > - /* Allow the tuning structure to disable LDP instruction formation > >> > - from combining instructions (e.g., in peephole2). > >> > - TODO: Implement fine-grained tuning control for LDP and STP: > >> > - 1. control policies for load and store separately; > >> > - 2. support the following policies: > >> > - - default (use what is in the tuning structure) > >> > - - always > >> > - - never > >> > - - aligned (only if the compiler can prove that the > >> > - load will be aligned to 2 * element_size) */ > >> > - if (load && (aarch64_tune_params.extra_tuning_flags > >> > - & AARCH64_EXTRA_TUNE_NO_LDP_COMBINE)) > >> > + /* If we have LDP_POLICY_NEVER, reject the load pair. */ > >> > + if (load > >> > + && aarch64_tune_params.ldp_policy_model == > >> tune_params::LDP_POLICY_NEVER) > >> > + return false; > >> > + > >> > + /* If we have STP_POLICY_NEVER, reject the store pair. 
*/ > >> > + if (!load > >> > + && aarch64_tune_params.stp_policy_model == > >> tune_params::STP_POLICY_NEVER) > >> > return false; > >> > >> ...here we could do something like: > >> > >> auto policy = (load > >> ? aarch64_tune_params.ldp_policy_model > >> : aarch64_tune_params.stp_policy_model); > >> > >> Also: > >> > >> > > >> > if (load) > >> > @@ -26420,13 +26515,22 @@ aarch64_operands_ok_for_ldpstp (rtx > *operands, > >> bool load, > >> > if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2)) > >> > return false; > >> > > >> > - /* If we have SImode and slow unaligned ldp, > >> > - check the alignment to be at least 8 byte. */ > >> > - if (mode == SImode > >> > - && (aarch64_tune_params.extra_tuning_flags > >> > - & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW) > >> > + /* If we have LDP_POLICY_ALIGNED, > >> > + do not emit the load pair unless the alignment is checked to be > >> > + at least double the alignment of the type. */ > >> > + if (load > >> > + && aarch64_tune_params.ldp_policy_model == > >> tune_params::LDP_POLICY_ALIGNED > >> > && !optimize_size > >> > - && MEM_ALIGN (mem_1) < 8 * BITS_PER_UNIT) > >> > + && MEM_ALIGN (mem_1) < 2 * GET_MODE_ALIGNMENT (mode)) > >> > + return false; > >> > + > >> > + /* If we have STP_POLICY_ALIGNED, > >> > + do not emit the store pair unless the alignment is checked to be > >> > + at least double the alignment of the type. */ > >> > + if (!load > >> > + && aarch64_tune_params.stp_policy_model == > >> tune_params::STP_POLICY_ALIGNED > >> > + && !optimize_size > >> > + && MEM_ALIGN (mem_1) < 2 * GET_MODE_ALIGNMENT (mode)) > >> > return false; > >> > > >> > /* Check if the addresses are in the form of [base+offset]. */ > >> > @@ -26556,6 +26660,16 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx > >> *operands, bool load, > >> > HOST_WIDE_INT offvals[num_insns], msize; > >> > rtx mem[num_insns], reg[num_insns], base[num_insns], > >> offset[num_insns]; > >> > > >> > + /* If we have LDP_POLICY_NEVER, reject the load pair. 
*/ > >> > + if (load > >> > + && aarch64_tune_params.ldp_policy_model == > >> tune_params::LDP_POLICY_NEVER) > >> > + return false; > >> > + > >> > + /* If we have STP_POLICY_NEVER, reject the store pair. */ > >> > + if (!load > >> > + && aarch64_tune_params.stp_policy_model == > >> tune_params::STP_POLICY_NEVER) > >> > + return false; > >> > + > >> > if (load) > >> > { > >> > for (int i = 0; i < num_insns; i++) > >> > @@ -26645,13 +26759,22 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx > >> *operands, bool load, > >> > if (offvals[0] % msize != offvals[2] % msize) > >> > return false; > >> > > >> > - /* If we have SImode and slow unaligned ldp, > >> > - check the alignment to be at least 8 byte. */ > >> > - if (mode == SImode > >> > - && (aarch64_tune_params.extra_tuning_flags > >> > - & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW) > >> > + /* If we have LDP_POLICY_ALIGNED, > >> > + do not emit the load pair unless the alignment is checked to be > >> > + at least double the alignment of the type. */ > >> > + if (load > >> > + && aarch64_tune_params.ldp_policy_model == > >> tune_params::LDP_POLICY_ALIGNED > >> > + && !optimize_size > >> > + && MEM_ALIGN (mem[0]) < 2 * GET_MODE_ALIGNMENT (mode)) > >> > + return false; > >> > + > >> > + /* If we have STP_POLICY_ALIGNED, > >> > + do not emit the store pair unless the alignment is checked to be > >> > + at least double the alignment of the type. 
*/ > >> > + if (!load > >> > + && aarch64_tune_params.stp_policy_model == > >> tune_params::STP_POLICY_ALIGNED > >> > && !optimize_size > >> > - && MEM_ALIGN (mem[0]) < 8 * BITS_PER_UNIT) > >> > + && MEM_ALIGN (mem[0]) < 2 * GET_MODE_ALIGNMENT (mode)) > >> > return false; > >> > > >> > return true; > >> > diff --git a/gcc/config/aarch64/aarch64.opt > >> b/gcc/config/aarch64/aarch64.opt > >> > index 4a0580435a8..e5302947ce7 100644 > >> > --- a/gcc/config/aarch64/aarch64.opt > >> > +++ b/gcc/config/aarch64/aarch64.opt > >> > @@ -205,6 +205,14 @@ msign-return-address= > >> > Target WarnRemoved RejectNegative Joined Enum(aarch_ra_sign_scope_t) > >> Var(aarch_ra_sign_scope) Init(AARCH_FUNCTION_NONE) Save > >> > Select return address signing scope. > >> > > >> > +mldp-policy= > >> > +Target RejectNegative Joined Var(aarch64_ldp_policy_string) Save > >> > +Fine-grained policy for load pairs. > >> > + > >> > +mstp-policy= > >> > +Target RejectNegative Joined Var(aarch64_stp_policy_string) Save > >> > +Fine-grained policy for store pairs. 
> >> > + > >> > Enum > >> > Name(aarch_ra_sign_scope_t) Type(enum aarch_function_type) > >> > Supported AArch64 return address signing scope (for use with > >> -msign-return-address= option): > >> > diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_aligned.c > >> b/gcc/testsuite/gcc.target/aarch64/ldp_aligned.c > >> > new file mode 100644 > >> > index 00000000000..6e29b265168 > >> > --- /dev/null > >> > +++ b/gcc/testsuite/gcc.target/aarch64/ldp_aligned.c > >> > @@ -0,0 +1,66 @@ > >> > +/* { dg-options "-O2 -mldp-policy=aligned -mcpu=generic" } */ > >> > + > >> > +#include <stdlib.h> > >> > +#include <stdint.h> > >> > + > >> > +typedef int v4si __attribute__ ((vector_size (16))); > >> > + > >> > +#define LDP_TEST_ALIGNED(TYPE) \ > >> > +TYPE ldp_aligned_##TYPE(char* ptr){ \ > >> > + TYPE a_0, a_1; \ > >> > + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - > >> 1)); \ > >> > + a_0 = arr[0]; \ > >> > + a_1 = arr[1]; \ > >> > + return a_0 + a_1; \ > >> > +} > >> > + > >> > +#define LDP_TEST_UNALIGNED(TYPE) \ > >> > +TYPE ldp_unaligned_##TYPE(char* ptr){ \ > >> > + TYPE a_0, a_1; \ > >> > + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - > >> 1)); \ > >> > + TYPE *a = arr+1; \ > >> > + a_0 = a[0]; \ > >> > + a_1 = a[1]; \ > >> > + return a_0 + a_1; \ > >> > +} > >> > + > >> > +#define LDP_TEST_ADJUST_ALIGNED(TYPE) \ > >> > +TYPE ldp_aligned_adjust_##TYPE(char* ptr){ \ > >> > + TYPE a_0, a_1, a_2, a_3, a_4; \ > >> > + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - > >> 1)); \ > >> > + a_0 = arr[100]; \ > >> > + a_1 = arr[101]; \ > >> > + a_2 = arr[102]; \ > >> > + a_3 = arr[103]; \ > >> > + a_4 = arr[110]; \ > >> > + return a_0 + a_1 + a_2 + a_3 + a_4; \ > >> > +} > >> > + > >> > +#define LDP_TEST_ADJUST_UNALIGNED(TYPE) \ > >> > +TYPE ldp_unaligned_adjust_##TYPE(char* ptr){ \ > >> > + TYPE a_0, a_1, a_2, a_3, a_4; \ > >> > + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - > >> 1)); \ > >> > + TYPE *a = 
arr+1; \ > >> > + a_0 = a[100]; \ > >> > + a_1 = a[101]; \ > >> > + a_2 = a[102]; \ > >> > + a_3 = a[103]; \ > >> > + a_4 = a[110]; \ > >> > + return a_0 + a_1 + a_2 + a_3 + a_4; \ > >> > +} > >> > + > >> > +LDP_TEST_ALIGNED(int32_t); > >> > +LDP_TEST_ALIGNED(int64_t); > >> > +LDP_TEST_ALIGNED(v4si); > >> > +LDP_TEST_UNALIGNED(int32_t); > >> > +LDP_TEST_UNALIGNED(int64_t); > >> > +LDP_TEST_UNALIGNED(v4si); > >> > +LDP_TEST_ADJUST_ALIGNED(int32_t); > >> > +LDP_TEST_ADJUST_ALIGNED(int64_t); > >> > +LDP_TEST_ADJUST_UNALIGNED(int32_t); > >> > +LDP_TEST_ADJUST_UNALIGNED(int64_t); > >> > + > >> > +/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 3 } > } */ > >> > +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 3 } > } */ > >> > +/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 1 } > } */ > >> > >> It might be better to split this into two tests, one for the aligned > >> accesses and one for the unaligned accesses. Same for the store > version. > >> (Splitting isn't necessary or useful for =always and =never though.) 
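
For illustration, the aligned-only half of such a split might look roughly like the sketch below. This is not part of the posted patch: the helper macro ALIGN_BYTES, the decision to leave out the v4si and "adjust" variants, and the absence of scan-assembler lines are all assumptions made here, since the expected ldp counts would have to be re-derived for whichever functions end up in each file.

```c
/* Hypothetical aligned-only test sketch (not from the posted patch).
   The scan-assembler-times directives are deliberately omitted: their
   counts depend on which functions are kept in this file.  */
/* { dg-options "-O2 -mldp-policy=aligned -mcpu=generic" } */

#include <assert.h>
#include <stdlib.h>
#include <stdint.h>

/* Illustrative helper (an assumption, not in the original test): the
   boundary the pointer is rounded down to, using the same expression
   as the original macro.  */
#define ALIGN_BYTES(TYPE) (2 * 8 * _Alignof (TYPE))

/* Same shape as LDP_TEST_ALIGNED in the original test: round PTR down
   so that the two adjacent loads are provably aligned.  The
   static_assert only documents that the mask needs a power of two.  */
#define LDP_TEST_ALIGNED(TYPE) \
static_assert ((ALIGN_BYTES (TYPE) & (ALIGN_BYTES (TYPE) - 1)) == 0, \
	       "mask requires a power of two"); \
TYPE ldp_aligned_##TYPE (char *ptr) \
{ \
  TYPE a_0, a_1; \
  TYPE *arr = (TYPE *) ((uintptr_t) ptr \
			& ~(uintptr_t) (ALIGN_BYTES (TYPE) - 1)); \
  a_0 = arr[0]; \
  a_1 = arr[1]; \
  return a_0 + a_1; \
}

LDP_TEST_ALIGNED (int32_t)
LDP_TEST_ALIGNED (int64_t)
```

A sibling file would then carry only the LDP_TEST_UNALIGNED-style functions with its own directives, and the store tests could presumably be split the same way.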
> >> > >> Thanks, > >> Richard > >> > >> > + > >> > diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_always.c > >> b/gcc/testsuite/gcc.target/aarch64/ldp_always.c > >> > new file mode 100644 > >> > index 00000000000..d2c4cf343e9 > >> > --- /dev/null > >> > +++ b/gcc/testsuite/gcc.target/aarch64/ldp_always.c > >> > @@ -0,0 +1,66 @@ > >> > +/* { dg-options "-O2 -mldp-policy=always -mcpu=generic" } */ > >> > + > >> > +#include <stdlib.h> > >> > +#include <stdint.h> > >> > + > >> > +typedef int v4si __attribute__ ((vector_size (16))); > >> > + > >> > +#define LDP_TEST_ALIGNED(TYPE) \ > >> > +TYPE ldp_aligned_##TYPE(char* ptr){ \ > >> > + TYPE a_0, a_1; \ > >> > + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - > >> 1)); \ > >> > + a_0 = arr[0]; \ > >> > + a_1 = arr[1]; \ > >> > + return a_0 + a_1; \ > >> > +} > >> > + > >> > +#define LDP_TEST_UNALIGNED(TYPE) \ > >> > +TYPE ldp_unaligned_##TYPE(char* ptr){ \ > >> > + TYPE a_0, a_1; \ > >> > + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - > >> 1)); \ > >> > + TYPE *a = arr+1; \ > >> > + a_0 = a[0]; \ > >> > + a_1 = a[1]; \ > >> > + return a_0 + a_1; \ > >> > +} > >> > + > >> > +#define LDP_TEST_ADJUST_ALIGNED(TYPE) \ > >> > +TYPE ldp_aligned_adjust_##TYPE(char* ptr){ \ > >> > + TYPE a_0, a_1, a_2, a_3, a_4; \ > >> > + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - > >> 1)); \ > >> > + a_0 = arr[100]; \ > >> > + a_1 = arr[101]; \ > >> > + a_2 = arr[102]; \ > >> > + a_3 = arr[103]; \ > >> > + a_4 = arr[110]; \ > >> > + return a_0 + a_1 + a_2 + a_3 + a_4; \ > >> > +} > >> > + > >> > +#define LDP_TEST_ADJUST_UNALIGNED(TYPE) \ > >> > +TYPE ldp_unaligned_adjust_##TYPE(char* ptr){ \ > >> > + TYPE a_0, a_1, a_2, a_3, a_4; \ > >> > + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - > >> 1)); \ > >> > + TYPE *a = arr+1; \ > >> > + a_0 = a[100]; \ > >> > + a_1 = a[101]; \ > >> > + a_2 = a[102]; \ > >> > + a_3 = a[103]; \ > >> > + a_4 = a[110]; \ > >> > + return 
a_0 + a_1 + a_2 + a_3 + a_4; \ > >> > +} > >> > + > >> > +LDP_TEST_ALIGNED(int32_t); > >> > +LDP_TEST_ALIGNED(int64_t); > >> > +LDP_TEST_ALIGNED(v4si); > >> > +LDP_TEST_UNALIGNED(int32_t); > >> > +LDP_TEST_UNALIGNED(int64_t); > >> > +LDP_TEST_UNALIGNED(v4si); > >> > +LDP_TEST_ADJUST_ALIGNED(int32_t); > >> > +LDP_TEST_ADJUST_ALIGNED(int64_t); > >> > +LDP_TEST_ADJUST_UNALIGNED(int32_t); > >> > +LDP_TEST_ADJUST_UNALIGNED(int64_t); > >> > + > >> > +/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 6 } > } */ > >> > +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 6 } > } */ > >> > +/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 2 } > } */ > >> > + > >> > diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_never.c > >> b/gcc/testsuite/gcc.target/aarch64/ldp_never.c > >> > new file mode 100644 > >> > index 00000000000..f8a45ee18be > >> > --- /dev/null > >> > +++ b/gcc/testsuite/gcc.target/aarch64/ldp_never.c > >> > @@ -0,0 +1,66 @@ > >> > +/* { dg-options "-O2 -mldp-policy=never -mcpu=generic" } */ > >> > + > >> > +#include <stdlib.h> > >> > +#include <stdint.h> > >> > + > >> > +typedef int v4si __attribute__ ((vector_size (16))); > >> > + > >> > +#define LDP_TEST_ALIGNED(TYPE) \ > >> > +TYPE ldp_aligned_##TYPE(char* ptr){ \ > >> > + TYPE a_0, a_1; \ > >> > + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - > >> 1)); \ > >> > + a_0 = arr[0]; \ > >> > + a_1 = arr[1]; \ > >> > + return a_0 + a_1; \ > >> > +} > >> > + > >> > +#define LDP_TEST_UNALIGNED(TYPE) \ > >> > +TYPE ldp_unaligned_##TYPE(char* ptr){ \ > >> > + TYPE a_0, a_1; \ > >> > + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - > >> 1)); \ > >> > + TYPE *a = arr+1; \ > >> > + a_0 = a[0]; \ > >> > + a_1 = a[1]; \ > >> > + return a_0 + a_1; \ > >> > +} > >> > + > >> > +#define LDP_TEST_ADJUST_ALIGNED(TYPE) \ > >> > +TYPE ldp_aligned_adjust_##TYPE(char* ptr){ \ > >> > + TYPE a_0, a_1, a_2, a_3, a_4; \ > >> > + TYPE *arr = (TYPE*) 
((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - > >> 1)); \ > >> > + a_0 = arr[100]; \ > >> > + a_1 = arr[101]; \ > >> > + a_2 = arr[102]; \ > >> > + a_3 = arr[103]; \ > >> > + a_4 = arr[110]; \ > >> > + return a_0 + a_1 + a_2 + a_3 + a_4; \ > >> > +} > >> > + > >> > +#define LDP_TEST_ADJUST_UNALIGNED(TYPE) \ > >> > +TYPE ldp_unaligned_adjust_##TYPE(char* ptr){ \ > >> > + TYPE a_0, a_1, a_2, a_3, a_4; \ > >> > + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - > >> 1)); \ > >> > + TYPE *a = arr+1; \ > >> > + a_0 = a[100]; \ > >> > + a_1 = a[101]; \ > >> > + a_2 = a[102]; \ > >> > + a_3 = a[103]; \ > >> > + a_4 = a[110]; \ > >> > + return a_0 + a_1 + a_2 + a_3 + a_4; \ > >> > +} > >> > + > >> > +LDP_TEST_ALIGNED(int32_t); > >> > +LDP_TEST_ALIGNED(int64_t); > >> > +LDP_TEST_ALIGNED(v4si); > >> > +LDP_TEST_UNALIGNED(int32_t); > >> > +LDP_TEST_UNALIGNED(int64_t); > >> > +LDP_TEST_UNALIGNED(v4si); > >> > +LDP_TEST_ADJUST_ALIGNED(int32_t); > >> > +LDP_TEST_ADJUST_ALIGNED(int64_t); > >> > +LDP_TEST_ADJUST_UNALIGNED(int32_t); > >> > +LDP_TEST_ADJUST_UNALIGNED(int64_t); > >> > + > >> > +/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 0 } > } */ > >> > +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 0 } > } */ > >> > +/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 0 } > } */ > >> > + > >> > diff --git a/gcc/testsuite/gcc.target/aarch64/stp_aligned.c > >> b/gcc/testsuite/gcc.target/aarch64/stp_aligned.c > >> > new file mode 100644 > >> > index 00000000000..ae47b42efc4 > >> > --- /dev/null > >> > +++ b/gcc/testsuite/gcc.target/aarch64/stp_aligned.c > >> > @@ -0,0 +1,60 @@ > >> > +/* { dg-options "-O2 -mstp-policy=aligned -mcpu=generic" } */ > >> > + > >> > +#include <stdlib.h> > >> > +#include <stdint.h> > >> > + > >> > +typedef int v4si __attribute__ ((vector_size (16))); > >> > + > >> > +#define STP_TEST_ALIGNED(TYPE) \ > >> > +TYPE *stp_aligned_##TYPE(char* ptr, TYPE x){ \ > >> > + TYPE *arr = (TYPE*) 
((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - > >> 1)); \ > >> > + arr[0] = x; \ > >> > + arr[1] = x; \ > >> > + return arr; \ > >> > +} > >> > + > >> > +#define STP_TEST_UNALIGNED(TYPE) \ > >> > +TYPE *stp_unaligned_##TYPE(char* ptr, TYPE x){ \ > >> > + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - > >> 1)); \ > >> > + TYPE *a = arr+1; \ > >> > + a[0] = x; \ > >> > + a[1] = x; \ > >> > + return a; \ > >> > +} > >> > + > >> > +#define STP_TEST_ADJUST_ALIGNED(TYPE) \ > >> > +TYPE *stp_aligned_adjust_##TYPE(char* ptr, TYPE x){ \ > >> > + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - > >> 1)); \ > >> > + arr[100] = x; \ > >> > + arr[101] = x; \ > >> > + arr[102] = x; \ > >> > + arr[103] = x; \ > >> > + return arr; \ > >> > +} > >> > + > >> > +#define STP_TEST_ADJUST_UNALIGNED(TYPE) \ > >> > +TYPE *stp_unaligned_adjust_##TYPE(char* ptr, TYPE x){ \ > >> > + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - > >> 1)); \ > >> > + TYPE *a = arr+1; \ > >> > + a[100] = x; \ > >> > + a[101] = x; \ > >> > + a[102] = x; \ > >> > + a[103] = x; \ > >> > + return a; \ > >> > +} > >> > + > >> > +STP_TEST_ALIGNED(int32_t); > >> > +STP_TEST_ALIGNED(int64_t); > >> > +STP_TEST_ALIGNED(v4si); > >> > +STP_TEST_UNALIGNED(int32_t); > >> > +STP_TEST_UNALIGNED(int64_t); > >> > +STP_TEST_UNALIGNED(v4si); > >> > +STP_TEST_ADJUST_ALIGNED(int32_t); > >> > +STP_TEST_ADJUST_ALIGNED(int64_t); > >> > +STP_TEST_ADJUST_UNALIGNED(int32_t); > >> > +STP_TEST_ADJUST_UNALIGNED(int64_t); > >> > + > >> > +/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]" 3 } > } */ > >> > +/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]" 3 } > } */ > >> > +/* { dg-final { scan-assembler-times "stp\tq\[0-9\]+, q\[0-9\]" 1 } > } */ > >> > + > >> > diff --git a/gcc/testsuite/gcc.target/aarch64/stp_always.c > >> b/gcc/testsuite/gcc.target/aarch64/stp_always.c > >> > new file mode 100644 > >> > index 00000000000..c1c51f9ae88 > >> > --- /dev/null > >> 
> +++ b/gcc/testsuite/gcc.target/aarch64/stp_always.c > >> > @@ -0,0 +1,60 @@ > >> > +/* { dg-options "-O2 -mstp-policy=always -mcpu=generic" } */ > >> > + > >> > +#include <stdlib.h> > >> > +#include <stdint.h> > >> > + > >> > +typedef int v4si __attribute__ ((vector_size (16))); > >> > + > >> > +#define STP_TEST_ALIGNED(TYPE) \ > >> > +TYPE *stp_aligned_##TYPE(char* ptr, TYPE x){ \ > >> > + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - > >> 1)); \ > >> > + arr[0] = x; \ > >> > + arr[1] = x; \ > >> > + return arr; \ > >> > +} > >> > + > >> > +#define STP_TEST_UNALIGNED(TYPE) \ > >> > +TYPE *stp_unaligned_##TYPE(char* ptr, TYPE x){ \ > >> > + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - > >> 1)); \ > >> > + TYPE *a = arr+1; \ > >> > + a[0] = x; \ > >> > + a[1] = x; \ > >> > + return a; \ > >> > +} > >> > + > >> > +#define STP_TEST_ADJUST_ALIGNED(TYPE) \ > >> > +TYPE *stp_aligned_adjust_##TYPE(char* ptr, TYPE x){ \ > >> > + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - > >> 1)); \ > >> > + arr[100] = x; \ > >> > + arr[101] = x; \ > >> > + arr[102] = x; \ > >> > + arr[103] = x; \ > >> > + return arr; \ > >> > +} > >> > + > >> > +#define STP_TEST_ADJUST_UNALIGNED(TYPE) \ > >> > +TYPE *stp_unaligned_adjust_##TYPE(char* ptr, TYPE x){ \ > >> > + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - > >> 1)); \ > >> > + TYPE *a = arr+1; \ > >> > + a[100] = x; \ > >> > + a[101] = x; \ > >> > + a[102] = x; \ > >> > + a[103] = x; \ > >> > + return a; \ > >> > +} > >> > + > >> > +STP_TEST_ALIGNED(int32_t); > >> > +STP_TEST_ALIGNED(int64_t); > >> > +STP_TEST_ALIGNED(v4si); > >> > +STP_TEST_UNALIGNED(int32_t); > >> > +STP_TEST_UNALIGNED(int64_t); > >> > +STP_TEST_UNALIGNED(v4si); > >> > +STP_TEST_ADJUST_ALIGNED(int32_t); > >> > +STP_TEST_ADJUST_ALIGNED(int64_t); > >> > +STP_TEST_ADJUST_UNALIGNED(int32_t); > >> > +STP_TEST_ADJUST_UNALIGNED(int64_t); > >> > + > >> > +/* { dg-final { scan-assembler-times 
"stp\tw\[0-9\]+, w\[0-9\]" 6 } > } */ > >> > +/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]" 6 } > } */ > >> > +/* { dg-final { scan-assembler-times "stp\tq\[0-9\]+, q\[0-9\]" 2 } > } */ > >> > + > >> > diff --git a/gcc/testsuite/gcc.target/aarch64/stp_never.c > >> b/gcc/testsuite/gcc.target/aarch64/stp_never.c > >> > new file mode 100644 > >> > index 00000000000..c28fcafa0ed > >> > --- /dev/null > >> > +++ b/gcc/testsuite/gcc.target/aarch64/stp_never.c > >> > @@ -0,0 +1,60 @@ > >> > +/* { dg-options "-O2 -mstp-policy=never -mcpu=generic" } */ > >> > + > >> > +#include <stdlib.h> > >> > +#include <stdint.h> > >> > + > >> > +typedef int v4si __attribute__ ((vector_size (16))); > >> > + > >> > +#define STP_TEST_ALIGNED(TYPE) \ > >> > +TYPE *stp_aligned_##TYPE(char* ptr, TYPE x){ \ > >> > + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - > >> 1)); \ > >> > + arr[0] = x; \ > >> > + arr[1] = x; \ > >> > + return arr; \ > >> > +} > >> > + > >> > +#define STP_TEST_UNALIGNED(TYPE) \ > >> > +TYPE *stp_unaligned_##TYPE(char* ptr, TYPE x){ \ > >> > + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - > >> 1)); \ > >> > + TYPE *a = arr+1; \ > >> > + a[0] = x; \ > >> > + a[1] = x; \ > >> > + return a; \ > >> > +} > >> > + > >> > +#define STP_TEST_ADJUST_ALIGNED(TYPE) \ > >> > +TYPE *stp_aligned_adjust_##TYPE(char* ptr, TYPE x){ \ > >> > + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - > >> 1)); \ > >> > + arr[100] = x; \ > >> > + arr[101] = x; \ > >> > + arr[102] = x; \ > >> > + arr[103] = x; \ > >> > + return arr; \ > >> > +} > >> > + > >> > +#define STP_TEST_ADJUST_UNALIGNED(TYPE) \ > >> > +TYPE *stp_unaligned_adjust_##TYPE(char* ptr, TYPE x){ \ > >> > + TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - > >> 1)); \ > >> > + TYPE *a = arr+1; \ > >> > + a[100] = x; \ > >> > + a[101] = x; \ > >> > + a[102] = x; \ > >> > + a[103] = x; \ > >> > + return a; \ > >> > +} > >> > + > >> > 
+STP_TEST_ALIGNED(int32_t); > >> > +STP_TEST_ALIGNED(int64_t); > >> > +STP_TEST_ALIGNED(v4si); > >> > +STP_TEST_UNALIGNED(int32_t); > >> > +STP_TEST_UNALIGNED(int64_t); > >> > +STP_TEST_UNALIGNED(v4si); > >> > +STP_TEST_ADJUST_ALIGNED(int32_t); > >> > +STP_TEST_ADJUST_ALIGNED(int64_t); > >> > +STP_TEST_ADJUST_UNALIGNED(int32_t); > >> > +STP_TEST_ADJUST_UNALIGNED(int64_t); > >> > + > >> > +/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]" 0 } > } */ > >> > +/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]" 0 } > } */ > >> > +/* { dg-final { scan-assembler-times "stp\tq\[0-9\]+, q\[0-9\]" 0 } > } */ > >> > + > >> > -- Manos Anagnostakis | Compiler Engineer | E: manos.anagnosta...@vrull.eu | VRULL GmbH | Beatrixgasse 32 1030 Vienna | W: www.vrull.eu
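
For anyone reading the quoted tests without the rest of the patch at hand, here is a minimal sketch of the address arithmetic they rely on. It is not code from the patch: `round_down_like_tests` and `pair_is_aligned` are hypothetical helper names, and the predicate only restates the "aligned to 2 * element_size" wording from the option description, not the actual check in aarch64.cc.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not in the patch): round an address down the same
   way the test macros do, i.e. to a multiple of 2 * 8 * align bytes.  */
static uintptr_t
round_down_like_tests (uintptr_t addr, size_t align)
{
  return addr & ~(uintptr_t) (2 * 8 * align - 1);
}

/* Hypothetical predicate restating the documented "aligned" policy:
   a pair access qualifies when its address is a multiple of
   2 * element_size.  */
static bool
pair_is_aligned (uintptr_t addr, size_t element_size)
{
  return addr % (2 * element_size) == 0;
}
```

With 4-byte elements, the rounded base and `arr[100]` (byte offset 400) satisfy the predicate, while `arr + 1` and `a[100]` (byte offset 404) do not. That seems to be the distinction the ALIGNED and UNALIGNED macro variants are built around.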