> -----Original Message-----
> From: Kyrylo Tkachov <kyrylo.tkac...@arm.com>
> Sent: Tuesday, September 26, 2023 9:36 AM
> To: Manos Anagnostakis <manos.anagnosta...@vrull.eu>; gcc-patc...@gcc.gnu.org
> Cc: Philipp Tomsich <philipp.toms...@vrull.eu>; Andrew Pinski <pins...@gmail.com>
> Subject: RE: [PATCH v3] aarch64: Fine-grained policies to control ldp-stp formation.
>
> Hi Manos,
>
> Thank you for the quick turnaround. Please post the patch that uses a --param
> with an enum. I think that's the direction we should be going with this patch.

Ah, and please address Tamar's feedback from https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631343.html

Thanks,
Kyrill

>
> From: Manos Anagnostakis <manos.anagnosta...@vrull.eu>
> Sent: Tuesday, September 26, 2023 7:06 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Philipp Tomsich <philipp.toms...@vrull.eu>; Kyrylo Tkachov <kyrylo.tkac...@arm.com>; Andrew Pinski <pins...@gmail.com>
> Subject: Re: [PATCH v3] aarch64: Fine-grained policies to control ldp-stp formation.
>
> Thank you Andrew for the input.
>
> I've prepared a patch using --param with enum, which seems a more suitable
> approach to me as strings are more descriptive as well.
>
> The current patch needed an adjustment on how to call the parsing functions
> to match the compiler coding style.
>
> Both are bootstrapped and regstested.
>
> I can send a V4 of whichever is preferred.
>
> Thanks!
>
> Manos.
>
> On Mon, Sep 25, 2023 at 11:57 PM Andrew Pinski <mailto:pins...@gmail.com> wrote:
> On Mon, Sep 25, 2023 at 1:04 PM Andrew Pinski <mailto:pins...@gmail.com> wrote:
> >
> > On Mon, Sep 25, 2023 at 12:59 PM Philipp Tomsich
> > <mailto:philipp.toms...@vrull.eu> wrote:
> > >
> > > On Mon, 25 Sept 2023 at 21:54, Andrew Pinski <mailto:pins...@gmail.com> wrote:
> > > >
> > > > On Mon, Sep 25, 2023 at 12:50 PM Manos Anagnostakis
> > > > <mailto:manos.anagnosta...@vrull.eu> wrote:
> > > > >
> > > > > This patch implements the following TODO in gcc/config/aarch64/aarch64.cc
> > > > > to provide the requested behaviour for handling ldp and stp:
> > > > >
> > > > > /* Allow the tuning structure to disable LDP instruction formation
> > > > > from combining instructions (e.g., in peephole2).
> > > > > TODO: Implement fine-grained tuning control for LDP and STP:
> > > > > 1. control policies for load and store separately;
> > > > > 2. support the following policies:
> > > > > - default (use what is in the tuning structure)
> > > > > - always
> > > > > - never
> > > > > - aligned (only if the compiler can prove that the
> > > > > load will be aligned to 2 * element_size) */
> > > > >
> > > > > It provides two new and concrete target-specific command-line parameters
> > > > > -param=aarch64-ldp-policy= and -param=aarch64-stp-policy=
> > > > > to give the ability to control load and store policies separately as
> > > > > stated in part 1 of the TODO.
> > > > >
> > > > > The accepted values for both parameters are:
> > > > > - 0: Use the policy of the tuning structure (default).
> > > > > - 1: Emit ldp/stp regardless of alignment.
> > > > > - 2: Do not emit ldp/stp.
> > > > > - 3: In order to emit ldp/stp, first check if the load/store will
> > > > > be aligned to 2 * element_size.
> > > >
> > > > Instead of a number, does it make sense to instead use a string
> > > > (ENUM) for this param.
> > > > Also I think using --param is a bad idea if it is going to be
> > > > documented in the user manual.
> > > > Maybe a -m option should be used instead.
> > >
> > > See https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631283.html
> > > for the discussion triggering the change from -m... to --param and the
> > > change to using a number instead of a string.
> >
> > That is the opposite of the current GCC practice across all targets.
> > Things like this should be consistent and if one target decides to do
> > it differently, then maybe it should NOT.
> > Anyways we should document the correct coding style for options so we
> > don't have these back and forths again.
>
> Kyrylo:
> > It will have to take a number rather than a string but that should be
> > okay, as long as the right values are documented in invoke.texi.
>
> No it does not need to be a number. --param=ranger-debug= does not
> take a number, it takes an enum.
> One of the benefits of moving --param support over to .opt to allow
> more than just numbers even.
>
> Thanks,
> Andrew
>
> >
> > Thanks,
> > Andrew
> >
> > >
> > > Thanks,
> > > Philipp.
> > >
> > > >
> > > > Thanks,
> > > > Andrew
> > > >
> > > > >
> > > > > gcc/ChangeLog:
> > > > > * config/aarch64/aarch64-protos.h (struct tune_params): Add
> > > > > appropriate enums for the policies.
> > > > > * config/aarch64/aarch64-tuning-flags.def
> > > > > (AARCH64_EXTRA_TUNING_OPTION): Remove superseded tuning
> > > > > options.
> > > > > * config/aarch64/aarch64.cc (aarch64_parse_ldp_policy): New
> > > > > function to parse ldp-policy parameter.
> > > > > (aarch64_parse_stp_policy): New function to parse stp-policy parameter.
> > > > > (aarch64_override_options_internal): Call parsing functions.
> > > > > (aarch64_operands_ok_for_ldpstp): Add parameter-value check and
> > > > > alignment check and remove superseded ones.
> > > > > (aarch64_operands_adjust_ok_for_ldpstp): Add parameter-value check and
> > > > > alignment check and remove superseded ones.
> > > > > * config/aarch64/aarch64.opt: Add options.
> > > > > * doc/invoke.texi: Document the parameters accordingly.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > > * gcc.target/aarch64/ampere1-no_ldp_combine.c: Removed.
> > > > > * gcc.target/aarch64/ldp_aligned.c: New test.
> > > > > * gcc.target/aarch64/ldp_always.c: New test.
> > > > > * gcc.target/aarch64/ldp_never.c: New test.
> > > > > * gcc.target/aarch64/stp_aligned.c: New test.
> > > > > * gcc.target/aarch64/stp_always.c: New test.
> > > > > * gcc.target/aarch64/stp_never.c: New test.
> > > > >
> > > > > Signed-off-by: Manos Anagnostakis <mailto:manos.anagnosta...@vrull.eu>
> > > > > ---
> > > > > Changes in v3:
> > > > > - Changed command-line options to target-specific parameters
> > > > > and documented them accordingly in doc/invoke.texi.
> > > > > - Removed ampere1-no_ldp_combine.c test as superseded.
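
The following is a minimal, standalone C sketch; it is not code from the patch. It shows how the v3 numeric parameter could be resolved against the tuning-structure default, and how the "aligned" check from the ChangeLog then decides whether a pair may be formed. The names `pair_policy`, `resolve_policy` and `pair_allowed` are hypothetical. `mem_align_bits`, `mode_align_bits` and `optimize_for_size` stand in for `MEM_ALIGN (mem_1)`, `GET_MODE_ALIGNMENT (mode)` and `optimize_function_for_size_p (cfun)`; the first two are in bits, as in GCC.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for tune_params::aarch64_ldp_policy_model
   (the stp enum in the patch has the same shape).  */
enum pair_policy
{
  POLICY_ALIGNED,
  POLICY_ALWAYS,
  POLICY_NEVER
};

/* Map a v3-style numeric --param value onto a policy:
   0 keeps the tuning structure's default, 1 = always, 2 = never,
   3 = aligned.  */
static enum pair_policy
resolve_policy (unsigned int param_value, enum pair_policy tuning_default)
{
  /* IntegerRange(0, 3) in aarch64.opt is assumed to reject other values.  */
  assert (param_value <= 3);
  switch (param_value)
    {
    case 1:
      return POLICY_ALWAYS;
    case 2:
      return POLICY_NEVER;
    case 3:
      return POLICY_ALIGNED;
    default:
      /* 0: leave the tuning structure's choice untouched.  */
      return tuning_default;
    }
}

/* Return true if a load/store pair may be formed under POLICY.
   MEM_ALIGN_BITS models MEM_ALIGN (mem_1), MODE_ALIGN_BITS models
   GET_MODE_ALIGNMENT (mode), and OPTIMIZE_FOR_SIZE models
   optimize_function_for_size_p (cfun).  */
static bool
pair_allowed (enum pair_policy policy, unsigned int mem_align_bits,
              unsigned int mode_align_bits, bool optimize_for_size)
{
  if (policy == POLICY_NEVER)
    return false;

  /* Aligned policy: require the first access to be aligned to twice the
     mode alignment, unless the function is optimized for size.  */
  if (policy == POLICY_ALIGNED
      && !optimize_for_size
      && mem_align_bits < 2 * mode_align_bits)
    return false;

  return true;
}
```

Suppose the tuning default is "aligned", as the patch sets for thunderx and ampere1, and the parameter is left at 0. A DImode pair (64-bit mode alignment) whose first access is only known to be 64-bit aligned is then rejected. The same pair is accepted at a 128-bit-aligned address, or when optimizing for size. A parameter value of 1 overrides the default and always allows the pair.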
> > > > > > > > > > gcc/config/aarch64/aarch64-protos.h | 24 ++ > > > > > gcc/config/aarch64/aarch64-tuning-flags.def | 8 - > > > > > gcc/config/aarch64/aarch64.cc | 215 > > > > >+++++++++++++----- > > > > > gcc/config/aarch64/aarch64.opt | 8 + > > > > > gcc/doc/invoke.texi | 30 +++ > > > > > .../aarch64/ampere1-no_ldp_combine.c | 11 - > > > > > .../gcc.target/aarch64/ldp_aligned.c | 66 ++++++ > > > > > gcc/testsuite/gcc.target/aarch64/ldp_always.c | 66 ++++++ > > > > > gcc/testsuite/gcc.target/aarch64/ldp_never.c | 66 ++++++ > > > > > .../gcc.target/aarch64/stp_aligned.c | 60 +++++ > > > > > gcc/testsuite/gcc.target/aarch64/stp_always.c | 60 +++++ > > > > > gcc/testsuite/gcc.target/aarch64/stp_never.c | 60 +++++ > > > > > 12 files changed, 600 insertions(+), 74 deletions(-) > > > > > delete mode 100644 gcc/testsuite/gcc.target/aarch64/ampere1- > no_ldp_combine.c > > > > > create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_aligned.c > > > > > create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_always.c > > > > > create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_never.c > > > > > create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_aligned.c > > > > > create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_always.c > > > > > create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_never.c > > > > > > > > > > diff --git a/gcc/config/aarch64/aarch64-protos.h > b/gcc/config/aarch64/aarch64-protos.h > > > > > index 70303d6fd95..be1d73490ed 100644 > > > > > --- a/gcc/config/aarch64/aarch64-protos.h > > > > > +++ b/gcc/config/aarch64/aarch64-protos.h > > > > > @@ -568,6 +568,30 @@ struct tune_params > > > > > /* Place prefetch struct pointer at the end to enable type checking > > > > > errors when tune_params misses elements (e.g., from erroneous > merges). 
*/ > > > > > const struct cpu_prefetch_tune *prefetch; > > > > > +/* An enum specifying how to handle load pairs using a fine-grained > policy: > > > > > + - LDP_POLICY_ALIGNED: Emit ldp if the source pointer is aligned > > > > > + to at least double the alignment of the type. > > > > > + - LDP_POLICY_ALWAYS: Emit ldp regardless of alignment. > > > > > + - LDP_POLICY_NEVER: Do not emit ldp. */ > > > > > + > > > > > + enum aarch64_ldp_policy_model > > > > > + { > > > > > + LDP_POLICY_ALIGNED, > > > > > + LDP_POLICY_ALWAYS, > > > > > + LDP_POLICY_NEVER > > > > > + } ldp_policy_model; > > > > > +/* An enum specifying how to handle store pairs using a fine-grained > policy: > > > > > + - STP_POLICY_ALIGNED: Emit stp if the source pointer is aligned > > > > > + to at least double the alignment of the type. > > > > > + - STP_POLICY_ALWAYS: Emit stp regardless of alignment. > > > > > + - STP_POLICY_NEVER: Do not emit stp. */ > > > > > + > > > > > + enum aarch64_stp_policy_model > > > > > + { > > > > > + STP_POLICY_ALIGNED, > > > > > + STP_POLICY_ALWAYS, > > > > > + STP_POLICY_NEVER > > > > > + } stp_policy_model; > > > > > }; > > > > > > > > > > /* Classifies an address. > > > > > diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def > b/gcc/config/aarch64/aarch64-tuning-flags.def > > > > > index 52112ba7c48..774568e9106 100644 > > > > > --- a/gcc/config/aarch64/aarch64-tuning-flags.def > > > > > +++ b/gcc/config/aarch64/aarch64-tuning-flags.def > > > > > @@ -30,11 +30,6 @@ > > > > > > > > > > AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", > RENAME_FMA_REGS) > > > > > > > > > > -/* Don't create non-8 byte aligned load/store pair. That is if the > > > > > -two load/stores are not at least 8 byte aligned don't create > > > > > load/store > > > > > -pairs. */ > > > > > -AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", > SLOW_UNALIGNED_LDPW) > > > > > - > > > > > /* Some of the optional shift to some arthematic instructions are > > > > > considered cheap. 
Logical shift left <=4 with or without a > > > > > zero extend are considered cheap. Sign extend; non logical shift > > > > >left > > > > > @@ -44,9 +39,6 @@ AARCH64_EXTRA_TUNING_OPTION > ("cheap_shift_extend", CHEAP_SHIFT_EXTEND) > > > > > /* Disallow load/store pair instructions on Q-registers. */ > > > > > AARCH64_EXTRA_TUNING_OPTION ("no_ldp_stp_qregs", > NO_LDP_STP_QREGS) > > > > > > > > > > -/* Disallow load-pair instructions to be formed in > combine/peephole. */ > > > > > -AARCH64_EXTRA_TUNING_OPTION ("no_ldp_combine", > NO_LDP_COMBINE) > > > > > - > > > > > AARCH64_EXTRA_TUNING_OPTION ("rename_load_regs", > RENAME_LOAD_REGS) > > > > > > > > > > AARCH64_EXTRA_TUNING_OPTION ("cse_sve_vl_constants", > CSE_SVE_VL_CONSTANTS) > > > > > diff --git a/gcc/config/aarch64/aarch64.cc > b/gcc/config/aarch64/aarch64.cc > > > > > index 219c4ee6d4c..9eeb5469cf9 100644 > > > > > --- a/gcc/config/aarch64/aarch64.cc > > > > > +++ b/gcc/config/aarch64/aarch64.cc > > > > > @@ -1357,7 +1357,9 @@ static const struct tune_params > generic_tunings = > > > > > Neoverse V1. It does not have a noticeable effect on A64FX and > should > > > > > have at most a very minor effect on SVE2 cores. */ > > > > > (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS), /* > tune_flags. */ > > > > > - &generic_prefetch_tune > > > > > + &generic_prefetch_tune, > > > > > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > > > > > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > > > > > }; > > > > > > > > > > static const struct tune_params cortexa35_tunings = > > > > > @@ -1391,7 +1393,9 @@ static const struct tune_params > cortexa35_tunings = > > > > > 0, /* max_case_values. */ > > > > > tune_params::AUTOPREFETCHER_WEAK, /* > autoprefetcher_model. */ > > > > > (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ > > > > > - &generic_prefetch_tune > > > > > + &generic_prefetch_tune, > > > > > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. 
*/ > > > > > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > > > > > }; > > > > > > > > > > static const struct tune_params cortexa53_tunings = > > > > > @@ -1425,7 +1429,9 @@ static const struct tune_params > cortexa53_tunings = > > > > > 0, /* max_case_values. */ > > > > > tune_params::AUTOPREFETCHER_WEAK, /* > autoprefetcher_model. */ > > > > > (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ > > > > > - &generic_prefetch_tune > > > > > + &generic_prefetch_tune, > > > > > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > > > > > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > > > > > }; > > > > > > > > > > static const struct tune_params cortexa57_tunings = > > > > > @@ -1459,7 +1465,9 @@ static const struct tune_params > cortexa57_tunings = > > > > > 0, /* max_case_values. */ > > > > > tune_params::AUTOPREFETCHER_WEAK, /* > autoprefetcher_model. */ > > > > > (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS), /* > tune_flags. */ > > > > > - &generic_prefetch_tune > > > > > + &generic_prefetch_tune, > > > > > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > > > > > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > > > > > }; > > > > > > > > > > static const struct tune_params cortexa72_tunings = > > > > > @@ -1493,7 +1501,9 @@ static const struct tune_params > cortexa72_tunings = > > > > > 0, /* max_case_values. */ > > > > > tune_params::AUTOPREFETCHER_WEAK, /* > autoprefetcher_model. */ > > > > > (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ > > > > > - &generic_prefetch_tune > > > > > + &generic_prefetch_tune, > > > > > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > > > > > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > > > > > }; > > > > > > > > > > static const struct tune_params cortexa73_tunings = > > > > > @@ -1527,7 +1537,9 @@ static const struct tune_params > cortexa73_tunings = > > > > > 0, /* max_case_values. */ > > > > > tune_params::AUTOPREFETCHER_WEAK, /* > autoprefetcher_model. 
*/ > > > > > (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ > > > > > - &generic_prefetch_tune > > > > > + &generic_prefetch_tune, > > > > > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > > > > > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > > > > > }; > > > > > > > > > > > > > > > @@ -1562,7 +1574,9 @@ static const struct tune_params > exynosm1_tunings = > > > > > 48, /* max_case_values. */ > > > > > tune_params::AUTOPREFETCHER_WEAK, /* > autoprefetcher_model. */ > > > > > (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ > > > > > - &exynosm1_prefetch_tune > > > > > + &exynosm1_prefetch_tune, > > > > > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > > > > > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > > > > > }; > > > > > > > > > > static const struct tune_params thunderxt88_tunings = > > > > > @@ -1594,8 +1608,10 @@ static const struct tune_params > thunderxt88_tunings = > > > > > 2, /* min_div_recip_mul_df. */ > > > > > 0, /* max_case_values. */ > > > > > tune_params::AUTOPREFETCHER_OFF, /* > autoprefetcher_model. */ > > > > > - (AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW), /* > tune_flags. */ > > > > > - &thunderxt88_prefetch_tune > > > > > + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ > > > > > + &thunderxt88_prefetch_tune, > > > > > + tune_params::LDP_POLICY_ALIGNED, /* ldp_policy_model. */ > > > > > + tune_params::STP_POLICY_ALIGNED /* stp_policy_model. */ > > > > > }; > > > > > > > > > > static const struct tune_params thunderx_tunings = > > > > > @@ -1627,9 +1643,10 @@ static const struct tune_params > thunderx_tunings = > > > > > 2, /* min_div_recip_mul_df. */ > > > > > 0, /* max_case_values. */ > > > > > tune_params::AUTOPREFETCHER_OFF, /* > autoprefetcher_model. */ > > > > > - (AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW > > > > > - | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* > tune_flags. */ > > > > > - &thunderx_prefetch_tune > > > > > + (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* > tune_flags. 
*/ > > > > > + &thunderx_prefetch_tune, > > > > > + tune_params::LDP_POLICY_ALIGNED, /* ldp_policy_model. */ > > > > > + tune_params::STP_POLICY_ALIGNED /* stp_policy_model. */ > > > > > }; > > > > > > > > > > static const struct tune_params tsv110_tunings = > > > > > @@ -1663,7 +1680,9 @@ static const struct tune_params > tsv110_tunings = > > > > > 0, /* max_case_values. */ > > > > > tune_params::AUTOPREFETCHER_WEAK, /* > autoprefetcher_model. */ > > > > > (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ > > > > > - &tsv110_prefetch_tune > > > > > + &tsv110_prefetch_tune, > > > > > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > > > > > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > > > > > }; > > > > > > > > > > static const struct tune_params xgene1_tunings = > > > > > @@ -1696,7 +1715,9 @@ static const struct tune_params > xgene1_tunings = > > > > > 17, /* max_case_values. */ > > > > > tune_params::AUTOPREFETCHER_OFF, /* > autoprefetcher_model. */ > > > > > (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS), /* tune_flags. */ > > > > > - &xgene1_prefetch_tune > > > > > + &xgene1_prefetch_tune, > > > > > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > > > > > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > > > > > }; > > > > > > > > > > static const struct tune_params emag_tunings = > > > > > @@ -1729,7 +1750,9 @@ static const struct tune_params > emag_tunings = > > > > > 17, /* max_case_values. */ > > > > > tune_params::AUTOPREFETCHER_OFF, /* > autoprefetcher_model. */ > > > > > (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS), /* tune_flags. */ > > > > > - &xgene1_prefetch_tune > > > > > + &xgene1_prefetch_tune, > > > > > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > > > > > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > > > > > }; > > > > > > > > > > static const struct tune_params qdf24xx_tunings = > > > > > @@ -1763,7 +1786,9 @@ static const struct tune_params > qdf24xx_tunings = > > > > > 0, /* max_case_values. 
*/ > > > > > tune_params::AUTOPREFETCHER_WEAK, /* > autoprefetcher_model. */ > > > > > AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS, /* tune_flags. */ > > > > > - &qdf24xx_prefetch_tune > > > > > + &qdf24xx_prefetch_tune, > > > > > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > > > > > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > > > > > }; > > > > > > > > > > /* Tuning structure for the Qualcomm Saphira core. Default to falkor > values > > > > > @@ -1799,7 +1824,9 @@ static const struct tune_params > saphira_tunings = > > > > > 0, /* max_case_values. */ > > > > > tune_params::AUTOPREFETCHER_WEAK, /* > autoprefetcher_model. */ > > > > > (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ > > > > > - &generic_prefetch_tune > > > > > + &generic_prefetch_tune, > > > > > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > > > > > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > > > > > }; > > > > > > > > > > static const struct tune_params thunderx2t99_tunings = > > > > > @@ -1833,7 +1860,9 @@ static const struct tune_params > thunderx2t99_tunings = > > > > > 0, /* max_case_values. */ > > > > > tune_params::AUTOPREFETCHER_WEAK, /* > autoprefetcher_model. */ > > > > > (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ > > > > > - &thunderx2t99_prefetch_tune > > > > > + &thunderx2t99_prefetch_tune, > > > > > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > > > > > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > > > > > }; > > > > > > > > > > static const struct tune_params thunderx3t110_tunings = > > > > > @@ -1867,7 +1896,9 @@ static const struct tune_params > thunderx3t110_tunings = > > > > > 0, /* max_case_values. */ > > > > > tune_params::AUTOPREFETCHER_WEAK, /* > autoprefetcher_model. */ > > > > > (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ > > > > > - &thunderx3t110_prefetch_tune > > > > > + &thunderx3t110_prefetch_tune, > > > > > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. 
*/ > > > > > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > > > > > }; > > > > > > > > > > static const struct tune_params neoversen1_tunings = > > > > > @@ -1900,7 +1931,9 @@ static const struct tune_params > neoversen1_tunings = > > > > > 0, /* max_case_values. */ > > > > > tune_params::AUTOPREFETCHER_WEAK, /* > autoprefetcher_model. */ > > > > > (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* > tune_flags. */ > > > > > - &generic_prefetch_tune > > > > > + &generic_prefetch_tune, > > > > > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > > > > > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > > > > > }; > > > > > > > > > > static const struct tune_params ampere1_tunings = > > > > > @@ -1936,8 +1969,10 @@ static const struct tune_params > ampere1_tunings = > > > > > 2, /* min_div_recip_mul_df. */ > > > > > 0, /* max_case_values. */ > > > > > tune_params::AUTOPREFETCHER_WEAK, /* > autoprefetcher_model. */ > > > > > - (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE), /* tune_flags. */ > > > > > - &ere1_prefetch_tune > > > > > + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ > > > > > + &ere1_prefetch_tune, > > > > > + tune_params::LDP_POLICY_ALIGNED, /* ldp_policy_model. */ > > > > > + tune_params::STP_POLICY_ALIGNED /* stp_policy_model. */ > > > > > }; > > > > > > > > > > static const struct tune_params ampere1a_tunings = > > > > > @@ -1974,8 +2009,10 @@ static const struct tune_params > ampere1a_tunings = > > > > > 2, /* min_div_recip_mul_df. */ > > > > > 0, /* max_case_values. */ > > > > > tune_params::AUTOPREFETCHER_WEAK, /* > autoprefetcher_model. */ > > > > > - (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE), /* tune_flags. */ > > > > > - &ere1_prefetch_tune > > > > > + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ > > > > > + &ere1_prefetch_tune, > > > > > + tune_params::LDP_POLICY_ALIGNED, /* ldp_policy_model. */ > > > > > + tune_params::STP_POLICY_ALIGNED /* stp_policy_model. 
*/ > > > > > }; > > > > > > > > > > static const advsimd_vec_cost neoversev1_advsimd_vector_cost = > > > > > @@ -2156,7 +2193,9 @@ static const struct tune_params > neoversev1_tunings = > > > > > | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS > > > > > | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT > > > > > | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* > tune_flags. */ > > > > > - &generic_prefetch_tune > > > > > + &generic_prefetch_tune, > > > > > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > > > > > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > > > > > }; > > > > > > > > > > static const sve_vec_cost neoverse512tvb_sve_vector_cost = > > > > > @@ -2293,7 +2332,9 @@ static const struct tune_params > neoverse512tvb_tunings = > > > > > (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS > > > > > | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS > > > > > | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* > tune_flags. */ > > > > > - &generic_prefetch_tune > > > > > + &generic_prefetch_tune, > > > > > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > > > > > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > > > > > }; > > > > > > > > > > static const advsimd_vec_cost neoversen2_advsimd_vector_cost = > > > > > @@ -2483,7 +2524,9 @@ static const struct tune_params > neoversen2_tunings = > > > > > | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS > > > > > | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS > > > > > | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* > tune_flags. */ > > > > > - &generic_prefetch_tune > > > > > + &generic_prefetch_tune, > > > > > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > > > > > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. 
*/ > > > > > }; > > > > > > > > > > static const advsimd_vec_cost neoversev2_advsimd_vector_cost = > > > > > @@ -2673,7 +2716,9 @@ static const struct tune_params > neoversev2_tunings = > > > > > | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS > > > > > | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS > > > > > | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* > tune_flags. */ > > > > > - &generic_prefetch_tune > > > > > + &generic_prefetch_tune, > > > > > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > > > > > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > > > > > }; > > > > > > > > > > static const struct tune_params a64fx_tunings = > > > > > @@ -2706,7 +2751,9 @@ static const struct tune_params > a64fx_tunings = > > > > > 0, /* max_case_values. */ > > > > > tune_params::AUTOPREFETCHER_WEAK, /* > autoprefetcher_model. */ > > > > > (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ > > > > > - &a64fx_prefetch_tune > > > > > + &a64fx_prefetch_tune, > > > > > + tune_params::LDP_POLICY_ALWAYS, /* ldp_policy_model. */ > > > > > + tune_params::STP_POLICY_ALWAYS /* stp_policy_model. */ > > > > > }; > > > > > > > > > > /* Support for fine-grained override of the tuning structures. */ > > > > > @@ -17819,6 +17866,34 @@ aarch64_parse_tune (const char > *to_parse, const struct processor **res) > > > > > return AARCH_PARSE_INVALID_ARG; > > > > > } > > > > > > > > > > +/* Parse a command-line -param=aarch64-ldp-policy= > parameter. VALUE is > > > > > + the value of the parameter. 
*/ > > > > > + > > > > > +static void > > > > > +aarch64_parse_ldp_policy (const unsigned int value, struct > tune_params* tune) > > > > > +{ > > > > > + if (value == 1) > > > > > + tune->ldp_policy_model = tune_params::LDP_POLICY_ALWAYS; > > > > > + else if (value == 2) > > > > > + tune->ldp_policy_model = tune_params::LDP_POLICY_NEVER; > > > > > + else if (value == 3) > > > > > + tune->ldp_policy_model = tune_params::LDP_POLICY_ALIGNED; > > > > > +} > > > > > + > > > > > +/* Parse a command-line -param=aarch64-stp-policy= > parameter. VALUE is > > > > > + the value of the parameter. */ > > > > > + > > > > > +static void > > > > > +aarch64_parse_stp_policy (const unsigned int value, struct > tune_params* tune) > > > > > +{ > > > > > + if (value == 1) > > > > > + tune->stp_policy_model = tune_params::STP_POLICY_ALWAYS; > > > > > + else if (value == 2) > > > > > + tune->stp_policy_model = tune_params::STP_POLICY_NEVER; > > > > > + else if (value == 3) > > > > > + tune->stp_policy_model = tune_params::STP_POLICY_ALIGNED; > > > > > +} > > > > > + > > > > > /* Parse TOKEN, which has length LENGTH to see if it is an option > > > > > described in FLAG. If it is, return the index bit for that > > > > >fusion type. > > > > > If not, error (printing OPTION_NAME) and return zero. */ > > > > > @@ -18167,6 +18242,12 @@ aarch64_override_options_internal > (struct gcc_options *opts) > > > > > aarch64_parse_override_string (opts- > >x_aarch64_override_tune_string, > > > > > &aarch64_tune_params); > > > > > > > > > > + aarch64_parse_ldp_policy (aarch64_ldp_policy, > > > > > + &aarch64_tune_params); > > > > > + > > > > > + aarch64_parse_stp_policy (aarch64_stp_policy, > > > > > + &aarch64_tune_params); > > > > > + > > > > > /* This target defaults to strict volatile bitfields. 
*/ > > > > > if (opts->x_flag_strict_volatile_bitfields < 0 && > > > > >abi_version_at_least > (2)) > > > > > opts->x_flag_strict_volatile_bitfields = 1; > > > > > @@ -26468,18 +26549,14 @@ aarch64_operands_ok_for_ldpstp (rtx > *operands, bool load, > > > > > enum reg_class rclass_1, rclass_2; > > > > > rtx mem_1, mem_2, reg_1, reg_2; > > > > > > > > > > - /* Allow the tuning structure to disable LDP instruction formation > > > > > - from combining instructions (e.g., in peephole2). > > > > > - TODO: Implement fine-grained tuning control for LDP and STP: > > > > > - 1. control policies for load and store separately; > > > > > - 2. support the following policies: > > > > > - - default (use what is in the tuning structure) > > > > > - - always > > > > > - - never > > > > > - - aligned (only if the compiler can prove that the > > > > > - load will be aligned to 2 * element_size) */ > > > > > - if (load && (aarch64_tune_params.extra_tuning_flags > > > > > - & AARCH64_EXTRA_TUNE_NO_LDP_COMBINE)) > > > > > + /* If we have LDP_POLICY_NEVER, reject the load pair. */ > > > > > + if (load > > > > > + && aarch64_tune_params.ldp_policy_model == > tune_params::LDP_POLICY_NEVER) > > > > > + return false; > > > > > + > > > > > + /* If we have STP_POLICY_NEVER, reject the store pair. */ > > > > > + if (!load > > > > > + && aarch64_tune_params.stp_policy_model == > tune_params::STP_POLICY_NEVER) > > > > > return false; > > > > > > > > > > if (load) > > > > > @@ -26506,13 +26583,22 @@ aarch64_operands_ok_for_ldpstp (rtx > *operands, bool load, > > > > > if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2)) > > > > > return false; > > > > > > > > > > - /* If we have SImode and slow unaligned ldp, > > > > > - check the alignment to be at least 8 byte. 
*/ > > > > > - if (mode == SImode > > > > > - && (aarch64_tune_params.extra_tuning_flags > > > > > - & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW) > > > > > - && !optimize_size > > > > > - && MEM_ALIGN (mem_1) < 8 * BITS_PER_UNIT) > > > > > + /* If we have LDP_POLICY_ALIGNED, > > > > > + do not emit the load pair unless the alignment is checked to be > > > > > + at least double the alignment of the type. */ > > > > > + if (load > > > > > + && aarch64_tune_params.ldp_policy_model == > tune_params::LDP_POLICY_ALIGNED > > > > > + && !optimize_function_for_size_p (cfun) > > > > > + && MEM_ALIGN (mem_1) < 2 * GET_MODE_ALIGNMENT (mode)) > > > > > + return false; > > > > > + > > > > > + /* If we have STP_POLICY_ALIGNED, > > > > > + do not emit the store pair unless the alignment is checked to be > > > > > + at least double the alignment of the type. */ > > > > > + if (!load > > > > > + && aarch64_tune_params.stp_policy_model == > tune_params::STP_POLICY_ALIGNED > > > > > + && !optimize_function_for_size_p (cfun) > > > > > + && MEM_ALIGN (mem_1) < 2 * GET_MODE_ALIGNMENT (mode)) > > > > > return false; > > > > > > > > > > /* Check if the addresses are in the form of [base+offset]. */ > > > > > @@ -26640,6 +26726,16 @@ > aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load, > > > > > HOST_WIDE_INT offvals[num_insns], msize; > > > > > rtx mem[num_insns], reg[num_insns], base[num_insns], > offset[num_insns]; > > > > > > > > > > + /* If we have LDP_POLICY_NEVER, reject the load pair. */ > > > > > + if (load > > > > > + && aarch64_tune_params.ldp_policy_model == > tune_params::LDP_POLICY_NEVER) > > > > > + return false; > > > > > + > > > > > + /* If we have STP_POLICY_NEVER, reject the store pair. 
*/ > > > > > + if (!load > > > > > + && aarch64_tune_params.stp_policy_model == > tune_params::STP_POLICY_NEVER) > > > > > + return false; > > > > > + > > > > > if (load) > > > > > { > > > > > for (int i = 0; i < num_insns; i++) > > > > > @@ -26729,13 +26825,22 @@ > aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load, > > > > > if (offvals[0] % msize != offvals[2] % msize) > > > > > return false; > > > > > > > > > > - /* If we have SImode and slow unaligned ldp, > > > > > - check the alignment to be at least 8 byte. */ > > > > > - if (mode == SImode > > > > > - && (aarch64_tune_params.extra_tuning_flags > > > > > - & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW) > > > > > - && !optimize_size > > > > > - && MEM_ALIGN (mem[0]) < 8 * BITS_PER_UNIT) > > > > > + /* If we have LDP_POLICY_ALIGNED, > > > > > + do not emit the load pair unless the alignment is checked to be > > > > > + at least double the alignment of the type. */ > > > > > + if (load > > > > > + && aarch64_tune_params.ldp_policy_model == > tune_params::LDP_POLICY_ALIGNED > > > > > + && !optimize_function_for_size_p (cfun) > > > > > + && MEM_ALIGN (mem[0]) < 2 * GET_MODE_ALIGNMENT > (mode)) > > > > > + return false; > > > > > + > > > > > + /* If we have STP_POLICY_ALIGNED, > > > > > + do not emit the store pair unless the alignment is checked to be > > > > > + at least double the alignment of the type. 
*/ > > > > > + if (!load > > > > > + && aarch64_tune_params.stp_policy_model == > tune_params::STP_POLICY_ALIGNED > > > > > + && !optimize_function_for_size_p (cfun) > > > > > + && MEM_ALIGN (mem[0]) < 2 * GET_MODE_ALIGNMENT > (mode)) > > > > > return false; > > > > > > > > > > return true; > > > > > diff --git a/gcc/config/aarch64/aarch64.opt > b/gcc/config/aarch64/aarch64.opt > > > > > index 4a0580435a8..f61e3f968d4 100644 > > > > > --- a/gcc/config/aarch64/aarch64.opt > > > > > +++ b/gcc/config/aarch64/aarch64.opt > > > > > @@ -337,3 +337,11 @@ Constant memset size in bytes from which to > start using MOPS sequence. > > > > > -param=aarch64-vect-unroll-limit= > > > > > Target Joined UInteger Var(aarch64_vect_unroll_limit) Init(4) Param > > > > > Limit how much the autovectorizer may unroll a loop. > > > > > + > > > > > +-param=aarch64-ldp-policy= > > > > > +Target Joined UInteger Var(aarch64_ldp_policy) Init(0) > IntegerRange(0, 3) Param > > > > > +Fine-grained policy for load pairs. > > > > > + > > > > > +-param=aarch64-stp-policy= > > > > > +Target Joined UInteger Var(aarch64_stp_policy) Init(0) > IntegerRange(0, 3) Param > > > > > +Fine-grained policy for store pairs. > > > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > > > > > index 146b40414b0..000dd5541f4 100644 > > > > > --- a/gcc/doc/invoke.texi > > > > > +++ b/gcc/doc/invoke.texi > > > > > @@ -16508,6 +16508,36 @@ Use both Advanced SIMD and > SVE. Prefer SVE when the costs are deemed equal. > > > > > @end table > > > > > The default value is 0. > > > > > > > > > > +@item aarch64-ldp-policy > > > > > +Fine-grained policy for load pairs. Accepts values from 0 to 3, > inclusive. > > > > > +@table @samp > > > > > +@item 0 > > > > > +Use the policy of the tuning structure. > > > > > +@item 1 > > > > > +Emit ldp regardless of alignment. > > > > > +@item 2 > > > > > +Do not emit ldp. 
> > > > > +@item 3
> > > > > +Emit ldp only if the source pointer is aligned to at least double the alignment
> > > > > +of the type.
> > > > > +@end table
> > > > > +The default value is 0.
> > > > > +
> > > > > +@item aarch64-stp-policy
> > > > > +Fine-grained policy for store pairs. Accepts values from 0 to 3, inclusive.
> > > > > +@table @samp
> > > > > +@item 0
> > > > > +Use the policy of the tuning structure.
> > > > > +@item 1
> > > > > +Emit stp regardless of alignment.
> > > > > +@item 2
> > > > > +Do not emit stp.
> > > > > +@item 3
> > > > > +Emit stp only if the source pointer is aligned to at least double the alignment
> > > > > +of the type.
> > > > > +@end table
> > > > > +The default value is 0.
> > > > > +
> > > > >  @item aarch64-loop-vect-issue-rate-niters
> > > > >  The tuning for some AArch64 CPUs tries to take both latencies and issue
> > > > >  rates into account when deciding whether a loop should be vectorized
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/ampere1-no_ldp_combine.c b/gcc/testsuite/gcc.target/aarch64/ampere1-no_ldp_combine.c
> > > > > deleted file mode 100644
> > > > > index bc871f4481d..00000000000
> > > > > --- a/gcc/testsuite/gcc.target/aarch64/ampere1-no_ldp_combine.c
> > > > > +++ /dev/null
> > > > > @@ -1,11 +0,0 @@
> > > > > -/* { dg-options "-O3 -mtune=ampere1" } */
> > > > > -
> > > > > -long
> > > > > -foo (long a[])
> > > > > -{
> > > > > -  return a[0] + a[1];
> > > > > -}
> > > > > -
> > > > > -/* We should see two ldrs instead of one ldp. */
> > > > > -/* { dg-final { scan-assembler {\tldr\t} } } */
> > > > > -/* { dg-final { scan-assembler-not {\tldp\t} } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_aligned.c b/gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
> > > > > new file mode 100644
> > > > > index 00000000000..8e43faab70d
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
> > > > > @@ -0,0 +1,66 @@
> > > > > +/* { dg-options "-O2 --param=aarch64-ldp-policy=3 -mcpu=generic" } */
> > > > > +
> > > > > +#include <stdlib.h>
> > > > > +#include <stdint.h>
> > > > > +
> > > > > +typedef int v4si __attribute__ ((vector_size (16)));
> > > > > +
> > > > > +#define LDP_TEST_ALIGNED(TYPE) \
> > > > > +TYPE ldp_aligned_##TYPE(char* ptr){ \
> > > > > +    TYPE a_0, a_1; \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > > +    a_0 = arr[0]; \
> > > > > +    a_1 = arr[1]; \
> > > > > +    return a_0 + a_1; \
> > > > > +}
> > > > > +
> > > > > +#define LDP_TEST_UNALIGNED(TYPE) \
> > > > > +TYPE ldp_unaligned_##TYPE(char* ptr){ \
> > > > > +    TYPE a_0, a_1; \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > > +    TYPE *a = arr+1; \
> > > > > +    a_0 = a[0]; \
> > > > > +    a_1 = a[1]; \
> > > > > +    return a_0 + a_1; \
> > > > > +}
> > > > > +
> > > > > +#define LDP_TEST_ADJUST_ALIGNED(TYPE) \
> > > > > +TYPE ldp_aligned_adjust_##TYPE(char* ptr){ \
> > > > > +    TYPE a_0, a_1, a_2, a_3, a_4; \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > > +    a_0 = arr[100]; \
> > > > > +    a_1 = arr[101]; \
> > > > > +    a_2 = arr[102]; \
> > > > > +    a_3 = arr[103]; \
> > > > > +    a_4 = arr[110]; \
> > > > > +    return a_0 + a_1 + a_2 + a_3 + a_4; \
> > > > > +}
> > > > > +
> > > > > +#define LDP_TEST_ADJUST_UNALIGNED(TYPE) \
> > > > > +TYPE ldp_unaligned_adjust_##TYPE(char* ptr){ \
> > > > > +    TYPE a_0, a_1, a_2, a_3, a_4; \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > > +    TYPE *a = arr+1; \
> > > > > +    a_0 = a[100]; \
> > > > > +    a_1 = a[101]; \
> > > > > +    a_2 = a[102]; \
> > > > > +    a_3 = a[103]; \
> > > > > +    a_4 = a[110]; \
> > > > > +    return a_0 + a_1 + a_2 + a_3 + a_4; \
> > > > > +}
> > > > > +
> > > > > +LDP_TEST_ALIGNED(int32_t);
> > > > > +LDP_TEST_ALIGNED(int64_t);
> > > > > +LDP_TEST_ALIGNED(v4si);
> > > > > +LDP_TEST_UNALIGNED(int32_t);
> > > > > +LDP_TEST_UNALIGNED(int64_t);
> > > > > +LDP_TEST_UNALIGNED(v4si);
> > > > > +LDP_TEST_ADJUST_ALIGNED(int32_t);
> > > > > +LDP_TEST_ADJUST_ALIGNED(int64_t);
> > > > > +LDP_TEST_ADJUST_UNALIGNED(int32_t);
> > > > > +LDP_TEST_ADJUST_UNALIGNED(int64_t);
> > > > > +
> > > > > +/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 3 } } */
> > > > > +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 3 } } */
> > > > > +/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 1 } } */
> > > > > +
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_always.c b/gcc/testsuite/gcc.target/aarch64/ldp_always.c
> > > > > new file mode 100644
> > > > > index 00000000000..532ca607565
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/aarch64/ldp_always.c
> > > > > @@ -0,0 +1,66 @@
> > > > > +/* { dg-options "-O2 --param=aarch64-ldp-policy=1 -mcpu=generic" } */
> > > > > +
> > > > > +#include <stdlib.h>
> > > > > +#include <stdint.h>
> > > > > +
> > > > > +typedef int v4si __attribute__ ((vector_size (16)));
> > > > > +
> > > > > +#define LDP_TEST_ALIGNED(TYPE) \
> > > > > +TYPE ldp_aligned_##TYPE(char* ptr){ \
> > > > > +    TYPE a_0, a_1; \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > > +    a_0 = arr[0]; \
> > > > > +    a_1 = arr[1]; \
> > > > > +    return a_0 + a_1; \
> > > > > +}
> > > > > +
> > > > > +#define LDP_TEST_UNALIGNED(TYPE) \
> > > > > +TYPE ldp_unaligned_##TYPE(char* ptr){ \
> > > > > +    TYPE a_0, a_1; \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > > +    TYPE *a = arr+1; \
> > > > > +    a_0 = a[0]; \
> > > > > +    a_1 = a[1]; \
> > > > > +    return a_0 + a_1; \
> > > > > +}
> > > > > +
> > > > > +#define LDP_TEST_ADJUST_ALIGNED(TYPE) \
> > > > > +TYPE ldp_aligned_adjust_##TYPE(char* ptr){ \
> > > > > +    TYPE a_0, a_1, a_2, a_3, a_4; \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > > +    a_0 = arr[100]; \
> > > > > +    a_1 = arr[101]; \
> > > > > +    a_2 = arr[102]; \
> > > > > +    a_3 = arr[103]; \
> > > > > +    a_4 = arr[110]; \
> > > > > +    return a_0 + a_1 + a_2 + a_3 + a_4; \
> > > > > +}
> > > > > +
> > > > > +#define LDP_TEST_ADJUST_UNALIGNED(TYPE) \
> > > > > +TYPE ldp_unaligned_adjust_##TYPE(char* ptr){ \
> > > > > +    TYPE a_0, a_1, a_2, a_3, a_4; \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > > +    TYPE *a = arr+1; \
> > > > > +    a_0 = a[100]; \
> > > > > +    a_1 = a[101]; \
> > > > > +    a_2 = a[102]; \
> > > > > +    a_3 = a[103]; \
> > > > > +    a_4 = a[110]; \
> > > > > +    return a_0 + a_1 + a_2 + a_3 + a_4; \
> > > > > +}
> > > > > +
> > > > > +LDP_TEST_ALIGNED(int32_t);
> > > > > +LDP_TEST_ALIGNED(int64_t);
> > > > > +LDP_TEST_ALIGNED(v4si);
> > > > > +LDP_TEST_UNALIGNED(int32_t);
> > > > > +LDP_TEST_UNALIGNED(int64_t);
> > > > > +LDP_TEST_UNALIGNED(v4si);
> > > > > +LDP_TEST_ADJUST_ALIGNED(int32_t);
> > > > > +LDP_TEST_ADJUST_ALIGNED(int64_t);
> > > > > +LDP_TEST_ADJUST_UNALIGNED(int32_t);
> > > > > +LDP_TEST_ADJUST_UNALIGNED(int64_t);
> > > > > +
> > > > > +/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 6 } } */
> > > > > +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 6 } } */
> > > > > +/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 2 } } */
> > > > > +
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_never.c b/gcc/testsuite/gcc.target/aarch64/ldp_never.c
> > > > > new file mode 100644
> > > > > index 00000000000..b39941c18d7
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/aarch64/ldp_never.c
> > > > > @@ -0,0 +1,66 @@
> > > > > +/* { dg-options "-O2 --param=aarch64-ldp-policy=2 -mcpu=generic" } */
> > > > > +
> > > > > +#include <stdlib.h>
> > > > > +#include <stdint.h>
> > > > > +
> > > > > +typedef int v4si __attribute__ ((vector_size (16)));
> > > > > +
> > > > > +#define LDP_TEST_ALIGNED(TYPE) \
> > > > > +TYPE ldp_aligned_##TYPE(char* ptr){ \
> > > > > +    TYPE a_0, a_1; \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > > +    a_0 = arr[0]; \
> > > > > +    a_1 = arr[1]; \
> > > > > +    return a_0 + a_1; \
> > > > > +}
> > > > > +
> > > > > +#define LDP_TEST_UNALIGNED(TYPE) \
> > > > > +TYPE ldp_unaligned_##TYPE(char* ptr){ \
> > > > > +    TYPE a_0, a_1; \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > > +    TYPE *a = arr+1; \
> > > > > +    a_0 = a[0]; \
> > > > > +    a_1 = a[1]; \
> > > > > +    return a_0 + a_1; \
> > > > > +}
> > > > > +
> > > > > +#define LDP_TEST_ADJUST_ALIGNED(TYPE) \
> > > > > +TYPE ldp_aligned_adjust_##TYPE(char* ptr){ \
> > > > > +    TYPE a_0, a_1, a_2, a_3, a_4; \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > > +    a_0 = arr[100]; \
> > > > > +    a_1 = arr[101]; \
> > > > > +    a_2 = arr[102]; \
> > > > > +    a_3 = arr[103]; \
> > > > > +    a_4 = arr[110]; \
> > > > > +    return a_0 + a_1 + a_2 + a_3 + a_4; \
> > > > > +}
> > > > > +
> > > > > +#define LDP_TEST_ADJUST_UNALIGNED(TYPE) \
> > > > > +TYPE ldp_unaligned_adjust_##TYPE(char* ptr){ \
> > > > > +    TYPE a_0, a_1, a_2, a_3, a_4; \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > > +    TYPE *a = arr+1; \
> > > > > +    a_0 = a[100]; \
> > > > > +    a_1 = a[101]; \
> > > > > +    a_2 = a[102]; \
> > > > > +    a_3 = a[103]; \
> > > > > +    a_4 = a[110]; \
> > > > > +    return a_0 + a_1 + a_2 + a_3 + a_4; \
> > > > > +}
> > > > > +
> > > > > +LDP_TEST_ALIGNED(int32_t);
> > > > > +LDP_TEST_ALIGNED(int64_t);
> > > > > +LDP_TEST_ALIGNED(v4si);
> > > > > +LDP_TEST_UNALIGNED(int32_t);
> > > > > +LDP_TEST_UNALIGNED(int64_t);
> > > > > +LDP_TEST_UNALIGNED(v4si);
> > > > > +LDP_TEST_ADJUST_ALIGNED(int32_t);
> > > > > +LDP_TEST_ADJUST_ALIGNED(int64_t);
> > > > > +LDP_TEST_ADJUST_UNALIGNED(int32_t);
> > > > > +LDP_TEST_ADJUST_UNALIGNED(int64_t);
> > > > > +
> > > > > +/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 0 } } */
> > > > > +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 0 } } */
> > > > > +/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 0 } } */
> > > > > +
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/stp_aligned.c b/gcc/testsuite/gcc.target/aarch64/stp_aligned.c
> > > > > new file mode 100644
> > > > > index 00000000000..01f294bb090
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/aarch64/stp_aligned.c
> > > > > @@ -0,0 +1,60 @@
> > > > > +/* { dg-options "-O2 --param=aarch64-stp-policy=3 -mcpu=generic" } */
> > > > > +
> > > > > +#include <stdlib.h>
> > > > > +#include <stdint.h>
> > > > > +
> > > > > +typedef int v4si __attribute__ ((vector_size (16)));
> > > > > +
> > > > > +#define STP_TEST_ALIGNED(TYPE) \
> > > > > +TYPE *stp_aligned_##TYPE(char* ptr, TYPE x){ \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > > +    arr[0] = x; \
> > > > > +    arr[1] = x; \
> > > > > +    return arr; \
> > > > > +}
> > > > > +
> > > > > +#define STP_TEST_UNALIGNED(TYPE) \
> > > > > +TYPE *stp_unaligned_##TYPE(char* ptr, TYPE x){ \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > > +    TYPE *a = arr+1; \
> > > > > +    a[0] = x; \
> > > > > +    a[1] = x; \
> > > > > +    return a; \
> > > > > +}
> > > > > +
> > > > > +#define STP_TEST_ADJUST_ALIGNED(TYPE) \
> > > > > +TYPE *stp_aligned_adjust_##TYPE(char* ptr, TYPE x){ \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > > +    arr[100] = x; \
> > > > > +    arr[101] = x; \
> > > > > +    arr[102] = x; \
> > > > > +    arr[103] = x; \
> > > > > +    return arr; \
> > > > > +}
> > > > > +
> > > > > +#define STP_TEST_ADJUST_UNALIGNED(TYPE) \
> > > > > +TYPE *stp_unaligned_adjust_##TYPE(char* ptr, TYPE x){ \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > > +    TYPE *a = arr+1; \
> > > > > +    a[100] = x; \
> > > > > +    a[101] = x; \
> > > > > +    a[102] = x; \
> > > > > +    a[103] = x; \
> > > > > +    return a; \
> > > > > +}
> > > > > +
> > > > > +STP_TEST_ALIGNED(int32_t);
> > > > > +STP_TEST_ALIGNED(int64_t);
> > > > > +STP_TEST_ALIGNED(v4si);
> > > > > +STP_TEST_UNALIGNED(int32_t);
> > > > > +STP_TEST_UNALIGNED(int64_t);
> > > > > +STP_TEST_UNALIGNED(v4si);
> > > > > +STP_TEST_ADJUST_ALIGNED(int32_t);
> > > > > +STP_TEST_ADJUST_ALIGNED(int64_t);
> > > > > +STP_TEST_ADJUST_UNALIGNED(int32_t);
> > > > > +STP_TEST_ADJUST_UNALIGNED(int64_t);
> > > > > +
> > > > > +/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]" 3 } } */
> > > > > +/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]" 3 } } */
> > > > > +/* { dg-final { scan-assembler-times "stp\tq\[0-9\]+, q\[0-9\]" 1 } } */
> > > > > +
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/stp_always.c b/gcc/testsuite/gcc.target/aarch64/stp_always.c
> > > > > new file mode 100644
> > > > > index 00000000000..cedb461b5b2
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/aarch64/stp_always.c
> > > > > @@ -0,0 +1,60 @@
> > > > > +/* { dg-options "-O2 --param=aarch64-stp-policy=1 -mcpu=generic" } */
> > > > > +
> > > > > +#include <stdlib.h>
> > > > > +#include <stdint.h>
> > > > > +
> > > > > +typedef int v4si __attribute__ ((vector_size (16)));
> > > > > +
> > > > > +#define STP_TEST_ALIGNED(TYPE) \
> > > > > +TYPE *stp_aligned_##TYPE(char* ptr, TYPE x){ \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > > +    arr[0] = x; \
> > > > > +    arr[1] = x; \
> > > > > +    return arr; \
> > > > > +}
> > > > > +
> > > > > +#define STP_TEST_UNALIGNED(TYPE) \
> > > > > +TYPE *stp_unaligned_##TYPE(char* ptr, TYPE x){ \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > > +    TYPE *a = arr+1; \
> > > > > +    a[0] = x; \
> > > > > +    a[1] = x; \
> > > > > +    return a; \
> > > > > +}
> > > > > +
> > > > > +#define STP_TEST_ADJUST_ALIGNED(TYPE) \
> > > > > +TYPE *stp_aligned_adjust_##TYPE(char* ptr, TYPE x){ \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > > +    arr[100] = x; \
> > > > > +    arr[101] = x; \
> > > > > +    arr[102] = x; \
> > > > > +    arr[103] = x; \
> > > > > +    return arr; \
> > > > > +}
> > > > > +
> > > > > +#define STP_TEST_ADJUST_UNALIGNED(TYPE) \
> > > > > +TYPE *stp_unaligned_adjust_##TYPE(char* ptr, TYPE x){ \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > > +    TYPE *a = arr+1; \
> > > > > +    a[100] = x; \
> > > > > +    a[101] = x; \
> > > > > +    a[102] = x; \
> > > > > +    a[103] = x; \
> > > > > +    return a; \
> > > > > +}
> > > > > +
> > > > > +STP_TEST_ALIGNED(int32_t);
> > > > > +STP_TEST_ALIGNED(int64_t);
> > > > > +STP_TEST_ALIGNED(v4si);
> > > > > +STP_TEST_UNALIGNED(int32_t);
> > > > > +STP_TEST_UNALIGNED(int64_t);
> > > > > +STP_TEST_UNALIGNED(v4si);
> > > > > +STP_TEST_ADJUST_ALIGNED(int32_t);
> > > > > +STP_TEST_ADJUST_ALIGNED(int64_t);
> > > > > +STP_TEST_ADJUST_UNALIGNED(int32_t);
> > > > > +STP_TEST_ADJUST_UNALIGNED(int64_t);
> > > > > +
> > > > > +/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]" 6 } } */
> > > > > +/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]" 6 } } */
> > > > > +/* { dg-final { scan-assembler-times "stp\tq\[0-9\]+, q\[0-9\]" 2 } } */
> > > > > +
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/stp_never.c b/gcc/testsuite/gcc.target/aarch64/stp_never.c
> > > > > new file mode 100644
> > > > > index 00000000000..ddde658f807
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/aarch64/stp_never.c
> > > > > @@ -0,0 +1,60 @@
> > > > > +/* { dg-options "-O2 --param=aarch64-stp-policy=2 -mcpu=generic" } */
> > > > > +
> > > > > +#include <stdlib.h>
> > > > > +#include <stdint.h>
> > > > > +
> > > > > +typedef int v4si __attribute__ ((vector_size (16)));
> > > > > +
> > > > > +#define STP_TEST_ALIGNED(TYPE) \
> > > > > +TYPE *stp_aligned_##TYPE(char* ptr, TYPE x){ \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > > +    arr[0] = x; \
> > > > > +    arr[1] = x; \
> > > > > +    return arr; \
> > > > > +}
> > > > > +
> > > > > +#define STP_TEST_UNALIGNED(TYPE) \
> > > > > +TYPE *stp_unaligned_##TYPE(char* ptr, TYPE x){ \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > > +    TYPE *a = arr+1; \
> > > > > +    a[0] = x; \
> > > > > +    a[1] = x; \
> > > > > +    return a; \
> > > > > +}
> > > > > +
> > > > > +#define STP_TEST_ADJUST_ALIGNED(TYPE) \
> > > > > +TYPE *stp_aligned_adjust_##TYPE(char* ptr, TYPE x){ \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > > +    arr[100] = x; \
> > > > > +    arr[101] = x; \
> > > > > +    arr[102] = x; \
> > > > > +    arr[103] = x; \
> > > > > +    return arr; \
> > > > > +}
> > > > > +
> > > > > +#define STP_TEST_ADJUST_UNALIGNED(TYPE) \
> > > > > +TYPE *stp_unaligned_adjust_##TYPE(char* ptr, TYPE x){ \
> > > > > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
> > > > > +    TYPE *a = arr+1; \
> > > > > +    a[100] = x; \
> > > > > +    a[101] = x; \
> > > > > +    a[102] = x; \
> > > > > +    a[103] = x; \
> > > > > +    return a; \
> > > > > +}
> > > > > +
> > > > > +STP_TEST_ALIGNED(int32_t);
> > > > > +STP_TEST_ALIGNED(int64_t);
> > > > > +STP_TEST_ALIGNED(v4si);
> > > > > +STP_TEST_UNALIGNED(int32_t);
> > > > > +STP_TEST_UNALIGNED(int64_t);
> > > > > +STP_TEST_UNALIGNED(v4si);
> > > > > +STP_TEST_ADJUST_ALIGNED(int32_t);
> > > > > +STP_TEST_ADJUST_ALIGNED(int64_t);
> > > > > +STP_TEST_ADJUST_UNALIGNED(int32_t);
> > > > > +STP_TEST_ADJUST_UNALIGNED(int64_t);
> > > > > +
> > > > > +/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]" 0 } } */
> > > > > +/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]" 0 } } */
> > > > > +/* { dg-final { scan-assembler-times "stp\tq\[0-9\]+, q\[0-9\]" 0 } } */
> > > > > +
> > > > > --
> > > > > 2.40.1
> > > > >
>
> --
> Manos Anagnostakis | Compiler Engineer |
> E: mailto:makeljana.shku...@vrull.eu
>
> VRULL GmbH | Beatrixgasse 32 1030 Vienna |
> W: http://www.vrull.eu/ | https://www.linkedin.com/company/vrull/
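The policy values documented for aarch64-ldp-policy and aarch64-stp-policy, together with the alignment test in aarch64_operands_adjust_ok_for_ldpstp, can be modelled outside GCC as a small decision function. This is a minimal sketch under stated assumptions, not the patch's implementation: the enum pair_policy and the functions pair_allowed and align_down are hypothetical names, the "tuning" argument merely stands in for the tuning structure, and alignments are expressed in bits, as MEM_ALIGN and GET_MODE_ALIGNMENT are in GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the 0..3 values accepted by
   --param=aarch64-ldp-policy and --param=aarch64-stp-policy.
   These names are illustrative only; they are not GCC identifiers.  */
enum pair_policy
{
  PAIR_POLICY_DEFAULT = 0, /* Use the policy of the tuning structure.  */
  PAIR_POLICY_ALWAYS = 1,  /* Emit the pair regardless of alignment.  */
  PAIR_POLICY_NEVER = 2,   /* Never emit the pair.  */
  PAIR_POLICY_ALIGNED = 3  /* Emit only if aligned to 2x the type.  */
};

/* Decide whether a load or store pair may be formed.
   PARAM is the --param value; TUNING stands in for the tuning
   structure's policy and is consulted only when PARAM is
   PAIR_POLICY_DEFAULT.  MEM_ALIGN_BITS models MEM_ALIGN (mem[0]),
   MODE_ALIGN_BITS models GET_MODE_ALIGNMENT (mode), and
   OPTIMIZE_FOR_SIZE models optimize_function_for_size_p (cfun).  */
bool
pair_allowed (enum pair_policy param, enum pair_policy tuning,
              unsigned mem_align_bits, unsigned mode_align_bits,
              bool optimize_for_size)
{
  assert (mode_align_bits > 0);
  enum pair_policy policy = (param == PAIR_POLICY_DEFAULT) ? tuning : param;

  switch (policy)
    {
    case PAIR_POLICY_NEVER:
      /* Like the quoted STP_POLICY_NEVER check, this does not consult
         the optimize-for-size flag.  */
      return false;
    case PAIR_POLICY_ALIGNED:
      /* Like the quoted *_POLICY_ALIGNED checks, the alignment
         requirement is skipped when optimizing for size.  */
      return optimize_for_size || mem_align_bits >= 2 * mode_align_bits;
    default:
      /* ALWAYS, or a DEFAULT tuning value: no restriction.  */
      return true;
    }
}

/* Round P down to a multiple of N (N must be a power of two), which is
   what the tests do with (uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1).  */
uintptr_t
align_down (uintptr_t p, uintptr_t n)
{
  assert (n != 0 && (n & (n - 1)) == 0);
  return p & ~(n - 1);
}
```

For example, under the aligned policy a pair of int64_t accesses (64-bit mode alignment) whose address is only known to be 8-byte (64-bit) aligned is rejected, while a 16-byte (128-bit) aligned pair is accepted. In the tests, arr is rounded down to a multiple of 16 * _Alignof(TYPE) bytes, whereas arr + 1 is only known to be aligned to _Alignof(TYPE); this is consistent with the *_aligned.c tests expecting half as many pairs as the *_always.c tests.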