Kyrylo Tkachov <ktkac...@nvidia.com> writes:
> Hi all,
> I'd like to use a value of 64 bytes for the L1 cache line size for Armv9-A
> generic tuning.
> As described in g:9a99559a478111f7fbeec29bd78344df7651c707 this value is used
> to set the std::hardware_destructive_interference_size value, which we do not
> want to be overly large when running concurrent applications on large
> core-count systems.
>
> The generic value for Armv8-A systems and the port baseline is 256 bytes
> because that's what the A64FX CPU has, as set de-facto in
> aarch64_override_options_internal.
>
> But for Armv9-A CPUs, as far as I know, there isn't anything larger
> than 64 bytes, so we should be able to use the smaller value here and reduce
> the size of concurrent structs that use
> std::hardware_destructive_interference_size to pad their fields.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> WDYT?
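To show what "reduce the size of concurrent structs" means in practice, here is a minimal C++17 sketch. It is not taken from the patch. It pads two counters apart with std::hardware_destructive_interference_size. The struct names and the 64-byte fallback, which is used when the standard library does not provide the constant, are assumptions made for illustration. Under the 256-byte Armv8-A generic value the pair occupies 512 bytes, and under the proposed 64-byte Armv9-A value it occupies 128.

```cpp
#include <atomic>
#include <cstddef>
#include <new>

// Use the library's value when the feature-test macro says it exists;
// otherwise fall back to an assumed 64 bytes (illustrative only).
#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t kDestructiveSize =
    std::hardware_destructive_interference_size;
#else
inline constexpr std::size_t kDestructiveSize = 64;  // assumption
#endif

// Two counters written by different threads.  Aligning each member to the
// destructive interference size keeps them on separate cache lines, which
// avoids false sharing at the cost of padding.
struct PaddedCounters {
  alignas(kDestructiveSize) std::atomic<long> produced{0};
  alignas(kDestructiveSize) std::atomic<long> consumed{0};
};

// The same layout parameterised on the padding size, so the footprint under
// the two generic tunings from the thread can be compared directly.
template <std::size_t Line>
struct PaddedCountersFor {
  alignas(Line) std::atomic<long> produced{0};
  alignas(Line) std::atomic<long> consumed{0};
};

// Each member occupies its own Line-byte slot, so the struct is 2 * Line bytes.
static_assert(sizeof(PaddedCountersFor<64>) == 128);   // Armv9-A generic (patch)
static_assert(sizeof(PaddedCountersFor<256>) == 512);  // Armv8-A generic baseline
```

The value is baked into class layout, so it effectively becomes part of the ABI of any type that uses it. For that reason GCC also allows it to be pinned explicitly with `--param destructive-interference-size=N`, independently of `-mcpu`/`-mtune`.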
I suppose doing this for a form of generic tuning goes somewhat against:

  /* Set up parameters to be used in prefetching algorithm. Do not
     override the defaults unless we are tuning for a core we have
     researched values for. */

But I agree it doesn't make conceptual sense to constrain a known-to-be
Armv9-A core based on values that are only needed for Armv8-A cores.
So no objection from me FWIW.

I think we would need to do something else if there are ever Armv9-A
cores with different L1 cache line sizes though. E.g. if a new Armv9-A
core has a 128-byte cache line, we would probably want to set the range
to [64, 128] rather than the patch's [64, 64], and rather than the
current [64, 256].

Thanks,
Richard

> Thanks,
> Kyrill
>
>
> 	* config/aarch64/tuning_models/generic_armv9_a.h
> 	(generic_armv9a_prefetch_tune): Define.
> 	(generic_armv9_a_tunings): Use the above.
>
> From 93aa4ec4d972dfff02ccd6751af160ed243aa750 Mon Sep 17 00:00:00 2001
> From: Kyrylo Tkachov <ktkac...@nvidia.com>
> Date: Fri, 20 Sep 2024 05:11:39 -0700
> Subject: [PATCH] aarch64: Set Armv9-A generic L1 cache line size to 64 bytes
>
> I'd like to use a value of 64 bytes for the L1 cache line size for Armv9-A
> generic tuning.
> As described in g:9a99559a478111f7fbeec29bd78344df7651c707 this value is used
> to set the std::hardware_destructive_interference_size value, which we do not
> want to be overly large when running concurrent applications on large
> core-count systems.
>
> The generic value for Armv8-A systems and the port baseline is 256 bytes
> because that's what the A64FX CPU has, as set de-facto in
> aarch64_override_options_internal.
>
> But for Armv9-A CPUs, as far as I know, there isn't anything larger
> than 64 bytes, so we should be able to use the smaller value here and reduce
> the size of concurrent structs that use
> std::hardware_destructive_interference_size to pad their fields.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> 	* config/aarch64/tuning_models/generic_armv9_a.h
> 	(generic_armv9a_prefetch_tune): Define.
> 	(generic_armv9_a_tunings): Use the above.
> ---
>  gcc/config/aarch64/tuning_models/generic_armv9_a.h | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
> index 999985ed40f..76b3e4c9cf7 100644
> --- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h
> +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
> @@ -207,6 +207,18 @@ static const struct cpu_vector_cost generic_armv9_a_vector_cost =
>    &generic_armv9_a_vec_issue_info /* issue_info */
>  };
>
> +/* Generic prefetch settings (which disable prefetch). */
> +static const cpu_prefetch_tune generic_armv9a_prefetch_tune =
> +{
> +  0,      /* num_slots */
> +  -1,     /* l1_cache_size */
> +  64,     /* l1_cache_line_size */
> +  -1,     /* l2_cache_size */
> +  true,   /* prefetch_dynamic_strides */
> +  -1,     /* minimum_stride */
> +  -1      /* default_opt_level */
> +};
> +
>  static const struct tune_params generic_armv9_a_tunings =
>  {
>    &cortexa76_extra_costs,
> @@ -239,7 +251,7 @@ static const struct tune_params generic_armv9_a_tunings =
>    (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
>     | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
>     | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */
> -  &generic_prefetch_tune,
> +  &generic_armv9a_prefetch_tune,
>    AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */
>    AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */
>  };
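Richard's point about future cores reduces to a simple policy. A generic tuning should report the smallest L1 line size among the cores it covers as the constructive interference size, and the largest as the destructive size. The following is a minimal C++ sketch of that policy. It assumes that the "range" in the thread means the [constructive, destructive] pair. The names are hypothetical, and the sketch does not reproduce how aarch64_override_options_internal actually derives the values.

```cpp
#include <algorithm>
#include <cstddef>
#include <initializer_list>

// Hypothetical helper, not GCC code: the [constructive, destructive] pair a
// generic tuning would need in order to be valid on every core it covers.
struct InterferenceRange {
  std::size_t constructive;  // an aligned block this size fits in one line on every core
  std::size_t destructive;   // objects this far apart never share a line on any core
};

// Constructive = smallest line size, destructive = largest line size.
// Precondition: line_sizes is non-empty.
constexpr InterferenceRange
range_for_cores(std::initializer_list<std::size_t> line_sizes) {
  return {std::min(line_sizes), std::max(line_sizes)};
}

// The cases discussed in the thread.
static_assert(range_for_cores({64, 256}).destructive == 256);   // current generic: [64, 256]
static_assert(range_for_cores({64}).destructive == 64);         // patch: [64, 64]
static_assert(range_for_cores({64, 128}).destructive == 128);   // hypothetical 128-byte Armv9-A core
static_assert(range_for_cores({64, 128}).constructive == 64);
```

Collapsing the range to a single value, as the patch does with [64, 64], is only safe while every covered core agrees on the line size. Adding a core with a different line size widens the pair again.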