Kyrylo Tkachov <ktkac...@nvidia.com> writes:
> Hi all,
> I'd like to use a value of 64 bytes for the L1 cache size for Armv9-A
> generic tuning.
> As described in g:9a99559a478111f7fbeec29bd78344df7651c707 this value is used
> to set the std::hardware_destructive_interference_size value which we want to
> be not overly large when running concurrent applications on large core-count
> systems.
>
> The generic value for Armv8-A systems and the port baseline is 256 bytes
> because that's what the A64FX CPU has, as set de-facto in
> aarch64_override_options_internal.
>
> But for Armv9-A CPUs as far as I know there isn't anything larger
> than 64 bytes, so we should be able to use the smaller value here and reduce
> the size of concurrent structs that use
> std::hardware_destructive_interference_size to pad their fields.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> WDYT?

I suppose doing this for a form of generic tuning goes somewhat against:

  /* Set up parameters to be used in prefetching algorithm.  Do not
     override the defaults unless we are tuning for a core we have
     researched values for.  */

But I agree it doesn't make conceptual sense to constrain a known-to-be
Armv9-A core based on values that are only needed for Armv8-A cores.
So no objection from me FWIW.

I think we would need to do something else if there are ever Armv9-A
cores with different L1 cache line sizes though.  E.g. if a new Armv9-A
core has a 128-byte cache line, we would probably want to set the range
to [64, 128] rather than the patch's [64, 64], and rather than the
current [64, 256].

Thanks,
Richard


> Thanks,
> Kyrill
>
>
>         * config/aarch64/tuning_models/generic_armv9_a.h
>         (generic_armv9a_prefetch_tune): Define.
>         (generic_armv9_a_tunings): Use the above.
>
> From 93aa4ec4d972dfff02ccd6751af160ed243aa750 Mon Sep 17 00:00:00 2001
> From: Kyrylo Tkachov <ktkac...@nvidia.com>
> Date: Fri, 20 Sep 2024 05:11:39 -0700
> Subject: [PATCH] aarch64: Set Armv9-A generic L1 cache line size to 64 bytes
>
> I'd like to use a value of 64 bytes for the L1 cache size for Armv9-A
> generic tuning.
> As described in g:9a99559a478111f7fbeec29bd78344df7651c707 this value is used
> to set the std::hardware_destructive_interference_size value which we want to
> be not overly large when running concurrent applications on large core-count
> systems.
>
> The generic value for Armv8-A systems and the port baseline is 256 bytes
> because that's what the A64FX CPU has, as set de-facto in
> aarch64_override_options_internal.
>
> But for Armv9-A CPUs as far as I know there isn't anything larger
> than 64 bytes, so we should be able to use the smaller value here and reduce
> the size of concurrent structs that use
> std::hardware_destructive_interference_size to pad their fields.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
>       * config/aarch64/tuning_models/generic_armv9_a.h
>       (generic_armv9a_prefetch_tune): Define.
>       (generic_armv9_a_tunings): Use the above.
> ---
>  gcc/config/aarch64/tuning_models/generic_armv9_a.h | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h 
> b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
> index 999985ed40f..76b3e4c9cf7 100644
> --- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h
> +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
> @@ -207,6 +207,18 @@ static const struct cpu_vector_cost 
> generic_armv9_a_vector_cost =
>    &generic_armv9_a_vec_issue_info /* issue_info  */
>  };
>  
> +/* Generic prefetch settings (which disable prefetch).  */
> +static const cpu_prefetch_tune generic_armv9a_prefetch_tune =
> +{
> +  0,                 /* num_slots  */
> +  -1,                        /* l1_cache_size  */
> +  64,                        /* l1_cache_line_size  */
> +  -1,                        /* l2_cache_size  */
> +  true,                      /* prefetch_dynamic_strides */
> +  -1,                        /* minimum_stride */
> +  -1                 /* default_opt_level  */
> +};
> +
>  static const struct tune_params generic_armv9_a_tunings =
>  {
>    &cortexa76_extra_costs,
> @@ -239,7 +251,7 @@ static const struct tune_params generic_armv9_a_tunings =
>    (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
>     | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
>     | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),  /* tune_flags.  */
> -  &generic_prefetch_tune,
> +  &generic_armv9a_prefetch_tune,
>    AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
>    AARCH64_LDP_STP_POLICY_ALWAYS         /* stp_policy_model.  */
>  };

Reply via email to