Wilco Dijkstra <wilco.dijks...@arm.com> writes:
> Hi Richard,
>
>> Sorry to be awkward, but I don't think we should put
>> AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT in base.
>> CHEAP_SHIFT_EXTEND is a good base flag because it means we can make full
>> use of a certain group of instructions.  FULLY_PIPELINED_FMA similarly
>> means that FMA chains behave as one would expect.
>
> So does that imply you're happy with
> [2/3] https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673224.html ?
>
>> But MATCHED_VECTOR_THROUGHPUT feels to me more like a property of
>> a particular uarch.  I don't see a reason in principle why future
>> cores must provide the same Advanced SIMD bandwidth as SVE bandwidth.
>
> These are really all microarchitecture tuning related, it's just that some
> are so standard that they can be the default for all modern cores. This
> removes the repeated clutter in the tuning models, and it reduces the chances
> of new CPUs accidentally using incorrect settings.

Right.  I suppose what I meant was: AARCH64_FUSE_BASE and (incidentally)
AARCH64_EXTRA_TUNE_BASE are IMO for things where 0->1 transitions can be
seen as forward progress, and so it's relatively unlikely that a family
of uarchs would include a 1->0 transition.  But I don't think
MATCHED_VECTOR_THROUGHPUT is like that.  If later uarchs want more
vector bandwidth, it would be perfectly reasonable to provide it for
SVE only.
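
To make the distinction concrete, here is a minimal sketch.  The DEMO_*
names and bit values are hypothetical and are not the real definitions
in GCC, and the contents of the base mask are only assumed from the
two flags mentioned above.  The point is that a per-uarch property stays
out of the shared base mask, so a later core can simply leave it out:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration; not GCC's actual values.  */
enum demo_tune_flag
{
  DEMO_CHEAP_SHIFT_EXTEND        = 1u << 0,
  DEMO_FULLY_PIPELINED_FMA       = 1u << 1,
  DEMO_CSE_SVE_VL_CONSTANTS      = 1u << 2,
  DEMO_MATCHED_VECTOR_THROUGHPUT = 1u << 3,
  DEMO_AVOID_PRED_RMW            = 1u << 4
};

/* Assumed base set: flags where a 0->1 transition is forward progress,
   so a 1->0 transition within a family of uarchs is unlikely.  */
#define DEMO_TUNE_BASE \
  (DEMO_CHEAP_SHIFT_EXTEND | DEMO_FULLY_PIPELINED_FMA)

/* Return true if every bit of FLAG is set in FLAGS.  */
bool
demo_has_flag (unsigned int flags, unsigned int flag)
{
  return (flags & flag) == flag;
}

/* A core whose Advanced SIMD and SVE bandwidth match: the per-uarch
   flag is listed explicitly next to the base set.  */
unsigned int
demo_matched_core_flags (void)
{
  return (DEMO_TUNE_BASE
	  | DEMO_CSE_SVE_VL_CONSTANTS
	  | DEMO_MATCHED_VECTOR_THROUGHPUT
	  | DEMO_AVOID_PRED_RMW);
}

/* A hypothetical later core that adds vector bandwidth for SVE only:
   it keeps the base set and omits the matched-throughput flag, with
   no need to mask anything out of the base.  */
unsigned int
demo_sve_heavy_core_flags (void)
{
  return (DEMO_TUNE_BASE
	  | DEMO_CSE_SVE_VL_CONSTANTS
	  | DEMO_AVOID_PRED_RMW);
}
```

If the matched-throughput bit were part of the base mask instead, the
second model would have to write something like "BASE & ~MATCHED", which
is exactly the 1->0 transition the base sets are meant to avoid.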

> Note many older cores don't use the base setting, and one could remove
> particular tunings or add a new tune in the future for exceptions like A64FX.
>
>> The AVOID_PRED_RMW is a good catch though, thanks.  +1 to Kyrill's ok
>> for that part.
>
> I've updated the patch to just fix the neoverse512tvb tuning - also I spotted
> this wasn't yet using AARCH64_EXTRA_TUNE_BASE either... So now at least the
> tuning flags are finally more consistent!
>
> Cheers,
> Wilco
>
>
> v2: Update to just improve neoverse512tvb tuning
>
> AArch64: Update neoverse512tvb tuning
>
> Fix the neoverse512tvb tuning to be like Neoverse V1/V2 and add the missing
> AARCH64_EXTRA_TUNE_BASE and AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.
>
> gcc:
>       * config/aarch64/tuning_models/neoverse512tvb.h (tune_flags): Update.

Thanks, LGTM.

Richard

>
> ---
>
> diff --git a/gcc/config/aarch64/tuning_models/neoverse512tvb.h b/gcc/config/aarch64/tuning_models/neoverse512tvb.h
> index 50eb058e23d1a824d925f6258654f9c3c7abbdff..964b4ac284a895cbea4bf889894dd662374f0d2a 100644
> --- a/gcc/config/aarch64/tuning_models/neoverse512tvb.h
> +++ b/gcc/config/aarch64/tuning_models/neoverse512tvb.h
> @@ -155,8 +155,10 @@ static const struct tune_params neoverse512tvb_tunings =
>    2, /* min_div_recip_mul_df.  */
>    0, /* max_case_values.  */
>    tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> -   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),  /* tune_flags.  */
> +  (AARCH64_EXTRA_TUNE_BASE
> +   | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> +   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
> +   | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW),     /* tune_flags.  */
>    &generic_armv9a_prefetch_tune,
>    AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
>    AARCH64_LDP_STP_POLICY_ALWAYS         /* stp_policy_model.  */
