Wilco Dijkstra <wilco.dijks...@arm.com> writes:
> Hi Richard,
>
>> Sorry to be awkward, but I don't think we should put
>> AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT in base.
>> CHEAP_SHIFT_EXTEND is a good base flag because it means we can make full
>> use of a certain group of instructions.  FULLY_PIPELINED_FMA similarly
>> means that FMA chains behave as one would expect.
>
> So does that imply you're happy with
> [2/3] https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673224.html ?
>
>> But MATCHED_VECTOR_THROUGHPUT feels to me more like a property of
>> a particular uarch.  I don't see a reason in principle why future
>> cores must provide the same Advanced SIMD bandwidth as SVE bandwidth.
>
> These are really all microarchitecture tuning related, it's just that
> some are so standard that they can be the default for all modern cores.
> This removes the repeated clutter in the tuning models, and it reduces
> the chances of new CPUs accidentally using incorrect settings.

Right.  I suppose what I meant was: AARCH64_FUSE_BASE and (incidentally)
AARCH64_EXTRA_TUNE_BASE are IMO for things where 0->1 transitions can be
seen as forward progress, and so it's relatively unlikely that a family
of uarchs would include a 1->0 transition.  But I don't think
MATCHED_VECTOR_THROUGHPUT is like that.  If later uarchs want more vector
bandwidth, it would be perfectly reasonable to provide it for SVE only.

> Note many older cores don't use the base setting, and one could remove
> particular tunings or add a new tune in the future for exceptions like
> A64FX.
>
>> The AVOID_PRED_RMW is a good catch though, thanks.  +1 to Kyrill's ok
>> for that part.
>
> I've updated the patch to just fix the neoverse512tvb tuning - also I
> spotted this wasn't yet using AARCH64_EXTRA_TUNE_BASE either...  So now
> at least the tuning flags are finally more consistent!
>
> Cheers,
> Wilco
>
>
> v2: Update to just improve neoverse512tvb tuning
>
> AArch64: Update neoverse512tvb tuning
>
> Fix the neoverse512tvb tuning to be like Neoverse V1/V2 and add the missing
> AARCH64_EXTRA_TUNE_BASE and AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.
>
> gcc:
>         * config/aarch64/tuning_models/neoverse512tvb.h (tune_flags): Update.

Thanks, LGTM.

Richard

>
> ---
>
> diff --git a/gcc/config/aarch64/tuning_models/neoverse512tvb.h b/gcc/config/aarch64/tuning_models/neoverse512tvb.h
> index 50eb058e23d1a824d925f6258654f9c3c7abbdff..964b4ac284a895cbea4bf889894dd662374f0d2a 100644
> --- a/gcc/config/aarch64/tuning_models/neoverse512tvb.h
> +++ b/gcc/config/aarch64/tuning_models/neoverse512tvb.h
> @@ -155,8 +155,10 @@ static const struct tune_params neoverse512tvb_tunings =
>    2,   /* min_div_recip_mul_df.  */
>    0,   /* max_case_values.  */
>    tune_params::AUTOPREFETCHER_WEAK,    /* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> -   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),    /* tune_flags.  */
> +  (AARCH64_EXTRA_TUNE_BASE
> +   | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> +   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
> +   | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW),       /* tune_flags.  */
>    &generic_armv9a_prefetch_tune,
>    AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
>    AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
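
For readers following the thread, the effect of the tune_flags change can be
sketched as plain bitmask arithmetic.  This is a minimal, standalone
illustration and not GCC code: the TUNE_* names, the bit positions, and the
assumption that the base set consists of exactly CHEAP_SHIFT_EXTEND and
FULLY_PIPELINED_FMA are all hypothetical, chosen only to mirror the flags
mentioned in the discussion above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments, for illustration only.  The real values are
// defined inside GCC and are not shown in this thread.
constexpr std::uint32_t TUNE_CHEAP_SHIFT_EXTEND        = 1u << 0;
constexpr std::uint32_t TUNE_FULLY_PIPELINED_FMA       = 1u << 1;
constexpr std::uint32_t TUNE_CSE_SVE_VL_CONSTANTS      = 1u << 2;
constexpr std::uint32_t TUNE_MATCHED_VECTOR_THROUGHPUT = 1u << 3;
constexpr std::uint32_t TUNE_AVOID_PRED_RMW            = 1u << 4;

// Assumed contents of the base set: only flags where a 0->1 transition is
// forward progress.  MATCHED_VECTOR_THROUGHPUT is deliberately left out.
constexpr std::uint32_t TUNE_BASE =
    TUNE_CHEAP_SHIFT_EXTEND | TUNE_FULLY_PIPELINED_FMA;

// The tune_flags expression before and after the patch, in sketch form.
constexpr std::uint32_t OLD_512TVB_FLAGS =
    TUNE_CSE_SVE_VL_CONSTANTS | TUNE_MATCHED_VECTOR_THROUGHPUT;
constexpr std::uint32_t NEW_512TVB_FLAGS =
    TUNE_BASE | TUNE_CSE_SVE_VL_CONSTANTS
    | TUNE_MATCHED_VECTOR_THROUGHPUT | TUNE_AVOID_PRED_RMW;

// True if every bit of `flag` is set in `flags`.
constexpr bool has_flag(std::uint32_t flags, std::uint32_t flag)
{
  return (flags & flag) == flag;
}

// The patch only adds bits: every flag that was set before is still set.
static_assert(has_flag(NEW_512TVB_FLAGS, OLD_512TVB_FLAGS),
              "no 1->0 transition for this core");

// Runtime checks of the same properties, callable from any test harness.
inline void check_512tvb_flags()
{
  assert(has_flag(NEW_512TVB_FLAGS, TUNE_BASE));
  assert(has_flag(NEW_512TVB_FLAGS, TUNE_AVOID_PRED_RMW));
  assert(!has_flag(TUNE_BASE, TUNE_MATCHED_VECTOR_THROUGHPUT));
}
```

Keeping MATCHED_VECTOR_THROUGHPUT as a per-core flag in this sketch reflects
the point made above: a later core could drop it without having to opt out of
the shared base set.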