Hi Richard,

> Sorry to be awkward, but I don't think we should put
> AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT in base.
> CHEAP_SHIFT_EXTEND is a good base flag because it means we can make full
> use of a certain group of instructions. FULLY_PIPELINED_FMA similarly
> means that FMA chains behave as one would expect.

So does that imply you're happy with [2/3]
https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673224.html ?

> But MATCHED_VECTOR_THROUGHPUT feels to me more like a property of
> a particular uarch. I don't see a reason in principle why future
> cores must provide the same Advanced SIMD bandwidth as SVE bandwidth.

These are really all microarchitecture tuning related, it's just that some
are so standard that they can be the default for all modern cores. This
removes the repeated clutter in the tuning models, and it reduces the
chances of new CPUs accidentally using incorrect settings. Note many older
cores don't use the base setting, and one could remove particular tunings
or add a new tune in the future for exceptions like A64FX.

> The AVOID_PRED_RMW is a good catch though, thanks. +1 to Kyrill's ok
> for that part.

I've updated the patch to just fix the neoverse512tvb tuning - also I
spotted this wasn't yet using AARCH64_EXTRA_TUNE_BASE either... So now at
least the tuning flags are finally more consistent!

Cheers,
Wilco


v2: Update to just improve neoverse512tvb tuning

AArch64: Update neoverse512tvb tuning

Fix the neoverse512tvb tuning to be like Neoverse V1/V2 and add the
missing AARCH64_EXTRA_TUNE_BASE and AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.

gcc:
	* config/aarch64/tuning_models/neoverse512tvb.h (tune_flags): Update.

---

diff --git a/gcc/config/aarch64/tuning_models/neoverse512tvb.h b/gcc/config/aarch64/tuning_models/neoverse512tvb.h
index 50eb058e23d1a824d925f6258654f9c3c7abbdff..964b4ac284a895cbea4bf889894dd662374f0d2a 100644
--- a/gcc/config/aarch64/tuning_models/neoverse512tvb.h
+++ b/gcc/config/aarch64/tuning_models/neoverse512tvb.h
@@ -155,8 +155,10 @@ static const struct tune_params neoverse512tvb_tunings =
   2, /* min_div_recip_mul_df. */
   0, /* max_case_values. */
   tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
-  (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
-   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */
+  (AARCH64_EXTRA_TUNE_BASE
+   | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
+   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
+   | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW), /* tune_flags. */
   &generic_armv9a_prefetch_tune,
   AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */
   AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */
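
For readers not familiar with how tune_flags composes, here is a rough standalone sketch of the idea, not the actual GCC definitions. The TUNE_* names, their bit values, and the helper functions are all made up for illustration, and the contents of TUNE_BASE (CHEAP_SHIFT_EXTEND plus FULLY_PIPELINED_FMA) are an assumption based only on the two flags discussed above; the real AARCH64_EXTRA_TUNE_BASE may contain something different.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the AARCH64_EXTRA_TUNE_* masks.
   The bit positions are invented; GCC generates the real ones.  */
enum
{
  TUNE_CHEAP_SHIFT_EXTEND        = 1u << 0,
  TUNE_FULLY_PIPELINED_FMA       = 1u << 1,
  TUNE_CSE_SVE_VL_CONSTANTS      = 1u << 2,
  TUNE_MATCHED_VECTOR_THROUGHPUT = 1u << 3,
  TUNE_AVOID_PRED_RMW            = 1u << 4,

  /* Assumed contents of the shared base set (see the note above).  */
  TUNE_BASE = TUNE_CHEAP_SHIFT_EXTEND | TUNE_FULLY_PIPELINED_FMA
};

/* Flag set modelled on the removed lines of the patch.  */
static unsigned int
old_tvb_flags (void)
{
  return TUNE_CSE_SVE_VL_CONSTANTS | TUNE_MATCHED_VECTOR_THROUGHPUT;
}

/* Flag set modelled on the added lines: base plus per-core extras.  */
static unsigned int
new_tvb_flags (void)
{
  return (TUNE_BASE
	  | TUNE_CSE_SVE_VL_CONSTANTS
	  | TUNE_MATCHED_VECTOR_THROUGHPUT
	  | TUNE_AVOID_PRED_RMW);
}

/* Return true if every bit of FLAG is set in FLAGS.  */
static bool
has_flag (unsigned int flags, unsigned int flag)
{
  return (flags & flag) == flag;
}
```

Under those assumptions the new set is a strict superset of the old one: has_flag (new_tvb_flags (), TUNE_AVOID_PRED_RMW) holds, while the old set has neither that bit nor any of the base bits. Keeping the common flags behind one base mask means a new tuning model only has to list what is specific to that core.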