Kyrylo Tkachov <ktkac...@nvidia.com> writes:
>> On 15 Nov 2024, at 12:33, Wilco Dijkstra <wilco.dijks...@arm.com> wrote:
>>
>> Hi Kyrill,
>>
>>> This would make USE_NEW_VECTOR_COSTS effectively the default.
>>> Jennifer has been trying to do that as well and then to remove it (as it
>>> would be always true) but there are some codegen regressions that still
>>> need to be addressed.
>>
>> Yes, that's the goal - we should use good tuning settings by default,
>> especially if they work well on modern cores. I noticed a huge gap between
>> -mcpu=neoverse-v2 and -march=armv9-a, so the idea is to make the tunings
>> more similar. Note this particular patch won't make a difference since
>> both of these tunings already use the new vector costs and throughput
>> setting.
>>
>>> See the threads “[RFC][PATCH] AArch64: Remove
>>> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS” from October and September.
>>> Do those regressions go away if you also specify
>>> AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT at the same time?
>>
>> I believe we always use both of those settings together. Removing the
>> settings by making them the default looks like a good idea indeed. We have
>> too many tune settings...
>
> In principle the only SVE-enabled SVE core that
> AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT wouldn’t apply for is A64FX but
> that tuning was also not validated with
> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS so indeed in all current uses they
> appear together.
> I wouldn’t mind assuming AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT in the
> generic tuning if others agree,

It looks like we already do that for generic_armv8.h and generic_armv9.h,
which are the ones that people would use in practice. IMO we should
probably leave it out of "generic" itself, since like you say, A64FX is
a good example of why the flag is less likely to hold for long SVE vectors.

> but I don’t think we should remove the
> !AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT paths just yet.

Yeah, I agree we should keep AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
tunable.

Thanks,
Richard