Kyrylo Tkachov <ktkac...@nvidia.com> writes:
>> On 15 Nov 2024, at 12:33, Wilco Dijkstra <wilco.dijks...@arm.com> wrote:
>> 
>> Hi Kyrill,
>> 
>>> This would make USE_NEW_VECTOR_COSTS effectively the default.
>>> Jennifer has been trying to do that as well and then to remove it (as it
>>> would be always true) but there are some codegen regressions that still
>>> need to be addressed.
>> 
>> Yes, that's the goal - we should use good tuning settings by default,
>> especially if they work well on modern cores. I noticed a huge gap between
>> -mcpu=neoverse-v2 and -march=armv9-a, so the idea is to make the tunings
>> more similar. Note this particular patch won't make a difference since both
>> of these tunings already use the new vector costs and throughput setting.
>> 
>>> See the threads “[RFC][PATCH] AArch64: Remove 
>>> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS” from October and September.
>>> Do those regressions go away if you also specify 
>>> AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT at the same time?
>> 
>> I believe we always use both of those settings together. Removing the
>> settings by making them the default looks like a good idea indeed. We have
>> too many tune settings...
>
> In principle the only SVE-enabled core that
> AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT wouldn’t apply to is A64FX, but
> that tuning was also not validated with
> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS, so indeed in all current uses they
> appear together.
> I wouldn’t mind assuming AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT in the 
> generic tuning if others agree,

It looks like we already do that for generic_armv8.h and generic_armv9.h,
which are the ones that people would use in practice.  IMO we should
probably leave it out of "generic" itself, since like you say, A64FX is
a good example of why the flag is less likely to hold for long SVE vectors.

> but I don’t think we should remove the
> !AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT paths just yet.

Yeah, I agree we should keep AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
tunable.

Thanks,
Richard
