Hello Kyrill,

Sorry for the slow response.
Performance on A64FX is not impacted by this patch.

Regards,
Qian

> -----Original Message-----
> From: Kyrylo Tkachov <kyrylo.tkac...@arm.com>
> Sent: Wednesday, March 10, 2021 10:56 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford <richard.sandif...@arm.com>; Qian, Jianhua/钱 建华
> <qia...@fujitsu.com>
> Subject: [PATCH] aarch64: Improve generic SVE tuning defaults
>
> Hi all,
>
> This patch adds the recently-added tweak to split some SVE VL-based scalar
> operations [1] to the generic tuning used for SVE, as enabled by adding +sve
> to the -march flag, for example -march=armv8.2-a+sve.
>
> The recommendation for best performance on a particular CPU remains
> unchanged: use the -mcpu option for that CPU, where possible. -mcpu=native
> makes this straightforward for native compilation.
>
> The tweak to split out SVE VL-based scalar operations is a consistent win for
> the Neoverse V1 CPU and should be neutral for the Fujitsu A64FX. A run of
> SPEC2017 on A64FX with this tweak on didn't show any non-noise differences.
> It is also expected to be neutral on SVE2 implementations.
>
> Therefore, the patch enables the tweak for generic +sve tuning e.g.
> -march=armv8.2-a+sve. No SVE2 CPUs are expected to benefit from it,
> therefore the tweak is disabled for generic tuning when +sve2 is in -march
> e.g. -march=armv8.2-a+sve2.
>
> The implementation of this approach requires a bit of custom logic in
> aarch64_override_options_internal to handle these kinds of
> architecture-dependent decisions, but we do believe the user-facing principle
> here is important to implement.
>
> Qian, as you've contributed the A64FX support to GCC, I would be grateful for
> your feedback on this approach and in particular on the performance evaluation
> of this change.
>
> In general, for the generic target we're using a decision framework that
> looks like:
>
> * If all cores that are known to benefit from an optimization are of
>   architecture X, and all other cores that implement X or above are not
>   impacted, or have a very slight impact, we will consider it for generic
>   tuning for architecture X.
> * We will not enable that optimisation for generic tuning for architecture
>   X+1 if no known cores of architecture X+1 or above will benefit.
>
> This framework allows us to improve generic tuning for CPUs of generation X
> while avoiding accumulating tweaks for future CPUs of generation X+1, X+2...
> that do not need them, and thus avoid even the slight negative effects of
> these optimisations if the user is willing to tell us the desired
> architecture accurately.
>
> X above can mean either annual architecture updates (Armv8.2-a, Armv8.3-a
> etc) or optional architecture extensions (like SVE, SVE2).
>
> We think that this patch fits that framework, so would like to propose it for
> the trunk default tunings for SVE.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Thanks,
> Kyrill
>
> [1] http://gcc.gnu.org/g:a65b9ad863c5fc0aea12db58557f4d286a1974d7
>
> gcc/ChangeLog:
>
>     * config/aarch64/aarch64.c (aarch64_adjust_generic_arch_tuning): Define.
>     (aarch64_override_options_internal): Use it.
>     (generic_tunings): Add AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS to
>     tune_flags.
>
> gcc/testsuite/ChangeLog:
>
>     * g++.target/aarch64/sve/aarch64-sve.exp: Add -moverride=tune=none to
>     sve_flags.
>     * g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Likewise.
>     * g++.target/aarch64/sve/acle/aarch64-sve-acle.exp: Likewise.
>     * gcc.target/aarch64/sve/aarch64-sve.exp: Likewise.
>     * gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Likewise.
>     * gcc.target/aarch64/sve/acle/aarch64-sve-acle.exp: Likewise.
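
For anyone following the thread, the architecture-dependent adjustment described in the quoted message can be sketched roughly as below. This is only an illustration, not the code from the patch: the flag names, the struct and the function name are hypothetical stand-ins for GCC's internal definitions. The assumption is that the generic tuning starts with the tweak enabled and that it is cleared again when +sve2 is part of the selected architecture.

```c
/* Hypothetical stand-ins for GCC's internal ISA and tuning flag bits.  */
#define ISA_SVE   (1u << 0)
#define ISA_SVE2  (1u << 1)
#define TUNE_CSE_SVE_VL_CONSTANTS (1u << 0)

/* Hypothetical, much reduced version of a tuning structure.  */
struct tune_params
{
  unsigned int extra_tuning_flags;
};

/* Adjust a copy of the generic tuning for the selected architecture.
   The tweak stays on for +sve, and is dropped when +sve2 is present
   because no SVE2 CPU is expected to benefit from it.  */
static void
adjust_generic_arch_tuning (struct tune_params *tune, unsigned int isa_flags)
{
  if (isa_flags & ISA_SVE2)
    tune->extra_tuning_flags &= ~TUNE_CSE_SVE_VL_CONSTANTS;
}
```

With this shape, -march=armv8.2-a+sve would keep the tweak and -march=armv8.2-a+sve2 would lose it, which matches the behaviour described in the quoted message.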