On 6/5/24 02:32, Bibo Mao wrote:
Different gcc versions have different features, so macros CONFIG_LSX_OPT
and CONFIG_LASX_OPT are added here to detect whether gcc supports the
built-in lsx/lasx macros.
Function buffer_zero_lsx() is added for 128-bit simd fpu optimization,
and function buffer_zero_lasx() is for 256-bit simd fpu optimization.
The LoongArch gcc built-in lsx/lasx macros can be used only when the
compiler option -mlsx/-mlasx is given, and there is no way to enable them
for a single function only. So this is only in effect when qemu is compiled
with parameter --extra-cflags="-mlasx"
Signed-off-by: Bibo Mao <maob...@loongson.cn>
---
meson.build | 11 +++++
util/bufferiszero.c | 103 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 114 insertions(+)
diff --git a/meson.build b/meson.build
index 6386607144..29bc362d7a 100644
--- a/meson.build
+++ b/meson.build
@@ -2855,6 +2855,17 @@ config_host_data.set('CONFIG_ARM_AES_BUILTIN',
cc.compiles('''
void foo(uint8x16_t *p) { *p = vaesmcq_u8(*p); }
'''))
+# For Loongarch64, detect if LSX/LASX are available.
+ config_host_data.set('CONFIG_LSX_OPT', cc.compiles('''
+ #include "lsxintrin.h"
+ int foo(__m128i v) { return __lsx_bz_v(v); }
+ '''))
+
+config_host_data.set('CONFIG_LASX_OPT', cc.compiles('''
+ #include "lasxintrin.h"
+ int foo(__m256i v) { return __lasx_xbz_v(v); }
+ '''))
Both of these are introduced by gcc 14 and llvm 18, so I'm not certain of the utility of
separate tests. We might simplify this with
config_host_data.set('CONFIG_LSX_LASX_INTRIN_H',
cc.has_header('lsxintrin.h') and cc.has_header('lasxintrin.h'))
As you say, these headers require vector instructions to be enabled at compile-time rather
than detecting them at runtime. This is a point where the compilers could be improved to
support __attribute__((target("xyz"))) and the builtins with that. The i386 port does
this, for instance.
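For readers unfamiliar with the per-function target attribute, here is a minimal sketch of the x86 pattern with GCC or Clang. It is not the code from util/bufferiszero.c; every demo_* name is hypothetical. The vector path is enabled for one function only and is chosen at runtime with __builtin_cpu_supports(), so the file itself needs no -mavx2. On other hosts only the scalar loop is compiled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Portable byte loop, used for tails and when no vector path is usable. */
static bool demo_zero_scalar(const unsigned char *p, size_t len)
{
    unsigned char acc = 0;
    for (size_t i = 0; i < len; i++) {
        acc |= p[i];
    }
    return acc == 0;
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <immintrin.h>

/* AVX2 is enabled for this one function, not for the whole file. */
__attribute__((target("avx2")))
static bool demo_zero_avx2(const unsigned char *p, size_t len)
{
    __m256i acc = _mm256_setzero_si256();
    size_t i = 0;
    for (; i + 32 <= len; i += 32) {
        acc = _mm256_or_si256(acc,
                              _mm256_loadu_si256((const __m256i *)(p + i)));
    }
    /* testz returns 1 when (acc & acc) has no bits set. */
    return _mm256_testz_si256(acc, acc) &&
           demo_zero_scalar(p + i, len - i);
}
#endif

/* Hypothetical entry point: runtime dispatch on x86, scalar elsewhere. */
bool demo_buffer_is_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    if (__builtin_cpu_supports("avx2")) {
        return demo_zero_avx2(p, len);
    }
#endif
    return demo_zero_scalar(p, len);
}

/* Small usage example: all-zero buffer, then one non-zero byte in the tail. */
void demo_self_check(void)
{
    unsigned char buf[100] = {0};
    assert(demo_buffer_is_zero(buf, sizeof(buf)));
    buf[99] = 1;
    assert(!demo_buffer_is_zero(buf, sizeof(buf)));
    assert(demo_buffer_is_zero(buf, 99));
}
```

Without equivalent target attribute support for lsx/lasx, the same split is not possible on LoongArch, which is why the selection has to happen at compile time.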
In the meantime, it means that you don't need a runtime test, similar to aarch64 and its
use of __ARM_NEON as a compile-time test for simd support. Perhaps
#elif defined(CONFIG_LSX_LASX_INTRIN_H) && \
(defined(__loongarch_sx) || defined(__loongarch_asx))
# ifdef __loongarch_sx
...
# endif
# ifdef __loongarch_asx
...
# endif
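To make the shape of that concrete, here is a rough, untested sketch of compile-time selection with a scalar fallback. It is not the patch's buffer_zero_lsx(); the demo_ct_* names and DEMO_HAVE_LSX are hypothetical, and CONFIG_LSX_LASX_INTRIN_H is the name proposed above, assumed to come from the generated config header. The LSX path assumes __lsx_vld, __lsx_vor_v and __lsx_bz_v behave as declared in gcc 14's lsxintrin.h and that unaligned vector loads are acceptable on the host; a LASX variant would follow the same pattern under __loongarch_asx.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Vector path only if the headers exist and the file is built with -mlsx. */
#if defined(CONFIG_LSX_LASX_INTRIN_H) && defined(__loongarch_sx)
#include <lsxintrin.h>
#define DEMO_HAVE_LSX 1
#endif

/* Portable byte loop, also used for short buffers and tails. */
static bool demo_ct_zero_scalar(const unsigned char *p, size_t len)
{
    unsigned char acc = 0;
    for (size_t i = 0; i < len; i++) {
        acc |= p[i];
    }
    return acc == 0;
}

#ifdef DEMO_HAVE_LSX
/* Assumption: 128-bit loads from p need not be 16-byte aligned. */
static bool demo_ct_zero_lsx(const unsigned char *p, size_t len)
{
    if (len < 16) {
        return demo_ct_zero_scalar(p, len);
    }
    __m128i acc = __lsx_vld(p, 0);
    size_t i = 16;
    for (; i + 16 <= len; i += 16) {
        acc = __lsx_vor_v(acc, __lsx_vld(p + i, 0));
    }
    /* __lsx_bz_v is non-zero when every bit of the vector is zero. */
    return __lsx_bz_v(acc) && demo_ct_zero_scalar(p + i, len - i);
}
#endif

/* Hypothetical entry point: the choice is fixed at compile time. */
bool demo_ct_buffer_is_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;
#ifdef DEMO_HAVE_LSX
    return demo_ct_zero_lsx(p, len);
#else
    return demo_ct_zero_scalar(p, len);
#endif
}
```

Built without -mlsx, or on any other host, this compiles to the scalar loop only, so there is no accelerator table or runtime probe to maintain.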
The actual code is perfectly fine, of course, since it follows the pattern from the
others. How much improvement do you see from bufferiszero-bench?
r~