On 6/5/24 20:18, Richard Henderson wrote:
On 6/5/24 19:30, maobibo wrote:
On 2024/6/6 上午7:51, Richard Henderson wrote:
On 6/5/24 02:32, Bibo Mao wrote:
Different gcc versions have different features, macro CONFIG_LSX_OPT
and CONFIG_LASX_OPT is added here to detect whether gcc supports
built-in lsx/lasx macro.
Function buffer_zero_lsx() is added for 128bit simd fpu optimization,
and function buffer_zero_lasx() is for 256bit simd fpu optimization.
Loongarch gcc built-in lsx/lasx macro can be used only when compiler
option -mlsx/-mlasx is added, and there is no separate compiler option
for function only. So it is only in effect when qemu is compiled with
parameter --extra-cflags="-mlasx"
Signed-off-by: Bibo Mao <maob...@loongson.cn>
---
meson.build | 11 +++++
util/bufferiszero.c | 103 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 114 insertions(+)
diff --git a/meson.build b/meson.build
index 6386607144..29bc362d7a 100644
--- a/meson.build
+++ b/meson.build
@@ -2855,6 +2855,17 @@ config_host_data.set('CONFIG_ARM_AES_BUILTIN',
cc.compiles('''
void foo(uint8x16_t *p) { *p = vaesmcq_u8(*p); }
'''))
+# For Loongarch64, detect if LSX/LASX are available.
+ config_host_data.set('CONFIG_LSX_OPT', cc.compiles('''
+ #include "lsxintrin.h"
+ int foo(__m128i v) { return __lsx_bz_v(v); }
+ '''))
+
+config_host_data.set('CONFIG_LASX_OPT', cc.compiles('''
+ #include "lasxintrin.h"
+ int foo(__m256i v) { return __lasx_xbz_v(v); }
+ '''))
Both of these are introduced by gcc 14 and llvm 18, so I'm not certain of the utility
of separate tests. We might simplify this with
config_host_data.set('CONFIG_LSX_LASX_INTRIN_H',
cc.has_header('lsxintrin.h') && cc.has_header('lasxintrin.h'))
As you say, these headers require vector instructions to be enabled at compile-time
rather than detecting them at runtime. This is a point where the compilers could be
improved to support __attribute__((target("xyz"))) and the builtins with that. The
i386 port does this, for instance.
In the meantime, it means that you don't need a runtime test. Similar to aarch64 and
the use of __ARM_NEON as a compile-time test for simd support. Perhaps
#elif defined(CONFIG_LSX_LASX_INTRIN_H) && \
(defined(__loongarch_sx) || defined(__loongarch_asx))
# ifdef __loongarch_sx
...
# endif
# ifdef __loongarch_asx
...
# endif
Sure, will do in this way.
And also there is runtime check coming from hwcap, such this:
unsigned info = cpuinfo_init();
if (info & CPUINFO_LASX)
static biz_accel_fn const accel_table[] = {
buffer_is_zero_int_ge256,
#ifdef __loongarch_sx
buffer_is_zero_lsx,
#endif
#ifdef __loongarch_asx
buffer_is_zero_lasx,
#endif
};
static unsigned best_accel(void)
{
#ifdef __loongarch_asx
/* lasx may be index 1 or 2, but always last */
return ARRAY_SIZE(accel_table) - 1;
#else
/* lsx is always index 1 */
return 1;
#endif
}
It occurs to me that by accumulating host specific sections to this file, we should split
it like the atomics. Put each portion in host/include/*/host/bufferiszero.h.inc.
I'll send a patch set handling the existing two hosts.
r~