On 6/5/24 20:18, Richard Henderson wrote:
On 6/5/24 19:30, maobibo wrote:


On 2024/6/6 上午7:51, Richard Henderson wrote:
On 6/5/24 02:32, Bibo Mao wrote:
Different gcc versions have different features, macro CONFIG_LSX_OPT
and CONFIG_LASX_OPT is added here to detect whether gcc supports
built-in lsx/lasx macro.

Function buffer_zero_lsx() is added for 128bit simd fpu optimization,
and function buffer_zero_lasx() is for 256bit simd fpu optimization.

Loongarch gcc built-in lsx/lasx macro can be used only when compiler
option -mlsx/-mlasx is added, and there is no separate compiler option
for function only. So it is only in effect when qemu is compiled with
parameter --extra-cflags="-mlasx"

Signed-off-by: Bibo Mao <maob...@loongson.cn>
---
  meson.build         |  11 +++++
  util/bufferiszero.c | 103 ++++++++++++++++++++++++++++++++++++++++++++
  2 files changed, 114 insertions(+)

diff --git a/meson.build b/meson.build
index 6386607144..29bc362d7a 100644
--- a/meson.build
+++ b/meson.build
@@ -2855,6 +2855,17 @@ config_host_data.set('CONFIG_ARM_AES_BUILTIN', 
cc.compiles('''
      void foo(uint8x16_t *p) { *p = vaesmcq_u8(*p); }
    '''))
+# For Loongarch64, detect if LSX/LASX are available.
+ config_host_data.set('CONFIG_LSX_OPT', cc.compiles('''
+    #include "lsxintrin.h"
+    int foo(__m128i v) { return __lsx_bz_v(v); }
+  '''))
+
+config_host_data.set('CONFIG_LASX_OPT', cc.compiles('''
+    #include "lasxintrin.h"
+    int foo(__m256i v) { return __lasx_xbz_v(v); }
+  '''))

Both of these are introduced by gcc 14 and llvm 18, so I'm not certain of the utility of separate tests.  We might simplify this with

   config_host_data.set('CONFIG_LSX_LASX_INTRIN_H',
     cc.has_header('lsxintrin.h') && cc.has_header('lasxintrin.h'))


As you say, these headers require vector instructions to be enabled at compile-time rather than detecting them at runtime.  This is a point where the compilers could be improved to support __attribute__((target("xyz"))) and the builtins with that.  The i386 port does this, for instance.

In the meantime, it means that you don't need a runtime test.  Similar to aarch64 and the use of __ARM_NEON as a compile-time test for simd support.  Perhaps

#elif defined(CONFIG_LSX_LASX_INTRIN_H) && \
       (defined(__loongarch_sx) || defined(__loongarch_asx))
# ifdef __loongarch_sx
   ...
# endif
# ifdef __loongarch_asx
   ...
# endif
Sure, will do in this way.
And also there is runtime check coming from hwcap, such this:

unsigned info = cpuinfo_init();
   if (info & CPUINFO_LASX)

static biz_accel_fn const accel_table[] = {
     buffer_is_zero_int_ge256,
#ifdef __loongarch_sx
     buffer_is_zero_lsx,
#endif
#ifdef __loongarch_asx
     buffer_is_zero_lasx,
#endif
};

static unsigned best_accel(void)
{
#ifdef __loongarch_asx
     /* lasx may be index 1 or 2, but always last */
     return ARRAY_SIZE(accel_table) - 1;
#else
     /* lsx is always index 1 */
     return 1;
#endif
}

It occurs to me that by accumulating host specific sections to this file, we should split it like the atomics. Put each portion in host/include/*/host/bufferiszero.h.inc.

I'll send a patch set handling the existing two hosts.


r~


Reply via email to