Hi Vladimir,

Thanks for the update, this patch looks much better now.
Some more comments inline.

> On 23 Jul 2024, at 14:47, vladimir.miloser...@arm.com wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> Hi All,
> 
> Changes since V1: add missing MD constraints, rename intrinsics,
> remove SME2 flag for LUT feature.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> This depends on "Extend aarch64_feature_flags to 128 bits" work which is soon
> to be submitted upstream as we ran out of 64-bit flags.
> 
> The patch needs to be committed for me as I don't have commit rights.
> 
> Ok for master when the pre-requisites get committed?
> 
> --
> 
> This patch introduces support for LUTI2/LUTI4 ACLE for SVE2.
> 
> LUTI instructions are used for efficient table lookups with 2-bit or 4-bit
> indices. LUTI2 reads indexed 8-bit or 16-bit elements from the low 128 bits of
> the table vector using packed 2-bit indices, while LUTI4 can read from the low
> 128 or 256 bits of the table vector or from two table vectors using packed
> 4-bit indices. These instructions fill the destination vector by copying
> elements indexed by segments of the source vector, selected by the vector
> segment index.
> 
> The changes include the addition of a new AArch64 option extension "lut",
> __ARM_FEATURE_LUT preprocessor macro, definitions for the new LUTI instruction
> shapes, and implementations of the svluti2 and svluti4 builtins.
> 
> BR,
> - Vladimir
> 
> gcc/ChangeLog:
> 
>        * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
>        Add support for __ARM_FEATURE_LUT preprocessor macro.
>        * config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION):
>        Add "lut" option extension.
>        * config/aarch64/aarch64-sve-builtins-shapes.cc (struct luti_base):
>        Define new LUTI ACLE shapes.
>        (SHAPE): Define shapes for luti2 and luti4.
>        * config/aarch64/aarch64-sve-builtins-shapes.h: Add declarations
>        for luti2 and luti4.
>        * config/aarch64/aarch64-sve-builtins-sve2.cc (class svluti_lane_impl):
>        Implement support for LUTI instructions.
>        (FUNCTION): Register svluti2 and svluti4 functions.
>        * config/aarch64/aarch64-sve-builtins-sve2.def (svluti2):
>        Define svluti2 function.
>        (svluti4): Define svluti4 function.
>        * config/aarch64/aarch64-sve-builtins-sve2.h: Add declarations
>        for svluti2 and svluti4.
>        * config/aarch64/aarch64-sve2.md (@aarch64_sve_luti<LUTI_BITS><mode>):
>        Define machine description patterns for LUTI.
>        * config/aarch64/aarch64.h (AARCH64_ISA_LUT): Define macro for LUTI.
>        (TARGET_LUT): Likewise.
>        * config/aarch64/iterators.md: Define mode iterators
>        for LUTI MD patterns.
> 
> gcc/testsuite/ChangeLog:
> 
>        * gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Add macro for
>        SVE ACLE to enable LUTI tests.
>        * lib/target-supports.exp: Update to include check for the LUT feature.
>        * gcc.target/aarch64/sve2/acle/asm/luti2_bf16.c: New test.
>        * gcc.target/aarch64/sve2/acle/asm/luti2_f16.c: New test.
>        * gcc.target/aarch64/sve2/acle/asm/luti2_s16.c: New test.
>        * gcc.target/aarch64/sve2/acle/asm/luti2_s8.c: New test.
>        * gcc.target/aarch64/sve2/acle/asm/luti2_u16.c: New test.
>        * gcc.target/aarch64/sve2/acle/asm/luti2_u8.c: New test.
>        * gcc.target/aarch64/sve2/acle/asm/luti4_bf16.c: New test.
>        * gcc.target/aarch64/sve2/acle/asm/luti4_bf16_x2.c: New test.
>        * gcc.target/aarch64/sve2/acle/asm/luti4_f16.c: New test.
>        * gcc.target/aarch64/sve2/acle/asm/luti4_f16_x2.c: New test.
>        * gcc.target/aarch64/sve2/acle/asm/luti4_s16.c: New test.
>        * gcc.target/aarch64/sve2/acle/asm/luti4_s16_x2.c: New test.
>        * gcc.target/aarch64/sve2/acle/asm/luti4_s8.c: New test.
>        * gcc.target/aarch64/sve2/acle/asm/luti4_u16.c: New test.
>        * gcc.target/aarch64/sve2/acle/asm/luti4_u16_x2.c: New test.
>        * gcc.target/aarch64/sve2/acle/asm/luti4_u8.c: New test.
> 
> <v2-0001-AArch64-Add-LUTI-ACLE-for-SVE2.patch>diff --git 
> a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index 59b2246cf8e..c1fc1955c92 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -272,6 +272,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
aarch64_def_or_undef (TARGET_SME_I16I64, "__ARM_FEATURE_SME_I16I64", pfile);
aarch64_def_or_undef (TARGET_SME_F64F64, "__ARM_FEATURE_SME_F64F64", pfile);
aarch64_def_or_undef (TARGET_SME2, "__ARM_FEATURE_SME2", pfile);
+ aarch64_def_or_undef (TARGET_LUT, "__ARM_FEATURE_LUT", pfile);

From reading the ACLE pull request:
https://github.com/ARM-software/acle/pull/324/files
It looks like __ARM_FEATURE_LUT should guard the Advanced SIMD intrinsics for 
LUTI at least.
Therefore we should add this macro definition only once those are implemented 
as well. So I’d remove this hunk.

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index c33f5da02f4..8542f01ec85 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -285,6 +285,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
#define AARCH64_ISA_D128 (aarch64_isa_flags & AARCH64_FL_D128)
#define AARCH64_ISA_THE (aarch64_isa_flags & AARCH64_FL_THE)
#define AARCH64_ISA_GCS (aarch64_isa_flags & AARCH64_FL_GCS)
+#define AARCH64_ISA_LUT (aarch64_isa_flags & AARCH64_FL_LUT)


Since this patch depends on Andrew’s feature flags rework shouldn’t this hunk 
look like:
AARCH64_HAVE_ISA (LUT)
Instead?

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index daa0c75d2bc..a4ce43795c9 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4621,6 +4621,18 @@ proc check_effective_target_aarch64_sve2 { } {
}]
}

+# Return 1 if this is an AArch64 target supporting LUT (Lookup table)
+proc check_effective_target_aarch64_lut { } {
+ if { ![istarget aarch64*-*-*] || ![check_effective_target_aarch64_sve2] } {
+ return 0
+ }
+ return [check_no_compiler_messages aarch64_lut assembly {
+ #if !defined (__ARM_FEATURE_LUT)
+ #error FOO
+ #endif
+ }]
+}

I don’t see this effective target check used anywhere, do we need to add it?
I guess I don’t mind it for future use, but just checking that it was 
deliberate.

Thanks,
Kyrill

Reply via email to