Hi Vladimir,

> On 10 Jul 2024, at 15:34, vladimir.miloser...@arm.com wrote:
>
> This patch introduces support for LUTI2/LUTI4 ACLE for SVE2.
>
> LUTI instructions are used for efficient table lookups with 2-bit
> or 4-bit indices. LUTI2 reads indexed 8-bit or 16-bit elements from
> the low 128 bits of the table vector using packed 2-bit indices,
> while LUTI4 can read from the low 128 or 256 bits of the table
> vector or from two table vectors using packed 4-bit indices.
> These instructions fill the destination vector by copying elements
> indexed by segments of the source vector, selected by the vector
> segment index.
>
> The changes include the addition of a new AArch64 option
> extension "lut", __ARM_FEATURE_LUT preprocessor macro, definitions
> for the new LUTI instruction shapes, and implementations of the
> svluti2 and svluti4 builtins.
>
> New tests are added as well
> ---
>  gcc/config/aarch64/aarch64-c.cc             |  1 +
>  .../aarch64/aarch64-option-extensions.def   |  2 +
>  .../aarch64/aarch64-sve-builtins-shapes.cc  | 41 +++++++++++++++++
>  .../aarch64/aarch64-sve-builtins-shapes.h   |  2 +
>  .../aarch64/aarch64-sve-builtins-sve2.cc    | 17 +++++++
>  .../aarch64/aarch64-sve-builtins-sve2.def   |  4 ++
>  .../aarch64/aarch64-sve-builtins-sve2.h     |  2 +
>  gcc/config/aarch64/aarch64-sve2.md          | 45 +++++++++++++++++++
>  gcc/config/aarch64/aarch64.h                |  5 +++
>  gcc/config/aarch64/iterators.md             | 10 +++++
>  .../aarch64/sve/acle/asm/test_sve_acle.h    | 16 ++++++-
>  .../aarch64/sve2/acle/asm/luti2_bf16.c      | 35 +++++++++++++++
>  .../aarch64/sve2/acle/asm/luti2_f16.c       | 35 +++++++++++++++
>  .../aarch64/sve2/acle/asm/luti2_s16.c       | 35 +++++++++++++++
>  .../aarch64/sve2/acle/asm/luti2_s8.c        | 35 +++++++++++++++
>  .../aarch64/sve2/acle/asm/luti2_u16.c       | 35 +++++++++++++++
>  .../aarch64/sve2/acle/asm/luti2_u8.c        | 35 +++++++++++++++
>  .../aarch64/sve2/acle/asm/luti4_bf16.c      | 35 +++++++++++++++
>  .../aarch64/sve2/acle/asm/luti4_bf16_x2.c   | 15 +++++++
>  .../aarch64/sve2/acle/asm/luti4_f16.c       | 35 +++++++++++++++
>  .../aarch64/sve2/acle/asm/luti4_f16_x2.c    | 15 +++++++
>  .../aarch64/sve2/acle/asm/luti4_s16.c       | 35 +++++++++++++++
>  .../aarch64/sve2/acle/asm/luti4_s16_x2.c    | 15 +++++++
>  .../aarch64/sve2/acle/asm/luti4_s8.c        | 25 +++++++++++
>  .../aarch64/sve2/acle/asm/luti4_u16.c       | 35 +++++++++++++++
>  .../aarch64/sve2/acle/asm/luti4_u16_x2.c    | 15 +++++++
>  .../aarch64/sve2/acle/asm/luti4_u8.c        | 25 +++++++++++
>  gcc/testsuite/lib/target-supports.exp       | 12 +++++
>  28 files changed, 616 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_bf16.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_f16.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_s16.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_s8.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_u16.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_u8.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_bf16.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_bf16_x2.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_f16.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_f16_x2.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_s16.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_s16_x2.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_s8.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_u16.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_u16_x2.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_u8.c
>
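For readers following the quoted description, a simplified scalar model of a LUTI2-style lookup on 8-bit elements may help. This is only a sketch under stated assumptions: the function name luti2_u8_model, its parameters, and the little-endian packing of the 2-bit indices are made up for illustration. It is neither the architectural pseudocode nor code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model (not from the patch, not the architectural
   pseudocode) of a LUTI2-style lookup on 8-bit elements.

   dst      : destination "vector" of vl_bytes bytes
   table    : the four 8-bit table entries a 2-bit index can select
   indices  : index "vector" of vl_bytes bytes holding packed 2-bit indices
   vl_bytes : assumed vector length in bytes (a multiple of 4)
   segment  : which quarter of the index vector to consume

   Assumption: index e of a segment occupies bits [2e+1:2e] of that
   segment, i.e. the indices are packed little-endian.  */
void luti2_u8_model(uint8_t *dst, const uint8_t table[4],
                    const uint8_t *indices, size_t vl_bytes,
                    unsigned segment)
{
    /* Each destination byte consumes 2 index bits, so filling the whole
       destination needs vl_bytes / 4 bytes of indices.  The index vector
       therefore holds 4 such segments.  */
    size_t seg_bytes = vl_bytes / 4;
    const uint8_t *seg = indices + (size_t)(segment % 4) * seg_bytes;

    for (size_t e = 0; e < vl_bytes; e++) {
        /* Extract the e-th 2-bit index from the selected segment.  */
        unsigned idx = (seg[e / 4] >> ((e % 4) * 2)) & 0x3u;
        dst[e] = table[idx];
    }
}
```

In this model a 16-byte vector has four 4-byte segments, and the segment argument plays the role of the immediate in the instruction's `%2[%3]` operand.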
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 42ec0eec31e..840f52e08ed 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -232,6 +232,8 @@ AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the")
 AARCH64_OPT_EXTENSION("gcs", GCS, (), (), (), "gcs")
+AARCH64_OPT_EXTENSION("lut", LUT, (SVE2, SME2), (), (), "lut")
+

I think the LUT extension doesn’t require SME2, does it? It doesn’t seem to use any SME state.
I don’t think +lut should be enabling +sme2 for the user.

+;; -------------------------------------------------------------------------
+;; ---- Table lookup
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - LUTI2
+;; - LUTI4
+;; -------------------------------------------------------------------------
+
+(define_insn "@aarch64_sve_luti<LUTI_BITS><mode>"
+  [(set (match_operand:SVE_FULL_BS 0 "register_operand" "=w")
+        (unspec:SVE_FULL_BS
+          [(match_operand:SVE_FULL_BS 1 "register_operand" "w")
+           (match_operand:VNx16QI 2 "register_operand" "w")
+           (match_operand:DI 3 "const_int_operand")
+           (const_int LUTI_BITS)]
+          UNSPEC_SVE_LUTI))]
+  "TARGET_SVE2"
+  "luti<LUTI_BITS>\t%0.<Vetype>, { %1.<Vetype> }, %2[%3]"
+)
+
+(define_insn "@aarch64_sve_luti<LUTI_BITS><mode>"
+  [(set (match_operand:<VSINGLE> 0 "register_operand")
+        (unspec:<VSINGLE>
+          [(match_operand:SVE_FULL_H 1 "aligned_register_operand" "w")
+           (match_operand:VNx16QI 2 "register_operand")
+           (match_operand:DI 3 "const_int_operand")
+           (const_int LUTI_BITS)]
+          UNSPEC_SVE_LUTI))]
+  "TARGET_SVE2"
+  "luti<LUTI_BITS>\t%0.<Vetype>, { %1.<Vetype> }, %2[%3]"
+)

Missing constraints on operands 0 and 3?

+
+(define_insn "@aarch64_sve_luti<LUTI_BITS><mode>"
+  [(set (match_operand:<VSINGLE> 0 "register_operand")
+        (unspec:<VSINGLE>
+          [(match_operand:SVE_FULL_Hx2 1 "aligned_register_operand" "Uw2")
+           (match_operand:VNx16QI 2 "register_operand")
+           (match_operand:DI 3 "const_int_operand")
+           (const_int LUTI_BITS)]
+          UNSPEC_SVE_LUTI))]
+  "TARGET_SVE2"
+  "luti<LUTI_BITS>\t%0.<Vetype>, %1, %2[%3]"
+)

Likewise.

Thanks,
Kyrill