Hi Vladimir,

> On 10 Jul 2024, at 15:34, vladimir.miloser...@arm.com wrote:
> 
> This patch introduces support for LUTI2/LUTI4 ACLE for SVE2.
> 
> LUTI instructions are used for efficient table lookups with 2-bit
> or 4-bit indices. LUTI2 reads indexed 8-bit or 16-bit elements from
> the low 128 bits of the table vector using packed 2-bit indices,
> while LUTI4 can read from the low 128 or 256 bits of the table
> vector or from two table vectors using packed 4-bit indices.
> These instructions fill the destination vector by copying elements
> indexed by segments of the source vector, selected by the vector
> segment index.
> 
> The changes include the addition of a new AArch64 option
> extension "lut", __ARM_FEATURE_LUT preprocessor macro, definitions
> for the new LUTI instruction shapes, and implementations of the
> svluti2 and svluti4 builtins.
> 
> New tests are added as well
> ---
> gcc/config/aarch64/aarch64-c.cc               |  1 +
> .../aarch64/aarch64-option-extensions.def     |  2 +
> .../aarch64/aarch64-sve-builtins-shapes.cc    | 41 +++++++++++++++++
> .../aarch64/aarch64-sve-builtins-shapes.h     |  2 +
> .../aarch64/aarch64-sve-builtins-sve2.cc      | 17 +++++++
> .../aarch64/aarch64-sve-builtins-sve2.def     |  4 ++
> .../aarch64/aarch64-sve-builtins-sve2.h       |  2 +
> gcc/config/aarch64/aarch64-sve2.md            | 45 +++++++++++++++++++
> gcc/config/aarch64/aarch64.h                  |  5 +++
> gcc/config/aarch64/iterators.md               | 10 +++++
> .../aarch64/sve/acle/asm/test_sve_acle.h      | 16 ++++++-
> .../aarch64/sve2/acle/asm/luti2_bf16.c        | 35 +++++++++++++++
> .../aarch64/sve2/acle/asm/luti2_f16.c         | 35 +++++++++++++++
> .../aarch64/sve2/acle/asm/luti2_s16.c         | 35 +++++++++++++++
> .../aarch64/sve2/acle/asm/luti2_s8.c          | 35 +++++++++++++++
> .../aarch64/sve2/acle/asm/luti2_u16.c         | 35 +++++++++++++++
> .../aarch64/sve2/acle/asm/luti2_u8.c          | 35 +++++++++++++++
> .../aarch64/sve2/acle/asm/luti4_bf16.c        | 35 +++++++++++++++
> .../aarch64/sve2/acle/asm/luti4_bf16_x2.c     | 15 +++++++
> .../aarch64/sve2/acle/asm/luti4_f16.c         | 35 +++++++++++++++
> .../aarch64/sve2/acle/asm/luti4_f16_x2.c      | 15 +++++++
> .../aarch64/sve2/acle/asm/luti4_s16.c         | 35 +++++++++++++++
> .../aarch64/sve2/acle/asm/luti4_s16_x2.c      | 15 +++++++
> .../aarch64/sve2/acle/asm/luti4_s8.c          | 25 +++++++++++
> .../aarch64/sve2/acle/asm/luti4_u16.c         | 35 +++++++++++++++
> .../aarch64/sve2/acle/asm/luti4_u16_x2.c      | 15 +++++++
> .../aarch64/sve2/acle/asm/luti4_u8.c          | 25 +++++++++++
> gcc/testsuite/lib/target-supports.exp         | 12 +++++
> 28 files changed, 616 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_bf16.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_f16.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_s16.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_s8.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_u16.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_u8.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_bf16.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_bf16_x2.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_f16.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_f16_x2.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_s16.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_s16_x2.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_s8.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_u16.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_u16_x2.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_u8.c
> 
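
For reference, here is how I read the LUTI2 description above for 8-bit
elements, written as a scalar C model. This is only a sketch of my
understanding, not the architectural pseudocode. It assumes a fixed 128-bit
vector length and that index k sits in bits [2k+1:2k] of the selected
segment; the name luti2_u8_model is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { VL_BYTES = 16 };  /* Assumption: 128-bit vectors.  */

/* Hypothetical scalar model of LUTI2 on 8-bit elements.  TABLE is the
   table vector, INDICES the vector of packed 2-bit indices and SEGMENT
   the immediate that selects which part of INDICES is used.  */
static void
luti2_u8_model (uint8_t dst[VL_BYTES], const uint8_t table[VL_BYTES],
                const uint8_t indices[VL_BYTES], unsigned segment)
{
  /* Each result element consumes 2 index bits, so one segment covers
     VL_BYTES * 2 bits, i.e. a quarter of the index vector.  */
  const size_t seg_bytes = VL_BYTES / 4;
  assert (segment < 4);
  const uint8_t *seg = indices + (size_t) segment * seg_bytes;

  for (size_t e = 0; e < VL_BYTES; e++)
    {
      /* Extract the 2-bit index for element E (assumed LSB first).  */
      unsigned idx = (seg[e / 4] >> (2 * (e % 4))) & 3u;
      /* A 2-bit index can only reach the first four table entries.  */
      dst[e] = table[idx];
    }
}
```

With a table starting { 10, 20, 30, 40 } and the first byte of segment 1
set to 0xE4 (indices 0, 1, 2, 3), the first four result bytes come out as
10, 20, 30, 40 under these assumptions.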

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 42ec0eec31e..840f52e08ed 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -232,6 +232,8 @@ AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the")

AARCH64_OPT_EXTENSION("gcs", GCS, (), (), (), "gcs")

+AARCH64_OPT_EXTENSION("lut", LUT, (SVE2, SME2), (), (), "lut")
+

I think the LUT extension doesn’t require SME2, does it? It doesn’t seem to use
any SME state, so I don’t think +lut should enable +sme2 for the user.
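
For instance, something along these lines. This is an untested sketch: it
only drops SME2 from the dependency list in the hunk above and assumes SVE2
is still the right prerequisite.

```
AARCH64_OPT_EXTENSION("lut", LUT, (SVE2), (), (), "lut")
```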

+;; -------------------------------------------------------------------------
+;; ---- Table lookup
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - LUTI2
+;; - LUTI4
+;; -------------------------------------------------------------------------
+
+(define_insn "@aarch64_sve_luti<LUTI_BITS><mode>"
+ [(set (match_operand:SVE_FULL_BS 0 "register_operand" "=w")
+ (unspec:SVE_FULL_BS
+ [(match_operand:SVE_FULL_BS 1 "register_operand" "w")
+ (match_operand:VNx16QI 2 "register_operand" "w")
+ (match_operand:DI 3 "const_int_operand")
+ (const_int LUTI_BITS)]
+ UNSPEC_SVE_LUTI))]
+ "TARGET_SVE2"
+ "luti<LUTI_BITS>\t%0.<Vetype>, { %1.<Vetype> }, %2[%3]"
+)


+
+(define_insn "@aarch64_sve_luti<LUTI_BITS><mode>"
+ [(set (match_operand:<VSINGLE> 0 "register_operand")
+ (unspec:<VSINGLE>
+ [(match_operand:SVE_FULL_H 1 "aligned_register_operand" "w")
+ (match_operand:VNx16QI 2 "register_operand")
+ (match_operand:DI 3 "const_int_operand")
+ (const_int LUTI_BITS)]
+ UNSPEC_SVE_LUTI))]
+ "TARGET_SVE2"
+ "luti<LUTI_BITS>\t%0.<Vetype>, { %1.<Vetype> }, %2[%3]"
+)

Missing constraints on operands 0 and 2?
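
By analogy with the first pattern, which uses "=w" for the destination and
"w" for the index vector, this pattern might become something like the
following. It is untested, and only the constraint strings differ from the
patch as posted.

```
(define_insn "@aarch64_sve_luti<LUTI_BITS><mode>"
  [(set (match_operand:<VSINGLE> 0 "register_operand" "=w")
        (unspec:<VSINGLE>
          [(match_operand:SVE_FULL_H 1 "aligned_register_operand" "w")
           (match_operand:VNx16QI 2 "register_operand" "w")
           ;; Operand 3 is an immediate; the first pattern leaves it
           ;; without a constraint string as well.
           (match_operand:DI 3 "const_int_operand")
           (const_int LUTI_BITS)]
          UNSPEC_SVE_LUTI))]
  "TARGET_SVE2"
  "luti<LUTI_BITS>\t%0.<Vetype>, { %1.<Vetype> }, %2[%3]"
)
```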

+
+(define_insn "@aarch64_sve_luti<LUTI_BITS><mode>"
+ [(set (match_operand:<VSINGLE> 0 "register_operand")
+ (unspec:<VSINGLE>
+ [(match_operand:SVE_FULL_Hx2 1 "aligned_register_operand" "Uw2")
+ (match_operand:VNx16QI 2 "register_operand")
+ (match_operand:DI 3 "const_int_operand")
+ (const_int LUTI_BITS)]
+ UNSPEC_SVE_LUTI))]
+ "TARGET_SVE2"
+ "luti<LUTI_BITS>\t%0.<Vetype>, %1, %2[%3]"
+)

Likewise.

Thanks,
Kyrill
