> On 11 Jul 2024, at 09:18, Kyrylo Tkachov <ktkac...@nvidia.com> wrote:
> 
> Hi Vladimir,
> 
>> On 10 Jul 2024, at 15:34, vladimir.miloser...@arm.com wrote:
>> 
>> This patch introduces support for LUTI2/LUTI4 ACLE for SVE2.
>> 
>> LUTI instructions are used for efficient table lookups with 2-bit
>> or 4-bit indices. LUTI2 reads indexed 8-bit or 16-bit elements from
>> the low 128 bits of the table vector using packed 2-bit indices,
>> while LUTI4 can read from the low 128 or 256 bits of the table
>> vector or from two table vectors using packed 4-bit indices.
>> These instructions fill the destination vector by copying elements
>> indexed by segments of the source vector, selected by the vector
>> segment index.
>> 
>> The changes include the addition of a new AArch64 option
>> extension "lut", __ARM_FEATURE_LUT preprocessor macro, definitions
>> for the new LUTI instruction shapes, and implementations of the
>> svluti2 and svluti4 builtins.
>> 
>> New tests are added as well.
>> ---
>> gcc/config/aarch64/aarch64-c.cc               |  1 +
>> .../aarch64/aarch64-option-extensions.def     |  2 +
>> .../aarch64/aarch64-sve-builtins-shapes.cc    | 41 +++++++++++++++++
>> .../aarch64/aarch64-sve-builtins-shapes.h     |  2 +
>> .../aarch64/aarch64-sve-builtins-sve2.cc      | 17 +++++++
>> .../aarch64/aarch64-sve-builtins-sve2.def     |  4 ++
>> .../aarch64/aarch64-sve-builtins-sve2.h       |  2 +
>> gcc/config/aarch64/aarch64-sve2.md            | 45 +++++++++++++++++++
>> gcc/config/aarch64/aarch64.h                  |  5 +++
>> gcc/config/aarch64/iterators.md               | 10 +++++
>> .../aarch64/sve/acle/asm/test_sve_acle.h      | 16 ++++++-
>> .../aarch64/sve2/acle/asm/luti2_bf16.c        | 35 +++++++++++++++
>> .../aarch64/sve2/acle/asm/luti2_f16.c         | 35 +++++++++++++++
>> .../aarch64/sve2/acle/asm/luti2_s16.c         | 35 +++++++++++++++
>> .../aarch64/sve2/acle/asm/luti2_s8.c          | 35 +++++++++++++++
>> .../aarch64/sve2/acle/asm/luti2_u16.c         | 35 +++++++++++++++
>> .../aarch64/sve2/acle/asm/luti2_u8.c          | 35 +++++++++++++++
>> .../aarch64/sve2/acle/asm/luti4_bf16.c        | 35 +++++++++++++++
>> .../aarch64/sve2/acle/asm/luti4_bf16_x2.c     | 15 +++++++
>> .../aarch64/sve2/acle/asm/luti4_f16.c         | 35 +++++++++++++++
>> .../aarch64/sve2/acle/asm/luti4_f16_x2.c      | 15 +++++++
>> .../aarch64/sve2/acle/asm/luti4_s16.c         | 35 +++++++++++++++
>> .../aarch64/sve2/acle/asm/luti4_s16_x2.c      | 15 +++++++
>> .../aarch64/sve2/acle/asm/luti4_s8.c          | 25 +++++++++++
>> .../aarch64/sve2/acle/asm/luti4_u16.c         | 35 +++++++++++++++
>> .../aarch64/sve2/acle/asm/luti4_u16_x2.c      | 15 +++++++
>> .../aarch64/sve2/acle/asm/luti4_u8.c          | 25 +++++++++++
>> gcc/testsuite/lib/target-supports.exp         | 12 +++++
>> 28 files changed, 616 insertions(+), 1 deletion(-)
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_bf16.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_f16.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_s16.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_s8.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_u16.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_u8.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_bf16.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_bf16_x2.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_f16.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_f16_x2.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_s16.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_s16_x2.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_s8.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_u16.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_u16_x2.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_u8.c
>> 
> 
> diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
> index 42ec0eec31e..840f52e08ed 100644
> --- a/gcc/config/aarch64/aarch64-option-extensions.def
> +++ b/gcc/config/aarch64/aarch64-option-extensions.def
> @@ -232,6 +232,8 @@ AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the")
> 
> AARCH64_OPT_EXTENSION("gcs", GCS, (), (), (), "gcs")
> 
> +AARCH64_OPT_EXTENSION("lut", LUT, (SVE2, SME2), (), (), "lut")
> +
> 
> I think the LUT extension doesn’t require SME2, does it? It doesn’t seem to
> use any SME state. I don’t think +lut should be enabling +sme2 for the user.
> 
> +;; -------------------------------------------------------------------------
> +;; ---- Table lookup
> +;; -------------------------------------------------------------------------
> +;; Includes:
> +;; - LUTI2
> +;; - LUTI4
> +;; -------------------------------------------------------------------------
> +
> +(define_insn "@aarch64_sve_luti<LUTI_BITS><mode>"
> + [(set (match_operand:SVE_FULL_BS 0 "register_operand" "=w")
> + (unspec:SVE_FULL_BS
> + [(match_operand:SVE_FULL_BS 1 "register_operand" "w")
> + (match_operand:VNx16QI 2 "register_operand" "w")
> + (match_operand:DI 3 "const_int_operand")
> + (const_int LUTI_BITS)]
> + UNSPEC_SVE_LUTI))]
> + "TARGET_SVE2"
> + "luti<LUTI_BITS>\t%0.<Vetype>, { %1.<Vetype> }, %2[%3]"
> +)
> 
> 
> +
> +(define_insn "@aarch64_sve_luti<LUTI_BITS><mode>"
> + [(set (match_operand:<VSINGLE> 0 "register_operand")
> + (unspec:<VSINGLE>
> + [(match_operand:SVE_FULL_H 1 "aligned_register_operand" "w")
> + (match_operand:VNx16QI 2 "register_operand")
> + (match_operand:DI 3 "const_int_operand")
> + (const_int LUTI_BITS)]
> + UNSPEC_SVE_LUTI))]
> + "TARGET_SVE2"
> + "luti<LUTI_BITS>\t%0.<Vetype>, { %1.<Vetype> }, %2[%3]"
> +)
> 
> Missing constraints on operands 0 and 3?

I meant operands 0 and 2, of course.
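
To make the comment concrete, here is an untested sketch of the second pattern with the two constraints filled in. It assumes plain "w" is the intended register class for both operands, as in the first pattern; everything else is copied from the patch unchanged.

```
;; Sketch only: "=w" on operand 0 and "w" on operand 2 are assumptions
;; taken from the first LUTI pattern, not a tested fix.
(define_insn "@aarch64_sve_luti<LUTI_BITS><mode>"
  [(set (match_operand:<VSINGLE> 0 "register_operand" "=w")
	(unspec:<VSINGLE>
	  [(match_operand:SVE_FULL_H 1 "aligned_register_operand" "w")
	   (match_operand:VNx16QI 2 "register_operand" "w")
	   (match_operand:DI 3 "const_int_operand")
	   (const_int LUTI_BITS)]
	  UNSPEC_SVE_LUTI))]
  "TARGET_SVE2"
  "luti<LUTI_BITS>\t%0.<Vetype>, { %1.<Vetype> }, %2[%3]"
)
```

The same two additions would apply to the third pattern, keeping its "Uw2" constraint on operand 1.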

> 
> +
> +(define_insn "@aarch64_sve_luti<LUTI_BITS><mode>"
> + [(set (match_operand:<VSINGLE> 0 "register_operand")
> + (unspec:<VSINGLE>
> + [(match_operand:SVE_FULL_Hx2 1 "aligned_register_operand" "Uw2")
> + (match_operand:VNx16QI 2 "register_operand")
> + (match_operand:DI 3 "const_int_operand")
> + (const_int LUTI_BITS)]
> + UNSPEC_SVE_LUTI))]
> + "TARGET_SVE2"
> + "luti<LUTI_BITS>\t%0.<Vetype>, %1, %2[%3]"
> +)
> 
> Likewise.
> 
> Thanks,
> Kyrill

