> On 11 Jul 2024, at 09:18, Kyrylo Tkachov <ktkac...@nvidia.com> wrote:
>
> Hi Vladimir,
>
>> On 10 Jul 2024, at 15:34, vladimir.miloser...@arm.com wrote:
>>
>> This patch introduces support for LUTI2/LUTI4 ACLE for SVE2.
>>
>> LUTI instructions are used for efficient table lookups with 2-bit
>> or 4-bit indices. LUTI2 reads indexed 8-bit or 16-bit elements from
>> the low 128 bits of the table vector using packed 2-bit indices,
>> while LUTI4 can read from the low 128 or 256 bits of the table
>> vector or from two table vectors using packed 4-bit indices.
>> These instructions fill the destination vector by copying elements
>> indexed by segments of the source vector, selected by the vector
>> segment index.
>>
>> The changes include the addition of a new AArch64 option
>> extension "lut", __ARM_FEATURE_LUT preprocessor macro, definitions
>> for the new LUTI instruction shapes, and implementations of the
>> svluti2 and svluti4 builtins.
>>
>> New tests are added as well
>> ---
>> gcc/config/aarch64/aarch64-c.cc | 1 +
>> .../aarch64/aarch64-option-extensions.def | 2 +
>> .../aarch64/aarch64-sve-builtins-shapes.cc | 41 +++++++++++++++++
>> .../aarch64/aarch64-sve-builtins-shapes.h | 2 +
>> .../aarch64/aarch64-sve-builtins-sve2.cc | 17 +++++++
>> .../aarch64/aarch64-sve-builtins-sve2.def | 4 ++
>> .../aarch64/aarch64-sve-builtins-sve2.h | 2 +
>> gcc/config/aarch64/aarch64-sve2.md | 45 +++++++++++++++++++
>> gcc/config/aarch64/aarch64.h | 5 +++
>> gcc/config/aarch64/iterators.md | 10 +++++
>> .../aarch64/sve/acle/asm/test_sve_acle.h | 16 ++++++-
>> .../aarch64/sve2/acle/asm/luti2_bf16.c | 35 +++++++++++++++
>> .../aarch64/sve2/acle/asm/luti2_f16.c | 35 +++++++++++++++
>> .../aarch64/sve2/acle/asm/luti2_s16.c | 35 +++++++++++++++
>> .../aarch64/sve2/acle/asm/luti2_s8.c | 35 +++++++++++++++
>> .../aarch64/sve2/acle/asm/luti2_u16.c | 35 +++++++++++++++
>> .../aarch64/sve2/acle/asm/luti2_u8.c | 35 +++++++++++++++
>> .../aarch64/sve2/acle/asm/luti4_bf16.c | 35 +++++++++++++++
>> .../aarch64/sve2/acle/asm/luti4_bf16_x2.c | 15 +++++++
>> .../aarch64/sve2/acle/asm/luti4_f16.c | 35 +++++++++++++++
>> .../aarch64/sve2/acle/asm/luti4_f16_x2.c | 15 +++++++
>> .../aarch64/sve2/acle/asm/luti4_s16.c | 35 +++++++++++++++
>> .../aarch64/sve2/acle/asm/luti4_s16_x2.c | 15 +++++++
>> .../aarch64/sve2/acle/asm/luti4_s8.c | 25 +++++++++++
>> .../aarch64/sve2/acle/asm/luti4_u16.c | 35 +++++++++++++++
>> .../aarch64/sve2/acle/asm/luti4_u16_x2.c | 15 +++++++
>> .../aarch64/sve2/acle/asm/luti4_u8.c | 25 +++++++++++
>> gcc/testsuite/lib/target-supports.exp | 12 +++++
>> 28 files changed, 616 insertions(+), 1 deletion(-)
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_bf16.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_f16.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_s16.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_s8.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_u16.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti2_u8.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_bf16.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_bf16_x2.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_f16.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_f16_x2.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_s16.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_s16_x2.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_s8.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_u16.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_u16_x2.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/luti4_u8.c
>>
>
> diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
> index 42ec0eec31e..840f52e08ed 100644
> --- a/gcc/config/aarch64/aarch64-option-extensions.def
> +++ b/gcc/config/aarch64/aarch64-option-extensions.def
> @@ -232,6 +232,8 @@ AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the")
>
> AARCH64_OPT_EXTENSION("gcs", GCS, (), (), (), "gcs")
>
> +AARCH64_OPT_EXTENSION("lut", LUT, (SVE2, SME2), (), (), "lut")
> +
>
> I think the LUT extension doesn’t require SME2, does it? It doesn’t seem to
> use any SME state.
I don’t think +lut should be enabling +sme2 for the user.
>
> +;; -------------------------------------------------------------------------
> +;; ---- Table lookup
> +;; -------------------------------------------------------------------------
> +;; Includes:
> +;; - LUTI2
> +;; - LUTI4
> +;; -------------------------------------------------------------------------
> +
> +(define_insn "@aarch64_sve_luti<LUTI_BITS><mode>"
> +  [(set (match_operand:SVE_FULL_BS 0 "register_operand" "=w")
> +        (unspec:SVE_FULL_BS
> +          [(match_operand:SVE_FULL_BS 1 "register_operand" "w")
> +           (match_operand:VNx16QI 2 "register_operand" "w")
> +           (match_operand:DI 3 "const_int_operand")
> +           (const_int LUTI_BITS)]
> +          UNSPEC_SVE_LUTI))]
> +  "TARGET_SVE2"
> +  "luti<LUTI_BITS>\t%0.<Vetype>, { %1.<Vetype> }, %2[%3]"
> +)
>
>
> +
> +(define_insn "@aarch64_sve_luti<LUTI_BITS><mode>"
> +  [(set (match_operand:<VSINGLE> 0 "register_operand")
> +        (unspec:<VSINGLE>
> +          [(match_operand:SVE_FULL_H 1 "aligned_register_operand" "w")
> +           (match_operand:VNx16QI 2 "register_operand")
> +           (match_operand:DI 3 "const_int_operand")
> +           (const_int LUTI_BITS)]
> +          UNSPEC_SVE_LUTI))]
> +  "TARGET_SVE2"
> +  "luti<LUTI_BITS>\t%0.<Vetype>, { %1.<Vetype> }, %2[%3]"
> +)
>
> Missing constraints on operands 0 and 3?
I meant operands 0 and 2, of course.
>
> +
> +(define_insn "@aarch64_sve_luti<LUTI_BITS><mode>"
> +  [(set (match_operand:<VSINGLE> 0 "register_operand")
> +        (unspec:<VSINGLE>
> +          [(match_operand:SVE_FULL_Hx2 1 "aligned_register_operand" "Uw2")
> +           (match_operand:VNx16QI 2 "register_operand")
> +           (match_operand:DI 3 "const_int_operand")
> +           (const_int LUTI_BITS)]
> +          UNSPEC_SVE_LUTI))]
> +  "TARGET_SVE2"
> +  "luti<LUTI_BITS>\t%0.<Vetype>, %1, %2[%3]"
> +)
>
> Likewise.
>
> Thanks,
> Kyrill
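
For readers unfamiliar with the instructions, the packed-index lookup described in the quoted commit message can be modelled roughly in scalar C. This is only a sketch under stated assumptions: the function name lookup_packed2, the four-entry byte table, and the low-bits-first packing of the 2-bit indices are all hypothetical choices made for illustration. It is not the architectural definition of LUTI2 and not the svluti2 ACLE signature.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a 2-bit packed table lookup.
   table:   four 8-bit entries, selected by a 2-bit index.
   packed:  index stream, four 2-bit indices per byte, lowest bits first
            (an assumed packing, chosen for illustration).
   segment: which group of n indices to consume, loosely analogous to
            the "vector segment index" in the commit message.
   n:       number of output elements to produce.
   out:     receives n looked-up elements.  */
void lookup_packed2(const uint8_t table[4], const uint8_t *packed,
                    size_t segment, size_t n, uint8_t *out)
{
    for (size_t e = 0; e < n; ++e) {
        /* Bit offset of the e-th index within the selected segment.  */
        size_t bit = (segment * n + e) * 2;
        unsigned idx = (packed[bit / 8] >> (bit % 8)) & 0x3u;
        out[e] = table[idx];
    }
}
```

With n = 4, segment 0 consumes the first byte of the index stream and segment 1 the second, so one index vector can drive several lookups by varying only the segment argument.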