On Thu, Apr 4, 2024 at 4:42 PM Jakub Jelinek <ja...@redhat.com> wrote: > > On Wed, Apr 19, 2023 at 02:40:59AM +0000, Jiang, Haochen via Gcc-patches > wrote: > > > > (define_insn "aesenc" > > > > - [(set (match_operand:V2DI 0 "register_operand" "=x,x") > > > > - (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x") > > > > - (match_operand:V2DI 2 "vector_operand" "xBm,xm")] > > > > + [(set (match_operand:V2DI 0 "register_operand" "=x,x,v") > > > > + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v") > > > > + (match_operand:V2DI 2 "vector_operand" > > > > + "xBm,xm,vm")] > > > > UNSPEC_AESENC))] > > > > - "TARGET_AES" > > > > + "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)" > > > > "@ > > > > aesenc\t{%2, %0|%0, %2} > > > > + vaesenc\t{%2, %1, %0|%0, %1, %2} > > > > vaesenc\t{%2, %1, %0|%0, %1, %2}" > > > > - [(set_attr "isa" "noavx,avx") > > > > + [(set_attr "isa" "noavx,aes,avx512vl") > > > Shouldn't it be vaes_avx512vl and then remove " || (TARGET_VAES && > > > TARGET_AVX512VL)" from condition. > > > > Since VAES should not imply AES, we need that "|| (TARGET_VAES && > > TARGET_AVX512VL)" > > > > And there is no need to add vaes_avx512vl since the last alternative will > > only > > be hit when there is no aes. When there is no aes, the pattern will need > > vaes > > and avx512vl both or we could not use this pattern. avx512vl here is just > > like > > a placeholder. > > As the following testcase shows, the above change was incorrect. > > Using aes isa for the second alternative is obviously wrong, aes is enabled > whenever -maes is, regardless of -mavx or -mno-avx, so the above change > means that for -maes -mno-avx RA can choose, either it matches the first > alternative with the dup operand, or it matches the second one (but that > is of course wrong because vaesenc VEX encoded insn needs AES & AVX CPUID). > > The big question is if "Since VAES should not imply AES" is the case or not. > Looking around at what LLVM does on godbolt, seems since clang 6 which added > -mvaes support -mvaes there implies -maes, but GCC treats those two > independent. > > Now, if we'd take the LLVM path of making -mvaes imply -maes and -mno-aes > imply -mno-vaes, then we should probably just revert the above patch and > tweak common/config/i386/ to do the implications (+ add the testcase from > this patch). > > If we keep the current behavior, where AES and VAES are completely > independent extensions, then we need to do more changes as the following > patch attempts to do. > We should use the aesenc etc. insns for noavx as before, we know at that > point that TARGET_AES must be true because (TARGET_VAES && TARGET_AVX512VL) > won't be true when !TARGET_AVX - TARGET_AVX512VL implies TARGET_AVX. > For the second alternative, i.e. the AVX AES VEX encoded case, the patch > uses aes_avx isa which requires both. Now, for the third one we can't > use avx512vl isa attribute, because one could compile with > -maes -mavx512vl -mno-vaes and in that case we want VEX encoded vaesenc > which can't use %xmm16+ (nor EGPRs), so we need vaes_avx512vl isa to > ensure it is enabled only for -mvaes -mavx512vl. And there is another > problem, with -mno-aes -mvaes -mavx512vl we could emit VEX encoded vaesenc > which requires AES and AVX ISAs rather than the VAES and AVX512VL which > are enabled. So the patch uses the {evex} prefix for those cases. > And similarly for the vaes*_<mode> instructions, if they aren't 128-bit > or use %xmm16+ registers, the current case is fine, but if they are 128-bit > and use only %xmm0-15 registers, assembler would again emit VEX encoded insn > which needs AES & AVX CPUID, rather than the EVEX encoded ones which need > VAES & AVX512VL CPUIDs. > Still, I wonder if -mvaes shouldn't imply at least -mavx512f and > -mno-avx512f shouldn't imply -mno-vaes, because otherwise can't see how > it could use 512-bit registers (this part not done in the patch). > > The following patch has been successfully bootstrapped/regtested on > x86_64-linux and i686-linux. > > 2024-04-04 Jakub Jelinek <ja...@redhat.com> > > PR target/114576 > * config/i386/i386.md (isa): Remove aes, add aes_avx, vaes_avx512vl. > (enabled): Remove aes isa check, add aes_avx and vaes_avx512vl. > * config/i386/sse.md (aesenc, aesenclast, aesdec, aesdeclast): Add > 4th alternative, emit {evex} prefix for the third one, use > noavx,aes_avx,vaes_avx512vl,vaes_avx512vl isa attribute, use jm > rather than m constraint on the 2nd and 3rd alternative input. > (vaesdec_<mode>, vaesdeclast_<mode>, vaesenc_<mode>, > vaesenclast_<mode>): Add second alternative with x instead of v > and jm instead of m. > > * gcc.target/i386/aes-pr114576.c: New test. > > --- gcc/config/i386/i386.md.jj 2024-03-18 22:15:43.165839479 +0100 > +++ gcc/config/i386/i386.md 2024-04-04 00:48:46.575511556 +0200 > @@ -568,13 +568,14 @@ (define_attr "unit" "integer,i387,sse,mm > > ;; Used to control the "enabled" attribute on a per-instruction basis. > (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx, > - x64_avx,x64_avx512bw,x64_avx512dq,aes,apx_ndd, > + x64_avx,x64_avx512bw,x64_avx512dq,apx_ndd, > sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx, > > avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,avx512f_512, > noavx512f,avx512bw,avx512bw_512,noavx512bw,avx512dq, > noavx512dq,fma_or_avx512vl,avx512vl,noavx512vl,avxvnni, > avx512vnnivl,avx512fp16,avxifma,avx512ifmavl,avxneconvert, > - avx512bf16vl,vpclmulqdqvl,avx_noavx512f,avx_noavx512vl" > + avx512bf16vl,vpclmulqdqvl,avx_noavx512f,avx_noavx512vl, > + aes_avx,vaes_avx512vl" > (const_string "base")) > > ;; The (bounding maximum) length of an instruction immediate. > @@ -915,7 +916,6 @@ (define_attr "enabled" "" > (symbol_ref "TARGET_64BIT && TARGET_AVX512BW") > (eq_attr "isa" "x64_avx512dq") > (symbol_ref "TARGET_64BIT && TARGET_AVX512DQ") > - (eq_attr "isa" "aes") (symbol_ref "TARGET_AES") > (eq_attr "isa" "sse_noavx") > (symbol_ref "TARGET_SSE && !TARGET_AVX") > (eq_attr "isa" "sse2") (symbol_ref "TARGET_SSE2") > @@ -968,6 +968,10 @@ (define_attr "enabled" "" > (symbol_ref "TARGET_VPCLMULQDQ && TARGET_AVX512VL") > (eq_attr "isa" "apx_ndd") > (symbol_ref "TARGET_APX_NDD") > + (eq_attr "isa" "aes_avx") > + (symbol_ref "TARGET_AES && TARGET_AVX") > + (eq_attr "isa" "vaes_avx512vl") > + (symbol_ref "TARGET_VAES && TARGET_AVX512VL") > > (eq_attr "mmx_isa" "native") > (symbol_ref "!TARGET_MMX_WITH_SSE") > --- gcc/config/i386/sse.md.jj 2024-03-18 22:15:43.168839437 +0100 > +++ gcc/config/i386/sse.md 2024-04-04 00:58:56.482090689 +0200 > @@ -26277,75 +26277,79 @@ (define_insn "xop_vpermil2<mode>3" > ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; > > (define_insn "aesenc" > - [(set (match_operand:V2DI 0 "register_operand" "=x,x,v") > - (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v") > - (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")] > + [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,v") > + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,x,v") > + (match_operand:V2DI 2 "vector_operand" > "xja,xjm,xjm,vm")] > UNSPEC_AESENC))] > "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)" > "@ > aesenc\t{%2, %0|%0, %2} > vaesenc\t{%2, %1, %0|%0, %1, %2} > + %{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2} I think we can merge alternative 2 with 3 to * return TARGET_AES ? \"vaesenc\t{%2, %1, %0|%0, %1, %2}"\" : \"%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}\"; Then it can handle vaes_avx512vl + -mno-aes case. > vaesenc\t{%2, %1, %0|%0, %1, %2}" > - [(set_attr "isa" "noavx,aes,avx512vl") > + [(set_attr "isa" "noavx,aes_avx,vaes_avx512vl,vaes_avx512vl") > (set_attr "type" "sselog1") > - (set_attr "addr" "gpr16,*,*") > + (set_attr "addr" "gpr16,*,*,*") > (set_attr "prefix_extra" "1") > - (set_attr "prefix" "orig,vex,evex") > - (set_attr "btver2_decode" "double,double,double") > + (set_attr "prefix" "orig,vex,evex,evex") > + (set_attr "btver2_decode" "double,double,double,double") > (set_attr "mode" "TI")]) > > (define_insn "aesenclast" > - [(set (match_operand:V2DI 0 "register_operand" "=x,x,v") > - (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v") > - (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")] > + [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,v") > + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,x,v") > + (match_operand:V2DI 2 "vector_operand" > "xja,xjm,xjm,vm")] > UNSPEC_AESENCLAST))] > "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)" > "@ > aesenclast\t{%2, %0|%0, %2} > vaesenclast\t{%2, %1, %0|%0, %1, %2} > + %{evex%} vaesenclast\t{%2, %1, %0|%0, %1, %2} Ditto. > vaesenclast\t{%2, %1, %0|%0, %1, %2}" > - [(set_attr "isa" "noavx,aes,avx512vl") > + [(set_attr "isa" "noavx,aes_avx,vaes_avx512vl,vaes_avx512vl") > (set_attr "type" "sselog1") > - (set_attr "addr" "gpr16,*,*") > + (set_attr "addr" "gpr16,*,*,*") > (set_attr "prefix_extra" "1") > - (set_attr "prefix" "orig,vex,evex") > - (set_attr "btver2_decode" "double,double,double") > + (set_attr "prefix" "orig,vex,evex,evex") > + (set_attr "btver2_decode" "double,double,double,double") > (set_attr "mode" "TI")]) > > (define_insn "aesdec" > - [(set (match_operand:V2DI 0 "register_operand" "=x,x,v") > - (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v") > - (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")] > + [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,v") > + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,x,v") > + (match_operand:V2DI 2 "vector_operand" > "xja,xjm,xjm,vm")] > UNSPEC_AESDEC))] > "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)" > "@ > aesdec\t{%2, %0|%0, %2} > vaesdec\t{%2, %1, %0|%0, %1, %2} > + %{evex%} vaesdec\t{%2, %1, %0|%0, %1, %2} Ditto. > vaesdec\t{%2, %1, %0|%0, %1, %2}" > - [(set_attr "isa" "noavx,aes,avx512vl") > + [(set_attr "isa" "noavx,aes_avx,vaes_avx512vl,vaes_avx512vl") > (set_attr "type" "sselog1") > - (set_attr "addr" "gpr16,*,*") > + (set_attr "addr" "gpr16,*,*,*") > (set_attr "prefix_extra" "1") > - (set_attr "prefix" "orig,vex,evex") > - (set_attr "btver2_decode" "double,double,double") > + (set_attr "prefix" "orig,vex,evex,evex") > + (set_attr "btver2_decode" "double,double,double,double") > (set_attr "mode" "TI")]) > > (define_insn "aesdeclast" > - [(set (match_operand:V2DI 0 "register_operand" "=x,x,v") > - (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v") > - (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")] > + [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,v") > + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,x,v") > + (match_operand:V2DI 2 "vector_operand" > "xja,xjm,xjm,vm")] > UNSPEC_AESDECLAST))] > "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)" > "@ > aesdeclast\t{%2, %0|%0, %2} > vaesdeclast\t{%2, %1, %0|%0, %1, %2} > + %{evex%} vaesdeclast\t{%2, %1, %0|%0, %1, %2} Ditto. > vaesdeclast\t{%2, %1, %0|%0, %1, %2}" > - [(set_attr "isa" "noavx,aes,avx512vl") > - (set_attr "addr" "gpr16,*,*") > + [(set_attr "isa" "noavx,aes_avx,vaes_avx512vl,vaes_avx512vl") > + (set_attr "addr" "gpr16,*,*,*") > (set_attr "type" "sselog1") > (set_attr "prefix_extra" "1") > - (set_attr "prefix" "orig,vex,evex") > - (set_attr "btver2_decode" "double,double,double") > + (set_attr "prefix" "orig,vex,evex,evex") > + (set_attr "btver2_decode" "double,double,double,double") > (set_attr "mode" "TI")]) > > (define_insn "aesimc" > @@ -30246,44 +30250,60 @@ (define_insn "vpdpwssds_<mode>_maskz_1" > [(set_attr ("prefix") ("evex"))]) > > (define_insn "vaesdec_<mode>" > - [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v") > + [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v") > (unspec:VI1_AVX512VL_F > - [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v") > - (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")] > + [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v") > + (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")] > UNSPEC_VAESDEC))] > "TARGET_VAES" > - "vaesdec\t{%2, %1, %0|%0, %1, %2}" > -) > +{ > + if (which_alternative == 0 && <MODE>mode == V16QImode) > + return "%{evex%} vaesdec\t{%2, %1, %0|%0, %1, %2}"; Similar, but something like * return TARGET_AES || <MODE>mode != V16QImode ? \"vaesenc\t{%2, %1, %0|%0, %1, %2}"\" : \"%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}\";
> + else > + return "vaesdec\t{%2, %1, %0|%0, %1, %2}"; > +}) > > (define_insn "vaesdeclast_<mode>" > - [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v") > + [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v") > (unspec:VI1_AVX512VL_F > - [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v") > - (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")] > + [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v") > + (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")] > UNSPEC_VAESDECLAST))] > "TARGET_VAES" > - "vaesdeclast\t{%2, %1, %0|%0, %1, %2}" > -) > +{ > + if (which_alternative == 0 && <MODE>mode == V16QImode) > + return "%{evex%} vaesdeclast\t{%2, %1, %0|%0, %1, %2}"; Ditto. > + else > + return "vaesdeclast\t{%2, %1, %0|%0, %1, %2}"; > +}) > > (define_insn "vaesenc_<mode>" > - [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v") > + [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v") > (unspec:VI1_AVX512VL_F > - [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v") > - (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")] > + [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v") > + (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")] > UNSPEC_VAESENC))] > "TARGET_VAES" > - "vaesenc\t{%2, %1, %0|%0, %1, %2}" > -) > +{ > + if (which_alternative == 0 && <MODE>mode == V16QImode) > + return "%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}"; Ditto. > + else > + return "vaesenc\t{%2, %1, %0|%0, %1, %2}"; > +}) > > (define_insn "vaesenclast_<mode>" > - [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v") > + [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v") > (unspec:VI1_AVX512VL_F > - [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v") > - (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")] > + [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v") > + (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")] > UNSPEC_VAESENCLAST))] > "TARGET_VAES" > - "vaesenclast\t{%2, %1, %0|%0, %1, %2}" > -) > +{ > + if (which_alternative == 0 && <MODE>mode == V16QImode) > + return "%{evex%} vaesenclast\t{%2, %1, %0|%0, %1, %2}"; Ditto. > + else > + return "vaesenclast\t{%2, %1, %0|%0, %1, %2}"; > +}) > > (define_insn "vpclmulqdq_<mode>" > [(set (match_operand:VI8_FVL 0 "register_operand" "=v") > --- gcc/testsuite/gcc.target/i386/aes-pr114576.c.jj 2024-04-04 > 09:50:17.117757179 +0200 > +++ gcc/testsuite/gcc.target/i386/aes-pr114576.c 2024-04-04 > 09:51:45.211544801 +0200 > @@ -0,0 +1,63 @@ > +/* PR target/114576 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -maes -mno-avx" } */ > +/* { dg-final { scan-assembler-times "\taesenc\t" 2 } } */ > +/* { dg-final { scan-assembler-times "\taesdec\t" 2 } } */ > +/* { dg-final { scan-assembler-times "\taesenclast\t" 2 } } */ > +/* { dg-final { scan-assembler-times "\taesdeclast\t" 2 } } */ > +/* { dg-final { scan-assembler-not "\tvaesenc" } } */ > +/* { dg-final { scan-assembler-not "\tvaesdec" } } */ > + > +#include <immintrin.h> > + > +__m128i > +f1 (__m128i x, __m128i y) > +{ > + return _mm_aesenc_si128 (x, y); > +} > + > +__m128i > +f2 (__m128i x, __m128i y) > +{ > + __m128i z = _mm_aesenc_si128 (x, y); > + return z + x + y; > +} > + > +__m128i > +f3 (__m128i x, __m128i y) > +{ > + return _mm_aesdec_si128 (x, y); > +} > + > +__m128i > +f4 (__m128i x, __m128i y) > +{ > + __m128i z = _mm_aesdec_si128 (x, y); > + return z + x + y; > +} > + > +__m128i > +f5 (__m128i x, __m128i y) > +{ > + return _mm_aesenclast_si128 (x, y); > +} > + > +__m128i > +f6 (__m128i x, __m128i y) > +{ > + __m128i z = _mm_aesenclast_si128 (x, y); > + return z + x + y; > +} > + > +__m128i > +f7 (__m128i x, __m128i y) > +{ > + return _mm_aesdeclast_si128 (x, y); > +} > + > +__m128i > +f8 (__m128i x, __m128i y) > +{ > + __m128i z = _mm_aesdeclast_si128 (x, y); > + return z + x + y; > +} > > > Jakub > -- BR, Hongtao