On Fri, Mar 12, 2021 at 5:11 PM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Fri, Mar 12, 2021 at 4:28 PM Jakub Jelinek <ja...@redhat.com> wrote:
> >
> > On Fri, Mar 12, 2021 at 03:34:09PM +0100, Uros Bizjak wrote:
> > > >  (define_insn "*avx2_pmaddwd"
> > > > -  [(set (match_operand:V8SI 0 "register_operand" "=x,v")
> > > > +  [(set (match_operand:V8SI 0 "register_operand" "=Yw")
> > >
> > > I'm not sure a contraction like this is correct.  The problem is with
> > > the vex vs. evex prefix, used in length calculation.  When XMM16+ is
> > > used in the insn, and the evex prefix is used, the unpatched version
> > > returns vex for alternative 0 due to (x,x,x) and evex for alternative
> > > 1, since one of the registers satisfies only "v".
> > >
> > > The patched version now always emits vex, which is wrong for XMM16+
> > > registers.
> >
> > That is true, but we have many thousands of those cases, where we just
> > use the vex or maybe_vex or maybe_evex prefix with v or Yv or Yw or YW
> > etc.  Adding extra alternatives for that would mean changing pretty
> > much all of sse.md.
> > I think it is much easier to imply length_evex when the prefix is
> > vex/maybe_vex and any of the operands is EXT_REX_SSE_REG_P, a subreg
> > thereof, MASK_REG_P, or a subreg thereof.
> >
> > The default definition of the "length" attribute has:
> >          (ior (eq_attr "prefix" "evex")
> >               (and (ior (eq_attr "prefix" "maybe_evex")
> >                         (eq_attr "prefix" "maybe_vex"))
> >                    (match_test "TARGET_AVX512F")))
> >            (plus (attr "length_evex")
> >                  (plus (attr "length_immediate")
> >                        (plus (attr "modrm")
> >                              (attr "length_address"))))
> >          (ior (eq_attr "prefix" "vex")
> >               (and (ior (eq_attr "prefix" "maybe_vex")
> >                         (eq_attr "prefix" "maybe_evex"))
> >                    (match_test "TARGET_AVX")))
> >            (plus (attr "length_vex")
> >                  (plus (attr "length_immediate")
> >                        (plus (attr "modrm")
> >                              (attr "length_address"))))]
> > That is just an extremely rough guess.  Assuming all insns with an
> > evex/maybe_evex/maybe_vex prefix will be EVEX encoded when
> > TARGET_AVX512F is clearly wrong; that is only true for instructions
> > that don't have a VEX counterpart (e.g. if they have an EVEX-only
> > mnemonic).  Otherwise it depends on whether the operands will be
> > 64-byte (we can perhaps use the mode attribute for that, at least by
> > default) or whether any of the operands is %[xy]mm16+ or %k*.
> > So (but I think this must be GCC 12 material) I'd say we should throw
> > away the maybe_evex prefix altogether (replace it with maybe_vex or
> > vex), use evex for the cases that have EVEX-only mnemonics, and
> > otherwise call some function to look at the operands and the mode
> > attribute.
>
> Yes, I'm aware that while great care was taken to handle the vex
> attribute, evex handling is quite sloppy, and I fully agree with your
> findings.
> I have noticed the issue when I tried to utilize the newly introduced
> YW constraint some more, e.g.:
>
> (define_insn "*vec_extract<mode>"
>   [(set (match_operand:<ssescalarmode> 0 "register_sse4nonimm_operand"
>         "=r,m,r,m")
>       (vec_select:<ssescalarmode>
>         (match_operand:PEXTR_MODE12 1 "register_operand" "x,x,v,v")
>         (parallel
>           [(match_operand:SI 2 "const_0_to_<ssescalarnummask>_operand")])))]
>   "TARGET_SSE2"
>   "@
>    %vpextr<ssemodesuffix>\t{%2, %1, %k0|%k0, %1, %2}
>    %vpextr<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}
>    vpextr<ssemodesuffix>\t{%2, %1, %k0|%k0, %1, %2}
>    vpextr<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
>   [(set_attr "isa" "*,sse4,avx512bw,avx512bw")
>
> where alternatives 0/2 and 1/3 can now be merged together using the YW
> register constraint (plus a couple of other places; just grep for the
> avx512bw isa attribute).  I was not sure if using maybe_vex is the
> correct selection for the prefix attribute in this case.
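For reference, a completely untested sketch of the kind of helper Jakub
describes above (the name ix86_insn_needs_evex_p is made up here; it just
reuses the existing EXT_REX_SSE_REG_P and MASK_REG_P predicates on the
recognized operands):

/* Untested sketch, hypothetical helper: return true if INSN must use
   the EVEX encoding because one of its operands is %[xy]mm16+ or a
   mask register (possibly wrapped in a subreg).  */

static bool
ix86_insn_needs_evex_p (rtx_insn *insn)
{
  /* Fill recog_data with the operands of INSN.  */
  extract_insn_cached (insn);
  for (int i = 0; i < recog_data.n_operands; i++)
    {
      rtx op = recog_data.operand[i];
      /* Look through a subreg to the underlying register.  */
      if (SUBREG_P (op))
        op = SUBREG_REG (op);
      /* %[xy]mm16+ and %k* operands force the EVEX encoding.  */
      if (EXT_REX_SSE_REG_P (op) || MASK_REG_P (op))
        return true;
    }
  return false;
}

An insn whose prefix attribute is vex/maybe_vex would then be counted
with length_evex whenever such a helper returns true, instead of keying
the choice solely on TARGET_AVX512F.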
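Also for reference, the YW constraint this builds on is, if I am reading
constraints.md correctly, roughly:

;; Roughly as defined in gcc/config/i386/constraints.md
;; (double-check there):
(define_register_constraint "YW"
 "TARGET_AVX512BW ? ALL_SSE_REGS : TARGET_SSE ? SSE_REGS : NO_REGS"
 "@internal Any EVEX encodable SSE register (@code{%xmm0-%xmm31})
  for AVX512BW target, otherwise any SSE register.")

i.e. it degrades to the plain SSE register class when AVX512BW is not
enabled, which is what lets the x/v alternative pairs above collapse
into a single alternative.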
Untested patch that introduces YW to some remaining pextr instructions
and fixes one case of 128-bit vpsrldq and 128-bit vpalignr w/o AVX512VL.

Uros.
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 2cd8e04b913..43e4d57ec6a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15483,18 +15483,16 @@
   [(V16QI "TARGET_SSE4_1") V8HI])
 
 (define_insn "*vec_extract<mode>"
-  [(set (match_operand:<ssescalarmode> 0 "register_sse4nonimm_operand" "=r,m,r,m")
+  [(set (match_operand:<ssescalarmode> 0 "register_sse4nonimm_operand" "=r,m")
        (vec_select:<ssescalarmode>
-         (match_operand:PEXTR_MODE12 1 "register_operand" "x,x,v,v")
+         (match_operand:PEXTR_MODE12 1 "register_operand" "YW,YW")
          (parallel
            [(match_operand:SI 2 "const_0_to_<ssescalarnummask>_operand")])))]
   "TARGET_SSE2"
   "@
    %vpextr<ssemodesuffix>\t{%2, %1, %k0|%k0, %1, %2}
-   %vpextr<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}
-   vpextr<ssemodesuffix>\t{%2, %1, %k0|%k0, %1, %2}
-   vpextr<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "*,sse4,avx512bw,avx512bw")
+   %vpextr<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,sse4")
    (set_attr "type" "sselog1")
    (set_attr "prefix_data16" "1")
    (set (attr "prefix_extra")
@@ -15504,23 +15502,20 @@
          (const_string "*")
          (const_string "1")))
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "maybe_vex,maybe_vex,evex,evex")
+   (set_attr "prefix" "maybe_vex,maybe_vex")
    (set_attr "mode" "TI")])
 
 (define_insn "*vec_extract<PEXTR_MODE12:mode>_zext"
-  [(set (match_operand:SWI48 0 "register_operand" "=r,r")
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
        (zero_extend:SWI48
          (vec_select:<PEXTR_MODE12:ssescalarmode>
-           (match_operand:PEXTR_MODE12 1 "register_operand" "x,v")
+           (match_operand:PEXTR_MODE12 1 "register_operand" "YW")
            (parallel
              [(match_operand:SI 2
                "const_0_to_<PEXTR_MODE12:ssescalarnummask>_operand")]))))]
   "TARGET_SSE2"
-  "@
-   %vpextr<PEXTR_MODE12:ssemodesuffix>\t{%2, %1, %k0|%k0, %1, %2}
-   vpextr<PEXTR_MODE12:ssemodesuffix>\t{%2, %1, %k0|%k0, %1, %2}"
-  [(set_attr "isa" "*,avx512bw")
-   (set_attr "type" "sselog1")
+  "%vpextr<PEXTR_MODE12:ssemodesuffix>\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "type" "sselog1")
    (set_attr "prefix_data16" "1")
    (set (attr "prefix_extra")
      (if_then_else
@@ -15532,18 +15527,15 @@
    (set_attr "mode" "TI")])
 
 (define_insn "*vec_extractv16qi_zext"
-  [(set (match_operand:HI 0 "register_operand" "=r,r")
+  [(set (match_operand:HI 0 "register_operand" "=r")
        (zero_extend:HI
          (vec_select:QI
-           (match_operand:V16QI 1 "register_operand" "x,v")
+           (match_operand:V16QI 1 "register_operand" "YW")
            (parallel
              [(match_operand:SI 2 "const_0_to_15_operand")]))))]
   "TARGET_SSE4_1"
-  "@
-   %vpextrb\t{%2, %1, %k0|%k0, %1, %2}
-   vpextrb\t{%2, %1, %k0|%k0, %1, %2}"
-  [(set_attr "isa" "*,avx512bw")
-   (set_attr "type" "sselog1")
+  "%vpextrb\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "type" "sselog1")
    (set_attr "prefix_data16" "1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
@@ -15650,9 +15642,9 @@
   "operands[1] = gen_lowpart (SImode, operands[1]);")
 
 (define_insn "*vec_extractv4si"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=rm,rm,Yr,*x,x,Yv")
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=rm,rm,Yr,*x,Yw")
        (vec_select:SI
-         (match_operand:V4SI 1 "register_operand" "x,v,0,0,x,v")
+         (match_operand:V4SI 1 "register_operand" " x, v, 0, 0,Yw")
          (parallel [(match_operand:SI 2 "const_0_to_3_operand")])))]
   "TARGET_SSE4_1"
 {
@@ -15668,7 +15660,6 @@
       return "psrldq\t{%2, %0|%0, %2}";
 
     case 4:
-    case 5:
       operands[2] = GEN_INT (INTVAL (operands[2]) * 4);
       return "vpsrldq\t{%2, %1, %0|%0, %1, %2}";
 
@@ -15676,14 +15667,14 @@
       gcc_unreachable ();
     }
 }
-  [(set_attr "isa" "*,avx512dq,noavx,noavx,avx,avx512bw")
-   (set_attr "type" "sselog1,sselog1,sseishft1,sseishft1,sseishft1,sseishft1")
+  [(set_attr "isa" "*,avx512dq,noavx,noavx,avx")
+   (set_attr "type" "sselog1,sselog1,sseishft1,sseishft1,sseishft1")
    (set (attr "prefix_extra")
      (if_then_else (eq_attr "alternative" "0,1")
        (const_string "1")
        (const_string "*")))
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "maybe_vex,evex,orig,orig,vex,evex")
+   (set_attr "prefix" "maybe_vex,evex,orig,orig,maybe_vex")
    (set_attr "mode" "TI")])
 
 (define_insn "*vec_extractv4si_zext"
@@ -21599,11 +21590,11 @@
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "*ssse3_palignr<mode>_perm"
-  [(set (match_operand:V_128 0 "register_operand" "=x,x,v")
+  [(set (match_operand:V_128 0 "register_operand" "=x,Yw")
        (vec_select:V_128
-         (match_operand:V_128 1 "register_operand" "0,x,v")
+         (match_operand:V_128 1 "register_operand" "0,Yw")
          (match_parallel 2 "palignr_operand"
-           [(match_operand 3 "const_int_operand" "n,n,n")])))]
+           [(match_operand 3 "const_int_operand" "n,n")])))]
   "TARGET_SSSE3"
 {
   operands[2] = (GEN_INT (INTVAL (operands[3])
@@ -21614,19 +21605,18 @@
     case 0:
       return "palignr\t{%2, %1, %0|%0, %1, %2}";
     case 1:
-    case 2:
       return "vpalignr\t{%2, %1, %1, %0|%0, %1, %1, %2}";
     default:
       gcc_unreachable ();
     }
 }
-  [(set_attr "isa" "noavx,avx,avx512bw")
+  [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseishft")
    (set_attr "atom_unit" "sishuf")
-   (set_attr "prefix_data16" "1,*,*")
+   (set_attr "prefix_data16" "1,*")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "orig,vex,evex")])
+   (set_attr "prefix" "orig,maybe_evex")])
 
 (define_expand "avx512vl_vinsert<mode>"
   [(match_operand:VI48F_256 0 "register_operand")