On Wed, May 24, 2023 at 3:58 PM Jan Beulich via Gcc <gcc@gcc.gnu.org> wrote: > > Hello, > > for a couple of years I was meaning to extend the use of these AVX512F > insns beyond the pretty minimalistic ones there are so far. Now that I've > got around to at least draft something, I ran into a couple of issues I > cannot explain. I'd like to start with understanding the unexpected > effects of a change to an existing insn I have made (reproduced at the > bottom). I certainly was prepared to observe testsuite failures, but it > ends up failing tests I didn't expect it would fail, and - upon looking > at sibling ones - also ends up leaving intact tests which I would expect > would then need adjustment (because of using the new alternative). > > In particular (all mentioned tests are in gcc.target/i386/) > - avx512f-andn-si-zmm-1.c (and its AVX512VL counterparts) fails because > for whatever reason generated code reverts back to using vpbroadcastd, > - avx512f-andn-di-zmm-1.c, otoh, is unaffected (i.e. continues to use > vpandnq with embedded broadcast), > - avx512f-andn-si-zmm-2.c doesn't use the new 4th insn alternative when > at the same time a made-up DI variant of the test (akin to what might > be an avx512f-andn-di-zmm-2.c testcase) does. > IOW: How is SI mode element size different here from DI mode one? Is > there anything wrong with the 4th alternative I'm adding, or is this > hinting at some anomaly elsewhere? __m512i is defined as __v8di, when it's used for _mm512_andnot_epi32, it's explicitlt converted to (__v16si) and creates an extra subreg which is not needed for DImode cases. And pass_combine try to match the below pattern but failed due to the condition REG_P (operands[1]) || REG_P (operands[2]). Here I think you want register_operand instead of REG_P. 157(set (reg:V16SI 91) 158 (and:V16SI (not:V16SI (subreg:V16SI (reg:V8DI 98) 0)) 159 (vec_duplicate:V16SI (mem:SI (reg:DI 99) [1 *f_3(D)+0 S4 A32]))))
> > Just to mention it, avx512f-andn-si-zmm-5.c similarly fails > unexpectedly, but I guess for the same reason (and there aren't AVX512VL > or DI mode element counterparts thereof). > > Jan > > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -17019,11 +17019,11 @@ > "TARGET_AVX512F") > > (define_insn "*andnot<mode>3" > - [(set (match_operand:VI 0 "register_operand" "=x,x,v") > + [(set (match_operand:VI 0 "register_operand" "=x,x,v,v") > (and:VI > - (not:VI (match_operand:VI 1 "vector_operand" "0,x,v")) > - (match_operand:VI 2 "bcst_vector_operand" "xBm,xm,vmBr")))] > - "TARGET_SSE" > + (not:VI (match_operand:VI 1 "bcst_vector_operand" "0,x,v,mBr")) > + (match_operand:VI 2 "bcst_vector_operand" "xBm,xm,vmBr,v")))] > + "TARGET_SSE && (REG_P (operands[1]) || REG_P (operands[2]))" > { > char buf[64]; > const char *ops; > @@ -17090,6 +17090,11 @@ > case 2: > ops = "v%s%s\t{%%2, %%1, %%0|%%0, %%1, %%2}"; > break; > + case 3: > + tmp = "pternlog"; > + ssesuffix = "<ternlogsuffix>"; > + ops = "v%s%s\t{$0x44, %%1, %%2, %%0|%%0, %%2, %%1, $0x44}"; > + break; > default: > gcc_unreachable (); > } > @@ -17098,7 +17103,7 @@ > output_asm_insn (buf, operands); > return ""; > } > - [(set_attr "isa" "noavx,avx,avx") > + [(set_attr "isa" "noavx,avx,avx,avx512f") > (set_attr "type" "sselog") > (set (attr "prefix_data16") > (if_then_else > @@ -17106,7 +17111,7 @@ > (eq_attr "mode" "TI")) > (const_string "1") > (const_string "*"))) > - (set_attr "prefix" "orig,vex,evex") > + (set_attr "prefix" "orig,vex,evex,evex") > (set (attr "mode") > (cond [(match_test "TARGET_AVX2") > (const_string "<sseinsnmode>") > @@ -17119,7 +17124,11 @@ > (match_test "optimize_function_for_size_p (cfun)")) > (const_string "V4SF") > ] > - (const_string "<sseinsnmode>")))]) > + (const_string "<sseinsnmode>"))) > + (set (attr "enabled") > + (if_then_else (eq_attr "alternative" "3") > + (symbol_ref "<MODE_SIZE> == 64 ? TARGET_AVX512F : > TARGET_AVX512VL") > + (const_string "*")))]) > > ;; PR target/100711: Split notl; vpbroadcastd; vpand as vpbroadcastd; vpandn > (define_split -- BR, Hongtao