On Fri, Jul 14, 2023 at 5:40 PM Jan Beulich via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > Introduce a new alternative permitting all 32 registers to be used as > source without AVX512VL, by broadcasting to the full 512 bits in that > case. (The insn would also permit all registers to be used as > destination, but V2DFmode doesn't.) The patch looks technically ok, but considering we don't have a real CPU with only AVX512F but no AVX512VL, these optimisations for AVX512F only don't make much sense, but rather increase the burden for maintenance. (For now, AVX512VL+AVX512CD+AVX512BW+AVX512DQ is a base set after skylake-avx512, users are more likely to use -march=$PROCESSOR for avx512.) For this(and those previous AVX512F only) optimised patch, I think it's helpful to help understand the pattern, so I'll approve of this patch. But I hope we don't spend too much time on such optimisations (unless there is an AVX512F only processor). > > gcc/ > > * config/i386/sse.md (vec_dupv2df<mask_name>): Add new AVX512F > alternative. Move AVX512VL part of condition to new "enabled" > attribute. > --- > Because of the V2DF restriction, in principle the new source constraint > could also omit 'm'. > > Can't the latter two of the original alternatives be folded, by using > Yvm instead of xm/vm? I think yes. > > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -13761,18 +13761,27 @@ > (set_attr "mode" "DF,DF,V1DF,V1DF,V1DF,V2DF,V1DF,V1DF,V1DF")]) > > (define_insn "vec_dupv2df<mask_name>" > - [(set (match_operand:V2DF 0 "register_operand" "=x,x,v") > + [(set (match_operand:V2DF 0 "register_operand" "=x,x,v,v") > (vec_duplicate:V2DF > - (match_operand:DF 1 "nonimmediate_operand" " 0,xm,vm")))] > - "TARGET_SSE2 && <mask_avx512vl_condition>" > + (match_operand:DF 1 "nonimmediate_operand" "0,xm,vm,vm")))] > + "TARGET_SSE2" > "@ > unpcklpd\t%0, %0 > %vmovddup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1} > - vmovddup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}" > - [(set_attr "isa" "noavx,sse3,avx512vl") > - (set_attr "type" "sselog1") > - (set_attr "prefix" "orig,maybe_vex,evex") > - (set_attr "mode" "V2DF,DF,DF")]) > + vmovddup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1} > + vbroadcastsd\t{%1, }%g0<mask_operand2>{|, %1}" > + [(set_attr "isa" "noavx,sse3,avx512vl,*") > + (set_attr "type" "sselog1,ssemov,ssemov,ssemov") > + (set_attr "prefix" "orig,maybe_vex,evex,evex") > + (set_attr "mode" "V2DF,DF,DF,V8DF") > + (set (attr "enabled") > + (cond [(eq_attr "alternative" "3") > + (symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL > + && !TARGET_PREFER_AVX256") > + (match_test "<mask_avx512vl_condition>") > + (const_string "*") > + ] > + (symbol_ref "false")))]) > > (define_insn "vec_concatv2df" > [(set (match_operand:V2DF 0 "register_operand" "=x,x,v,x,x, v,x,x")
-- BR, Hongtao