On Sat, Oct 27, 2018 at 8:03 AM H.J. Lu <hjl.to...@gmail.com> wrote:
>
> Use scalar operand in SF/DF/SI/DI vec_dup patterns which enables combiner
> to generate
>
> (set (reg:V8SF 84)
>      (vec_duplicate:V8SF (mem/c:SF (symbol_ref:DI ("y")))))
>
> const_vector_duplicate_operand is added for constant vector broadcast.
> We split
>
> (set (reg:V16SF 86)
>      (const_vector:V16SF
>         [(const_double:SF 2.0e+0 [0x0.8p+2]) repeated x16]))
>
> to
>
> (set (reg:V16SF 86)
>      (vec_duplicate:V16SF (mem/u/c:SF (symbol_ref/u:DI ("*.LC1")))))

Why not do this at expand time? Rewrite the vector constant as a
vec_duplicate from memory, and combine will do the rest for you. We
already have _bcst instruction patterns.

BTW: compress_float_constant does something similar.

> before IRA so that IRA can turn
>
> (set (reg:V16SF 86)
>      (vec_duplicate:V16SF (mem/u/c:SF (symbol_ref/u:DI ("*.LC1")))))
> (set (reg:V16SF 90)
>      (plus:V16SF (reg/v:V16SF 85 [ x ])
>                  (reg:V16SF 86)))
>
> into
>
> (set (reg:V16SF 90)
>      (plus:V16SF
>         (vec_duplicate:V16SF (mem/u/c:SF (symbol_ref/u:DI ("*.LC1"))))
>         (reg/v:V16SF 85 [ x ])))
>
> For AVX512 broadcast instructions from integer register operand, we only
> need to broadcast integer to integer vectors.
>
> pic_reg_initialized is added to machine_function to indicate that IRA
> has started since *<avx512>_const_vec_dup<mode> is valid only before
> IRA.

I stopped reading the patch here.

Uros.

> gcc/
>
>         PR target/87537
>         PR target/87767
>         * config/i386/i386-builtin.def: Replace
>         CODE_FOR_avx2_vec_dupv4sf, CODE_FOR_avx2_vec_dupv8sf and
>         CODE_FOR_avx2_vec_dupv4df with CODE_FOR_vec_dupv4sf,
>         CODE_FOR_vec_dupv8sf and CODE_FOR_vec_dupv4df, respectively.
>         * config/i386/i386.c (ix86_init_pic_reg): Set pic_reg_initialized.
>         (expand_vec_perm_1): Replace gen_avx512f_vec_dupv16sf_1,
>         gen_avx2_vec_dupv8sf_1 and gen_avx512f_vec_dupv8df_1 with
>         gen_avx512f_vec_dupv16sf, gen_vec_dupv8sf and
>         gen_avx512f_vec_dupv8df, respectively.  Duplicate them from
>         scalar operand.
>         * config/i386/i386.h (machine_function): Add pic_reg_initialized.
>         * config/i386/i386.md (SF to DF splitter): Replace
>         gen_avx512f_vec_dupv16sf_1 with gen_avx512f_vec_dupv16sf.
>         * config/i386/predicates.md (const_vector_duplicate_operand): New.
>         * config/i386/sse.md (VF48_AVX512VL): New.
>         (avx2_vec_dup<mode>): Removed.
>         (avx2_vec_dupv8sf_1): Likewise.
>         (avx512f_vec_dup<mode>_1): Likewise.
>         (avx2_vec_dupv4df): Likewise.
>         (<avx512>_vec_dup<mode><mask_name>:V48_AVX512VL): Likewise.
>         (<avx512>_vec_dup<mode><mask_name>:VF48_AVX512VL): New.
>         (*<avx512>_const_vec_dup<mode>): Likewise.
>         (<avx512>_vec_dup<mode><mask_name>:VI48_AVX512VL): Likewise.
>         (<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>): Replace
>         V48_AVX512VL with VI48_AVX512VL.
>         (*avx_vperm_broadcast_<mode>): Replace gen_avx2_vec_dupv8sf with
>         gen_vec_dupv8sf.
>
> gcc/testsuite/
>
>         PR target/87537
>         PR target/87767
>         * gcc.target/i386/avx2-vbroadcastss_ps256-1.c: Updated.
>         * gcc.target/i386/avx512vl-vbroadcast-3.c: Likewise.
>         * gcc.target/i386/avx512-binop-7.h: New file.
>         * gcc.target/i386/avx512f-add-sf-zmm-7.c: Likewise.
>         * gcc.target/i386/avx512f-add-si-zmm-7.c: Likewise.
>         * gcc.target/i386/avx512vl-add-di-xmm-7.c: Likewise.
>         * gcc.target/i386/avx512vl-add-sf-xmm-7.c: Likewise.
>         * gcc.target/i386/avx512vl-add-sf-ymm-7.c: Likewise.
>         * gcc.target/i386/avx512vl-add-si-xmm-7.c: Likewise.
>         * gcc.target/i386/avx512vl-add-si-ymm-7.c: Likewise.
>         * gcc.target/i386/pr87537-2.c: Likewise.
>         * gcc.target/i386/pr87537-3.c: Likewise.
>         * gcc.target/i386/pr87537-4.c: Likewise.
>         * gcc.target/i386/pr87537-5.c: Likewise.
>         * gcc.target/i386/pr87537-6.c: Likewise.
>         * gcc.target/i386/pr87537-7.c: Likewise.
>         * gcc.target/i386/pr87537-8.c: Likewise.
>         * gcc.target/i386/pr87537-9.c: Likewise.
> ---
>  gcc/config/i386/i386-builtin.def | 6 +-
>  gcc/config/i386/i386.c | 30 +++++-
>  gcc/config/i386/i386.h | 3 +
>  gcc/config/i386/i386.md | 2 +-
>  gcc/config/i386/predicates.md | 13 +++
>  gcc/config/i386/sse.md | 98 ++++++++-----------
>  .../i386/avx2-vbroadcastss_ps256-1.c | 3 +-
>  .../gcc.target/i386/avx512-binop-7.h | 12 +++
>  .../gcc.target/i386/avx512f-add-sf-zmm-7.c | 14 +++
>  .../gcc.target/i386/avx512f-add-si-zmm-7.c | 12 +++
>  .../gcc.target/i386/avx512vl-add-di-xmm-7.c | 13 +++
>  .../gcc.target/i386/avx512vl-add-sf-xmm-7.c | 13 +++
>  .../gcc.target/i386/avx512vl-add-sf-ymm-7.c | 13 +++
>  .../gcc.target/i386/avx512vl-add-si-xmm-7.c | 13 +++
>  .../gcc.target/i386/avx512vl-add-si-ymm-7.c | 13 +++
>  .../gcc.target/i386/avx512vl-vbroadcast-3.c | 5 +-
>  gcc/testsuite/gcc.target/i386/pr87537-2.c | 12 +++
>  gcc/testsuite/gcc.target/i386/pr87537-3.c | 12 +++
>  gcc/testsuite/gcc.target/i386/pr87537-4.c | 12 +++
>  gcc/testsuite/gcc.target/i386/pr87537-5.c | 12 +++
>  gcc/testsuite/gcc.target/i386/pr87537-6.c | 12 +++
>  gcc/testsuite/gcc.target/i386/pr87537-7.c | 12 +++
>  gcc/testsuite/gcc.target/i386/pr87537-8.c | 12 +++
>  gcc/testsuite/gcc.target/i386/pr87537-9.c | 12 +++
>  24 files changed, 289 insertions(+), 70 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512-binop-7.h
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-add-sf-zmm-7.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-add-si-zmm-7.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-add-di-xmm-7.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-add-sf-xmm-7.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-add-sf-ymm-7.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-add-si-xmm-7.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-add-si-ymm-7.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr87537-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr87537-3.c
>  create mode 100644
gcc/testsuite/gcc.target/i386/pr87537-4.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr87537-5.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr87537-6.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr87537-7.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr87537-8.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr87537-9.c > > diff --git a/gcc/config/i386/i386-builtin.def > b/gcc/config/i386/i386-builtin.def > index df0f7e975ac..d217add8ee2 100644 > --- a/gcc/config/i386/i386-builtin.def > +++ b/gcc/config/i386/i386-builtin.def > @@ -1194,9 +1194,9 @@ BDESC (OPTION_MASK_ISA_AVX2, > CODE_FOR_avx2_interleave_lowv16hi, "__builtin_ia32_ > BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_interleave_lowv8si, > "__builtin_ia32_punpckldq256", IX86_BUILTIN_PUNPCKLDQ256, UNKNOWN, (int) > V8SI_FTYPE_V8SI_V8SI) > BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_interleave_lowv4di, > "__builtin_ia32_punpcklqdq256", IX86_BUILTIN_PUNPCKLQDQ256, UNKNOWN, (int) > V4DI_FTYPE_V4DI_V4DI) > BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_xorv4di3, "__builtin_ia32_pxor256", > IX86_BUILTIN_PXOR256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI) > -BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_vec_dupv4sf, > "__builtin_ia32_vbroadcastss_ps", IX86_BUILTIN_VBROADCASTSS_PS, UNKNOWN, > (int) V4SF_FTYPE_V4SF) > -BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_vec_dupv8sf, > "__builtin_ia32_vbroadcastss_ps256", IX86_BUILTIN_VBROADCASTSS_PS256, > UNKNOWN, (int) V8SF_FTYPE_V4SF) > -BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_vec_dupv4df, > "__builtin_ia32_vbroadcastsd_pd256", IX86_BUILTIN_VBROADCASTSD_PD256, > UNKNOWN, (int) V4DF_FTYPE_V2DF) > +BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_vec_dupv4sf, > "__builtin_ia32_vbroadcastss_ps", IX86_BUILTIN_VBROADCASTSS_PS, UNKNOWN, > (int) V4SF_FTYPE_V4SF) > +BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_vec_dupv8sf, > "__builtin_ia32_vbroadcastss_ps256", IX86_BUILTIN_VBROADCASTSS_PS256, > UNKNOWN, (int) V8SF_FTYPE_V4SF) > +BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_vec_dupv4df, > 
"__builtin_ia32_vbroadcastsd_pd256", IX86_BUILTIN_VBROADCASTSD_PD256, > UNKNOWN, (int) V4DF_FTYPE_V2DF) > BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_vbroadcasti128_v4di, > "__builtin_ia32_vbroadcastsi256", IX86_BUILTIN_VBROADCASTSI256, UNKNOWN, > (int) V4DI_FTYPE_V2DI) > BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_pblenddv4si, > "__builtin_ia32_pblendd128", IX86_BUILTIN_PBLENDD128, UNKNOWN, (int) > V4SI_FTYPE_V4SI_V4SI_INT) > BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_pblenddv8si, > "__builtin_ia32_pblendd256", IX86_BUILTIN_PBLENDD256, UNKNOWN, (int) > V8SI_FTYPE_V8SI_V8SI_INT) > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index 963c7fcbb34..293a523fe7e 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -6951,6 +6951,8 @@ ix86_init_pic_reg (void) > edge entry_edge; > rtx_insn *seq; > > + cfun->machine->pic_reg_initialized = true; > + > if (!ix86_use_pseudo_pic_reg ()) > return; > > @@ -45963,6 +45965,7 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d) > { > /* Use vpbroadcast{b,w,d}. 
*/ > rtx (*gen) (rtx, rtx) = NULL; > + machine_mode scalar_mode = VOIDmode; > switch (d->vmode) > { > case E_V64QImode: > @@ -45993,15 +45996,18 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d) > gen = gen_avx2_pbroadcastv8hi; > break; > case E_V16SFmode: > + scalar_mode = SFmode; > if (TARGET_AVX512F) > - gen = gen_avx512f_vec_dupv16sf_1; > + gen = gen_avx512f_vec_dupv16sf; > break; > case E_V8SFmode: > - gen = gen_avx2_vec_dupv8sf_1; > + scalar_mode = SFmode; > + gen = gen_vec_dupv8sf; > break; > case E_V8DFmode: > + scalar_mode = DFmode; > if (TARGET_AVX512F) > - gen = gen_avx512f_vec_dupv8df_1; > + gen = gen_avx512f_vec_dupv8df; > break; > case E_V8DImode: > if (TARGET_AVX512F) > @@ -46013,7 +46019,23 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d) > if (gen != NULL) > { > if (!d->testing_p) > - emit_insn (gen (d->target, d->op0)); > + { > + if (scalar_mode == VOIDmode) > + emit_insn (gen (d->target, d->op0)); > + else > + { > + rtx op = d->op0; > + unsigned int oppos = 0; > + if (SUBREG_P (op)) > + { > + op = SUBREG_REG (op); > + oppos = SUBREG_BYTE (op); > + } > + emit_insn (gen (d->target, > + gen_rtx_SUBREG (scalar_mode, > + op, oppos))); > + } > + } > return true; > } > } > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h > index b0d2f249db7..8880d25d282 100644 > --- a/gcc/config/i386/i386.h > +++ b/gcc/config/i386/i386.h > @@ -2744,6 +2744,9 @@ struct GTY(()) machine_function { > /* If true, ENDBR is queued at function entrance. */ > BOOL_BITFIELD endbr_queued_at_entrance : 1; > > + /* If true, PIC register has been initialized. */ > + BOOL_BITFIELD pic_reg_initialized : 1; > + > /* The largest alignment, in bytes, of stack slot actually used. 
*/ > unsigned int max_used_stack_alignment; > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md > index 7fb2b144f47..4a6fa077db5 100644 > --- a/gcc/config/i386/i386.md > +++ b/gcc/config/i386/i386.md > @@ -4399,7 +4399,7 @@ > else > { > rtx tmp = lowpart_subreg (V16SFmode, operands[3], V4SFmode); > - emit_insn (gen_avx512f_vec_dupv16sf_1 (tmp, tmp)); > + emit_insn (gen_avx512f_vec_dupv16sf (tmp, tmp)); > } > } > else > diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md > index bd262d77c6b..1d80de0634f 100644 > --- a/gcc/config/i386/predicates.md > +++ b/gcc/config/i386/predicates.md > @@ -1048,6 +1048,19 @@ > (ior (match_operand 0 "nonimmediate_operand") > (match_code "const_vector"))) > > +;; Return true when OP is CONST_VECTOR which can be represented by > +;; VEC_DUPLICATE. > +(define_predicate "const_vector_duplicate_operand" > + (and (match_code "const_vector") > + (match_test "!standard_sse_constant_p (op, mode)")) > +{ > + int i, nunits = GET_MODE_NUNITS (mode); > + for (i = 1; i < nunits; i++) > + if (CONST_VECTOR_ELT (op, i) != CONST_VECTOR_ELT (op, 0)) > + return false; > + return true; > +}) > + > ;; Return true when OP is nonimmediate or standard SSE constant. 
> (define_predicate "nonimmediate_or_sse_const_operand" > (ior (match_operand 0 "nonimmediate_operand") > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > index ee73e1fdf80..27b0ef7f440 100644 > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -304,6 +304,10 @@ > (define_mode_iterator VF_512 > [V16SF V8DF]) > > +(define_mode_iterator VF48_AVX512VL > + [V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL") > + V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")]) > + > (define_mode_iterator VI48_AVX512VL > [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL") > V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")]) > @@ -7117,42 +7121,6 @@ > (set_attr "prefix" "orig,maybe_evex") > (set_attr "mode" "SF")]) > > -(define_insn "avx2_vec_dup<mode>" > - [(set (match_operand:VF1_128_256 0 "register_operand" "=v") > - (vec_duplicate:VF1_128_256 > - (vec_select:SF > - (match_operand:V4SF 1 "register_operand" "v") > - (parallel [(const_int 0)]))))] > - "TARGET_AVX2" > - "vbroadcastss\t{%1, %0|%0, %1}" > - [(set_attr "type" "sselog1") > - (set_attr "prefix" "maybe_evex") > - (set_attr "mode" "<MODE>")]) > - > -(define_insn "avx2_vec_dupv8sf_1" > - [(set (match_operand:V8SF 0 "register_operand" "=v") > - (vec_duplicate:V8SF > - (vec_select:SF > - (match_operand:V8SF 1 "register_operand" "v") > - (parallel [(const_int 0)]))))] > - "TARGET_AVX2" > - "vbroadcastss\t{%x1, %0|%0, %x1}" > - [(set_attr "type" "sselog1") > - (set_attr "prefix" "maybe_evex") > - (set_attr "mode" "V8SF")]) > - > -(define_insn "avx512f_vec_dup<mode>_1" > - [(set (match_operand:VF_512 0 "register_operand" "=v") > - (vec_duplicate:VF_512 > - (vec_select:<ssescalarmode> > - (match_operand:VF_512 1 "register_operand" "v") > - (parallel [(const_int 0)]))))] > - "TARGET_AVX512F" > - "vbroadcast<bcstscalarsuff>\t{%x1, %0|%0, %x1}" > - [(set_attr "type" "sselog1") > - (set_attr "prefix" "evex") > - (set_attr "mode" "<MODE>")]) > - > ;; Although insertps takes register 
source, we prefer > ;; unpcklps with register source since it is shorter. > (define_insn "*vec_concatv2sf_sse4_1" > @@ -18111,18 +18079,6 @@ > (set_attr "prefix" "vex") > (set_attr "mode" "OI")]) > > -(define_insn "avx2_vec_dupv4df" > - [(set (match_operand:V4DF 0 "register_operand" "=v") > - (vec_duplicate:V4DF > - (vec_select:DF > - (match_operand:V2DF 1 "register_operand" "v") > - (parallel [(const_int 0)]))))] > - "TARGET_AVX2" > - "vbroadcastsd\t{%1, %0|%0, %1}" > - [(set_attr "type" "sselog1") > - (set_attr "prefix" "maybe_evex") > - (set_attr "mode" "V4DF")]) > - > (define_insn "<avx512>_vec_dup<mode>_1" > [(set (match_operand:VI_AVX512BW 0 "register_operand" "=v,v") > (vec_duplicate:VI_AVX512BW > @@ -18138,11 +18094,9 @@ > (set_attr "mode" "<sseinsnmode>")]) > > (define_insn "<avx512>_vec_dup<mode><mask_name>" > - [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v") > - (vec_duplicate:V48_AVX512VL > - (vec_select:<ssescalarmode> > - (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "vm") > - (parallel [(const_int 0)]))))] > + [(set (match_operand:VF48_AVX512VL 0 "register_operand" "=v") > + (vec_duplicate:VF48_AVX512VL > + (match_operand:<ssescalarmode> 1 "nonimmediate_operand" "vm")))] > "TARGET_AVX512F" > { > /* There is no DF broadcast (in AVX-512*) to 128b register. > @@ -18156,6 +18110,34 @@ > (set_attr "prefix" "evex") > (set_attr "mode" "<sseinsnmode>")]) > > +;; NB: This is valid only before IRA. pic_reg_initialized is set at > +;; the IRA entry. 
> +(define_insn_and_split "*<avx512>_const_vec_dup<mode>" > + [(set (match_operand:V48_AVX512VL 0 "register_operand") > + (match_operand:V48_AVX512VL 1 "const_vector_duplicate_operand"))] > + "TARGET_AVX512F && !cfun->machine->pic_reg_initialized" > + "#" > + "&& 1" > + [(set (match_dup 0) (match_dup 1))] > +{ > + rtx val = CONST_VECTOR_ELT (operands[1], 0); > + machine_mode scalar_mode = GET_MODE_INNER (<MODE>mode); > + val = validize_mem (force_const_mem (scalar_mode, val)); > + operands[1] = gen_rtx_VEC_DUPLICATE (<MODE>mode, val); > +}) > + > +(define_insn "<avx512>_vec_dup<mode><mask_name>" > + [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v") > + (vec_duplicate:VI48_AVX512VL > + (vec_select:<ssescalarmode> > + (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "vm") > + (parallel [(const_int 0)]))))] > + "TARGET_AVX512F" > + "v<sseintprefix>broadcast<bcstscalarsuff>\t{%1, > %0<mask_operand2>|%0<mask_operand2>, %<iptr>1}" > + [(set_attr "type" "ssemov") > + (set_attr "prefix" "evex") > + (set_attr "mode" "<sseinsnmode>")]) > + > (define_insn "<avx512>_vec_dup<mode><mask_name>" > [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v") > (vec_duplicate:VI12_AVX512VL > @@ -18205,8 +18187,8 @@ > (set_attr "mode" "<sseinsnmode>")]) > > (define_insn "<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>" > - [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v,v") > - (vec_duplicate:V48_AVX512VL > + [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v,v") > + (vec_duplicate:VI48_AVX512VL > (match_operand:<ssescalarmode> 1 "nonimmediate_operand" "vm,r")))] > "TARGET_AVX512F" > "v<sseintprefix>broadcast<bcstscalarsuff>\t{%1, > %0<mask_operand2>|%0<mask_operand2>, %1}" > @@ -18215,8 +18197,7 @@ > (set_attr "mode" "<sseinsnmode>") > (set (attr "enabled") > (if_then_else (eq_attr "alternative" "1") > - (symbol_ref "GET_MODE_CLASS (<ssescalarmode>mode) == MODE_INT > - && (<ssescalarmode>mode != DImode || TARGET_64BIT)") > + (symbol_ref 
"<ssescalarmode>mode != DImode || TARGET_64BIT") > (const_int 1)))]) > > (define_insn "vec_dupv4sf" > @@ -18545,8 +18526,7 @@ > or VSHUFF128. */ > gcc_assert (<MODE>mode == V8SFmode); > if ((mask & 1) == 0) > - emit_insn (gen_avx2_vec_dupv8sf (op0, > - gen_lowpart (V4SFmode, op0))); > + emit_insn (gen_vec_dupv8sf (op0, gen_lowpart (V4SFmode, op0))); > else > emit_insn (gen_avx512vl_shuf_f32x4_1 (op0, op0, op0, > GEN_INT (4), GEN_INT (5), > diff --git a/gcc/testsuite/gcc.target/i386/avx2-vbroadcastss_ps256-1.c > b/gcc/testsuite/gcc.target/i386/avx2-vbroadcastss_ps256-1.c > index dfac3916b08..3ff7497aa21 100644 > --- a/gcc/testsuite/gcc.target/i386/avx2-vbroadcastss_ps256-1.c > +++ b/gcc/testsuite/gcc.target/i386/avx2-vbroadcastss_ps256-1.c > @@ -1,6 +1,7 @@ > /* { dg-do compile } */ > /* { dg-options "-mavx2 -O2" } */ > -/* { dg-final { scan-assembler "vbroadcastss\[ \\t\]+\[^\n\]*%xmm\[0-9\]" } > } */ > +/* { dg-final { scan-assembler "vbroadcastss\[ \\t\]+\[^\n\]*%ymm\[0-9\]" } > } */ > +/* { dg-final { scan-assembler-not "vmovaps\[\t \]*\[^,\]*,%xmm\[0-9\]" } } > */ > > #include <immintrin.h> > > diff --git a/gcc/testsuite/gcc.target/i386/avx512-binop-7.h > b/gcc/testsuite/gcc.target/i386/avx512-binop-7.h > new file mode 100644 > index 00000000000..513901847a9 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512-binop-7.h > @@ -0,0 +1,12 @@ > +#include <immintrin.h> > + > +#define PASTER2(x,y) x##y > +#define PASTER3(x,y,z) _mm##x##_##y##_##z > +#define OP(vec, op, suffix) PASTER3 (vec, op, suffix) > +#define DUP(vec, suffix, val) PASTER3 (vec, set1, suffix) (val) > + > +type > +foo (type x) > +{ > + return OP (vec, op, op_suffix) (DUP (vec, dup_suffix, 2.1f), x); > +} > diff --git a/gcc/testsuite/gcc.target/i386/avx512f-add-sf-zmm-7.c > b/gcc/testsuite/gcc.target/i386/avx512f-add-sf-zmm-7.c > new file mode 100644 > index 00000000000..de23c73e71c > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512f-add-sf-zmm-7.c > @@ -0,0 +1,14 @@ > +/* { dg-do 
compile } */ > +/* { dg-options "-mavx512f -O2" } */ > +/* { dg-final { scan-assembler-times "vaddps\[ > \\t\]+\[^\n\]*\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */ > +/* { dg-final { scan-assembler-times "long\[ \\t\]+1074161254" 1 } } */ > +/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */ > + > +#define type __m512 > +#define vec 512 > +#define op add > +#define op_suffix ps > +#define dup_suffix ps > +#define SCALAR float > + > +#include "avx512-binop-7.h" > diff --git a/gcc/testsuite/gcc.target/i386/avx512f-add-si-zmm-7.c > b/gcc/testsuite/gcc.target/i386/avx512f-add-si-zmm-7.c > new file mode 100644 > index 00000000000..9e5f800118d > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512f-add-si-zmm-7.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512f -O2" } */ > +/* { dg-final { scan-assembler-times "vpaddd\[ > \\t\]+\[^\n\]*\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */ > + > +#define type __m512i > +#define vec 512 > +#define op add > +#define op_suffix epi32 > +#define dup_suffix epi32 > +#define SCALAR int > + > +#include "avx512-binop-7.h" > diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-add-di-xmm-7.c > b/gcc/testsuite/gcc.target/i386/avx512vl-add-di-xmm-7.c > new file mode 100644 > index 00000000000..7d921aded31 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512vl-add-di-xmm-7.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512vl -O2" } */ > +/* { dg-final { scan-assembler-times "vpaddq\[ > \\t\]+\[^\n\]*\\\{1to\[1-8\]+\\\}, %xmm\[0-9\]+, %xmm0" 1 } } */ > +/* { dg-final { scan-assembler-times "(?:long|quad)\[ \\t\]+2" 1 } } */ > + > +#define type __m128i > +#define vec > +#define op add > +#define op_suffix epi64 > +#define dup_suffix epi64x > +#define SCALAR int > + > +#include "avx512-binop-7.h" > diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-add-sf-xmm-7.c > b/gcc/testsuite/gcc.target/i386/avx512vl-add-sf-xmm-7.c > new file mode 100644 
> index 00000000000..2fc1d5c4824 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512vl-add-sf-xmm-7.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512vl -O2" } */ > +/* { dg-final { scan-assembler-times "vaddps\[ > \\t\]+\[^\n\]*\\\{1to\[1-8\]+\\\}, %xmm\[0-9\]+, %xmm0" 1 } } */ > +/* { dg-final { scan-assembler-times "long\[ \\t\]+1074161254" 1 } } */ > + > +#define type __m128 > +#define vec > +#define op add > +#define op_suffix ps > +#define dup_suffix ps > +#define SCALAR float > + > +#include "avx512-binop-7.h" > diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-add-sf-ymm-7.c > b/gcc/testsuite/gcc.target/i386/avx512vl-add-sf-ymm-7.c > new file mode 100644 > index 00000000000..436aae757ca > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512vl-add-sf-ymm-7.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512vl -O2" } */ > +/* { dg-final { scan-assembler-times "vaddps\[ > \\t\]+\[^\n\]*\\\{1to\[1-8\]+\\\}, %ymm\[0-9\]+, %ymm0" 1 } } */ > +/* { dg-final { scan-assembler-times "long\[ \\t\]+1074161254" 1 } } */ > + > +#define type __m256 > +#define vec 256 > +#define op add > +#define op_suffix ps > +#define dup_suffix ps > +#define SCALAR float > + > +#include "avx512-binop-7.h" > diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-add-si-xmm-7.c > b/gcc/testsuite/gcc.target/i386/avx512vl-add-si-xmm-7.c > new file mode 100644 > index 00000000000..0bd7a0c5e96 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512vl-add-si-xmm-7.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512vl -O2" } */ > +/* { dg-final { scan-assembler-times "vpaddd\[ > \\t\]+\[^\n\]*\\\{1to\[1-8\]+\\\}, %xmm\[0-9\]+, %xmm0" 1 } } */ > +/* { dg-final { scan-assembler-times "(?:long|quad)\[ \\t\]+2" 1 } } */ > + > +#define type __m128i > +#define vec > +#define op add > +#define op_suffix epi32 > +#define dup_suffix epi32 > +#define SCALAR int > + > +#include "avx512-binop-7.h" > diff --git 
a/gcc/testsuite/gcc.target/i386/avx512vl-add-si-ymm-7.c > b/gcc/testsuite/gcc.target/i386/avx512vl-add-si-ymm-7.c > new file mode 100644 > index 00000000000..fdde09fca1e > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512vl-add-si-ymm-7.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512vl -O2" } */ > +/* { dg-final { scan-assembler-times "vpaddd\[ > \\t\]+\[^\n\]*\\\{1to\[1-8\]+\\\}, %ymm\[0-9\]+, %ymm0" 1 } } */ > +/* { dg-final { scan-assembler-times "(?:long|quad)\[ \\t\]+2" 1 } } */ > + > +#define type __m256i > +#define vec 256 > +#define op add > +#define op_suffix epi32 > +#define dup_suffix epi32 > +#define SCALAR int > + > +#include "avx512-binop-7.h" > diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c > b/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c > index 7233398cd64..1c62364dac4 100644 > --- a/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c > +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c > @@ -151,8 +151,8 @@ f16 (V2 *x) > } > > /* { dg-final { scan-assembler-times > "vbroadcastss\[^\n\r]*%\[re\]di\[^\n\r]*%xmm16" 4 } } */ > -/* { dg-final { scan-assembler-times > "vbroadcastss\[^\n\r]*%xmm16\[^\n\r]*%ymm16" 3 } } */ > -/* { dg-final { scan-assembler-times > "vbroadcastss\[^\n\r]*%\[re\]di\[^\n\r]*%ymm16" 3 } } */ > +/* { dg-final { scan-assembler-times > "vbroadcastss\[^\n\r]*%xmm16\[^\n\r]*%ymm16" 1 } } */ > +/* { dg-final { scan-assembler-times > "vbroadcastss\[^\n\r]*%\[re\]di\[^\n\r]*%ymm16" 4 } } */ > /* { dg-final { scan-assembler-times > "vpermilps\[^\n\r]*\\\$0\[^\n\r]*%xmm16\[^\n\r]*%xmm16" 1 } } */ > /* { dg-final { scan-assembler-times > "vpermilps\[^\n\r]*\\\$85\[^\n\r]*%xmm16\[^\n\r]*%xmm16" 1 } } */ > /* { dg-final { scan-assembler-times > "vpermilps\[^\n\r]*\\\$170\[^\n\r]*%xmm16\[^\n\r]*%xmm16" 1 } } */ > @@ -160,3 +160,4 @@ f16 (V2 *x) > /* { dg-final { scan-assembler-times > "vpermilps\[^\n\r]*\\\$0\[^\n\r]*%ymm16\[^\n\r]*%ymm16" 1 } } */ > /* { dg-final 
{ scan-assembler-times > "vpermilps\[^\n\r]*\\\$85\[^\n\r]*%ymm16\[^\n\r]*%ymm16" 2 } } */ > /* { dg-final { scan-assembler-times > "vshuff32x4\[^\n\r]*\\\$3\[^\n\r]*%ymm16\[^\n\r]*%ymm16\[^\n\r]*%ymm16" 2 } } > */ > +/* { dg-final { scan-assembler-times > "vshuff32x4\[^\n\r]*\\\$0\[^\n\r]*%ymm16\[^\n\r]*%ymm16\[^\n\r]*%ymm16" 1 } } > */ > diff --git a/gcc/testsuite/gcc.target/i386/pr87537-2.c > b/gcc/testsuite/gcc.target/i386/pr87537-2.c > new file mode 100644 > index 00000000000..19ded7e64b2 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr87537-2.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512f -O2 -mtune=skylake" } */ > +/* { dg-final { scan-assembler-times "vbroadcastss\[ > \\t\]+\[^\{\n\]*%zmm\[0-9\]" 1 } } */ > +/* { dg-final { scan-assembler-not "vmovss" } } */ > + > +#include <immintrin.h> > + > +__m512 > +foo (float *x) > +{ > + return _mm512_broadcastss_ps (_mm_load_ss(x)); > +} > diff --git a/gcc/testsuite/gcc.target/i386/pr87537-3.c > b/gcc/testsuite/gcc.target/i386/pr87537-3.c > new file mode 100644 > index 00000000000..ee7781a69e4 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr87537-3.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512f -O2 -mtune=skylake" } */ > +/* { dg-final { scan-assembler-times "vbroadcastss\[ > \\t\]+\[^\{\n\]*%zmm\[0-9\]" 1 } } */ > +/* { dg-final { scan-assembler-not "vmovss" } } */ > + > +#include <immintrin.h> > + > +__m512 > +foo (void) > +{ > + return _mm512_set1_ps (2.0f); > +} > diff --git a/gcc/testsuite/gcc.target/i386/pr87537-4.c > b/gcc/testsuite/gcc.target/i386/pr87537-4.c > new file mode 100644 > index 00000000000..c5bfef1366e > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr87537-4.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512f -O2 -mtune=skylake" } */ > +/* { dg-final { scan-assembler-times "vbroadcastsd\[ > \\t\]+\[^\{\n\]*%zmm\[0-9\]" 1 } } */ > +/* { dg-final { scan-assembler-not "vmovsd" } } */ > + > 
+#include <immintrin.h> > + > +__m512d > +foo (double *x) > +{ > + return _mm512_broadcastsd_pd (_mm_load_sd(x)); > +} > diff --git a/gcc/testsuite/gcc.target/i386/pr87537-5.c > b/gcc/testsuite/gcc.target/i386/pr87537-5.c > new file mode 100644 > index 00000000000..4f806f4fbf3 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr87537-5.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512f -O2 -mtune=skylake" } */ > +/* { dg-final { scan-assembler-times "vbroadcastsd\[ > \\t\]+\[^\{\n\]*%zmm\[0-9\]" 1 } } */ > +/* { dg-final { scan-assembler-not "vmovsd" } } */ > + > +#include <immintrin.h> > + > +__m512d > +foo (void) > +{ > + return _mm512_set1_pd (2.0f); > +} > diff --git a/gcc/testsuite/gcc.target/i386/pr87537-6.c > b/gcc/testsuite/gcc.target/i386/pr87537-6.c > new file mode 100644 > index 00000000000..b53588b907b > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr87537-6.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512vl -O2 -mtune=skylake" } */ > +/* { dg-final { scan-assembler-times "vbroadcastss\[ > \\t\]+\[^\{\n\]*%ymm\[0-9\]" 1 } } */ > +/* { dg-final { scan-assembler-not "vmovss" } } */ > + > +#include <immintrin.h> > + > +__m256 > +foo (float *x) > +{ > + return _mm256_broadcastss_ps (_mm_load_ss(x)); > +} > diff --git a/gcc/testsuite/gcc.target/i386/pr87537-7.c > b/gcc/testsuite/gcc.target/i386/pr87537-7.c > new file mode 100644 > index 00000000000..d07a1e3de55 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr87537-7.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512vl -O2 -mtune=skylake" } */ > +/* { dg-final { scan-assembler-times "vbroadcastss\[ > \\t\]+\[^\{\n\]*%ymm\[0-9\]" 1 } } */ > +/* { dg-final { scan-assembler-not "vmovss" } } */ > + > +#include <immintrin.h> > + > +__m256 > +foo (void) > +{ > + return _mm256_set1_ps (2.0f); > +} > diff --git a/gcc/testsuite/gcc.target/i386/pr87537-8.c > b/gcc/testsuite/gcc.target/i386/pr87537-8.c > new file mode 
100644 > index 00000000000..dbf4ee3551d > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr87537-8.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512vl -O2 -mtune=skylake" } */ > +/* { dg-final { scan-assembler-times "vbroadcastss\[ > \\t\]+\[^\{\n\]*%xmm\[0-9\]" 1 } } */ > +/* { dg-final { scan-assembler-not "vmovss" } } */ > + > +#include <immintrin.h> > + > +__m128 > +foo (float *x) > +{ > + return _mm_broadcastss_ps (_mm_load_ss(x)); > +} > diff --git a/gcc/testsuite/gcc.target/i386/pr87537-9.c > b/gcc/testsuite/gcc.target/i386/pr87537-9.c > new file mode 100644 > index 00000000000..8e09382d876 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr87537-9.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512vl -O2 -mtune=skylake" } */ > +/* { dg-final { scan-assembler-times "vbroadcastss\[ > \\t\]+\[^\{\n\]*%xmm\[0-9\]" 1 } } */ > +/* { dg-final { scan-assembler-not "vmovss" } } */ > + > +#include <immintrin.h> > + > +__m128 > +foo (void) > +{ > + return _mm_set1_ps (2.0f); > +} > -- > 2.17.2 >
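
For readers following the const_vector_duplicate_operand predicate in the
quoted patch: the test it performs is "every element equals element 0".
A minimal standalone sketch of that check on plain C arrays follows. It is
not GCC code; the helper name all_elements_equal is hypothetical, and it
compares bit patterns with memcmp, whereas the predicate in the patch
compares the rtx elements returned by CONST_VECTOR_ELT.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of GCC: return true when every element
   of a vector constant has the same bit pattern as element 0, i.e. the
   constant could be expressed as a broadcast of one scalar.  */
bool
all_elements_equal (const void *elts, size_t nunits, size_t elt_size)
{
  const unsigned char *p = elts;

  /* Start at 1: element 0 is the reference value.  */
  for (size_t i = 1; i < nunits; i++)
    if (memcmp (p, p + i * elt_size, elt_size) != 0)
      return false;
  return true;
}
```

A 16-element array filled with 2.0f passes this check, which corresponds to
the V16SF constant in the patch description; an array with one differing
element does not.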