On Thu, Oct 18, 2018 at 1:54 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > On 10/18/18, Uros Bizjak <ubiz...@gmail.com> wrote: > > On Thu, Oct 18, 2018 at 1:19 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > > >> >> +(define_insn > >> >> "*<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>_bcst_1" > >> >> + [(set (match_operand:VF_AVX512 0 "register_operand" "=v,v") > >> >> + (fma:VF_AVX512 > >> >> + (match_operand:VF_AVX512 1 "nonimmediate_operand" "0,v") > >> >> + (match_operand:VF_AVX512 2 "nonimmediate_operand" "v,0") > >> >> + (vec_duplicate:VF_AVX512 > >> >> + (match_operand:<ssescalarmode> 3 "nonimmediate_operand" > >> >> "m,m"))))] > >> > > >> > Please note that having "nonimmediate_operand" predicate with "m" > >> > constraint will force scalar value that lives in any register to > >> > memory. So, scalar value will be pushed from either integer or SSE > >> > register to memory, and will be broadcast to SSE register from here. I > >> > guess this is not the optimal way, and we still want (eventual movq > >> > from integer reg) + broadcast insn in this case. > >> > > >> > If this predicate is changed to "memory_operand", then only scalars > >> > that live in memory will be considered. > >> > >> Using "memory_operand" causes: > >> > >> FAIL: gcc.target/i386/avx512f-fmadd-sf-zmm-7.c scan-assembler-times > >> vfmadd...ps[ \\t]+[^\n\r]+\\{1to[1-8]+\\}, %zmm[0-9]+, %zmm0 1 > >> FAIL: gcc.target/i386/avx512f-fmadd-sf-zmm-7.c scan-assembler-not > >> vbroadcastss[^\n]*%zmm[0-9]+ > >> > >> __m512 > >> foo (__m512 x, __m512 y) > >> { > >> return _mm512_fmadd_ps (x, y, _mm512_set1_ps (2.f)); > >> } > >> > >> Combiner: > >> > >> Failed to match this instruction: > >> (set (reg:V16SF 91) > >> (fma:V16SF (reg/v:V16SF 85 [ x ]) > >> (reg:V16SF 21 xmm1 [ y ]) > >> (vec_duplicate:V16SF (reg:SF 88)))) > > > > This is expected, there is no memory operand there. Can you check what > > prevents combiner from propagating memory into the insn? > > Failed to match this instruction: > (set (reg:V16SF 91) > (fma:V16SF (reg/v:V16SF 85 [ x ]) > (reg:V16SF 21 xmm1 [ y ]) > (const_vector:V16SF [ > (const_double:SF 2.0e+0 [0x0.8p+2]) repeated x16 > ]))) > > I don't know if we want to fix it. > > Here is the updated patch with "memory_operand". OK for trunk?
LGTM. Thanks, Uros.