https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767
--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
What I mean is that we should try to simplify the md file, instead of adding
hundreds of new *_bcst patterns.
We have e.g.
(define_insn "*<plusminus_insn><mode>3"
[(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
(plusminus:VI_AVX2
(match_operand:VI_AVX2 1 "vector_operand" "<comm>0,v")
(match_operand:VI_AVX2 2 "vector_operand" "xBm,vm")))]
"TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
"@
p<plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
vp<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
[(set_attr "isa" "noavx,avx")
(set_attr "type" "sseiadd")
(set_attr "prefix_data16" "1,*")
(set_attr "prefix" "orig,vex")
(set_attr "mode" "<sseinsnmode>")])
(define_insn "*sub<mode>3_bcst"
[(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
(minus:VI48_AVX512VL
(match_operand:VI48_AVX512VL 1 "register_operand" "v")
(vec_duplicate:VI48_AVX512VL
(match_operand:<ssescalarmode> 2 "memory_operand" "m"))))]
"TARGET_AVX512F && ix86_binary_operator_ok (MINUS, <MODE>mode, operands)"
"vpsub<ssemodesuffix>\t{%2<avx512bcst>, %1, %0|%0, %1, %2<avx512bcst>}"
[(set_attr "type" "sseiadd")
(set_attr "prefix" "evex")
(set_attr "mode" "<sseinsnmode>")])
What I meant is that we could instead have just:
(define_insn "*<plusminus_insn><mode>3"
[(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
(plusminus:VI_AVX2
(match_operand:VI_AVX2 1 "vector_bcst_operand" "<comm>0,v")
(match_operand:VI_AVX2 2 "vector_bcst_operand" "xBm,vBb")))]
"TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
"@
p<plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
vp<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
[(set_attr "isa" "noavx,avx")
(set_attr "type" "sseiadd")
(set_attr "prefix_data16" "1,*")
(set_attr "prefix" "orig,vex")
(set_attr "mode" "<sseinsnmode>")])
where vector_bcst_operand matches either a vector_operand or, for
TARGET_AVX512F, a VEC_DUPLICATE of the right mode wrapping a MEM whose mode is
the element mode of the VEC_DUPLICATE. Similarly, the Bb constraint would match
either m or, again for TARGET_AVX512F, that same VEC_DUPLICATE of a MEM.
ix86_binary_operator_ok would then have to treat a VEC_DUPLICATE wrapping a MEM
the same as a MEM (in particular, ensure that we don't end up with e.g. one
VEC_DUPLICATE and one MEM operand, or two VEC_DUPLICATE operands), and the
output code would have to emit an operand that is a VEC_DUPLICATE of a MEM
properly.
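As a rough, untested sketch of the predicate side of this: neither predicate
below exists in predicates.md at this point, the names are only the ones used
in this comment, and the individual tests are an assumption about what "the
right mode" should mean.

;; Hypothetical: a VEC_DUPLICATE whose operand is a MEM in the element
;; mode of the vector mode, only accepted for TARGET_AVX512F.
(define_predicate "bcst_mem_operand"
  (and (match_code "vec_duplicate")
       (match_test "TARGET_AVX512F")
       (match_test "GET_MODE (XEXP (op, 0))
                    == GET_MODE_INNER (GET_MODE (op))")
       (match_test "memory_operand (XEXP (op, 0),
                                    GET_MODE (XEXP (op, 0)))")))

;; Hypothetical: anything vector_operand accepts, or the broadcast form.
(define_predicate "vector_bcst_operand"
  (ior (match_operand 0 "vector_operand")
       (match_operand 0 "bcst_mem_operand")))

Splitting the broadcast test into its own predicate keeps the ISA check in one
place, so a stricter variant only has to swap the TARGET_AVX512F test.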
Or perhaps the Bb constraint could cover just the broadcast form, and one would
write vmBb. Still, I think the predicate needs to be accurate: for some
instructions we want e.g. vector_operand, or bcst_mem_operand when
TARGET_AVX512F; for others vector_operand, or bcst_mem_operand when
TARGET_AVX512VL; etc.
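For the broadcast-only variant of the constraint, something along these lines
might do. Again this is only a sketch: Bb is not an existing constraint, it
assumes a bcst_mem_operand predicate as named above, and whether a plain
define_constraint is enough for reload/LRA or a memory-constraint flavour is
needed is left open here.

;; Hypothetical: Bb accepts only the broadcast form, so an alternative
;; that also takes a plain memory operand is spelled "vmBb".
(define_constraint "Bb"
  "@internal A VEC_DUPLICATE of a scalar memory operand."
  (match_operand 0 "bcst_mem_operand"))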
Anyway, if we go down this route, it might be best to convert just a couple of
patterns first, then ask for review and see what Kirill (or Uros, if he is
interested) thinks about it, and only later convert more.