On 10/09/2013 03:31 AM, Kirill Yukhin wrote:
> +  rtx op1 = operands[1];
> +  if (REG_P (op1))
> +    op1 = gen_rtx_REG (V16HImode, REGNO (op1));
> +  else
> +    op1 = gen_lowpart (V16HImode, op1);

The IF case is incorrect. You need to use gen_lowpart always.
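In other words, the preparation code can collapse to something like this
(an untested sketch; gen_lowpart handles the register case as well):

  rtx op1 = gen_lowpart (V16HImode, operands[1]);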
> +(define_insn "*avx512f_unpcklpd512"
> +  [(set (match_operand:V8DF 0 "register_operand" "=v,v")
> +        (vec_select:V8DF
> +          (vec_concat:V16DF
> +            (match_operand:V8DF 1 "nonimmediate_operand" "v,vm")
> +            (match_operand:V8DF 2 "nonimmediate_operand" "vm,1"))
> +          (parallel [(const_int 0) (const_int 8)
> +                     (const_int 2) (const_int 10)
> +                     (const_int 4) (const_int 12)
> +                     (const_int 6) (const_int 14)])))]
> +  "TARGET_AVX512F"
> +  "@
> +   vunpcklpd\t{%2, %1, %0|%0, %1, %2}
> +   vmovddup\t{%1, %0|%0, %1}"
> +  [(set_attr "type" "sselog")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "V8DF")])

The second alternative only matches for v/m/1. While I imagine that it
doesn't really matter, it might be better to swap the two so that
vmovddup gets used for the v/v/1 case too.

Why do you have separate define_expand and define_insn for this pattern?

> +(define_insn "*avx512f_<code>v8div16qi2_store"
> +  [(set (match_operand:V16QI 0 "memory_operand" "=m")
> +        (vec_concat:V16QI
> +          (any_truncate:V8QI
> +            (match_operand:V8DI 1 "register_operand" "v"))
> +          (vec_select:V8QI
> +            (match_dup 0)
> +            (parallel [(const_int 8) (const_int 9)
> +                       (const_int 10) (const_int 11)
> +                       (const_int 12) (const_int 13)
> +                       (const_int 14) (const_int 15)]))))]
> +  "TARGET_AVX512F"
> +  "vpmov<trunsuffix>qb\t{%1, %0|%0, %1}"
> +  [(set_attr "type" "ssemov")
> +   (set_attr "memory" "store")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "TI")])

Any pattern that uses a match_dup of an input like this must use "+" not
"=" to signal that the operand is in_out not output.

And is this really the best description for this insn? It's a store to
memory. Surely it's better to say that we store V8QI.

> +(define_expand "vec_widen_umult_even_v16si"
> +  [(set (match_operand:V8DI 0 "register_operand")
> +        (mult:V8DI
> +          (zero_extend:V8DI
> +            (vec_select:V8SI
> +              (match_operand:V16SI 1 "nonimmediate_operand")
> +              (parallel [(const_int 0) (const_int 2)
> +                         (const_int 4) (const_int 6)
> +                         (const_int 8) (const_int 10)
> +                         (const_int 12) (const_int 14)])))
> +          (zero_extend:V8DI
> +            (vec_select:V8SI
> +              (match_operand:V16SI 2 "nonimmediate_operand")
> +              (parallel [(const_int 0) (const_int 2)
> +                         (const_int 4) (const_int 6)
> +                         (const_int 8) (const_int 10)
> +                         (const_int 12) (const_int 14)])))))]
> +  "TARGET_AVX512F"
> +  "ix86_fixup_binary_operands_no_copy (MULT, V16SImode, operands);")

We don't need to use ix86_fixup_binary_operands for any three-operand
insn. That function is there to help SSE's two-operand insns. You can
drop the define_expand and just keep the define_insn.

> +(define_insn "*vec_widen_umult_even_v16si"
> +  [(set (match_operand:V8DI 0 "register_operand" "=v")
> +        (mult:V8DI
> +          (zero_extend:V8DI
> +            (vec_select:V8SI
> +              (match_operand:V16SI 1 "nonimmediate_operand" "v")
> +              (parallel [(const_int 0) (const_int 2)
> +                         (const_int 4) (const_int 6)
> +                         (const_int 8) (const_int 10)
> +                         (const_int 12) (const_int 14)])))
> +          (zero_extend:V8DI
> +            (vec_select:V8SI
> +              (match_operand:V16SI 2 "nonimmediate_operand" "vm")
> +              (parallel [(const_int 0) (const_int 2)
> +                         (const_int 4) (const_int 6)
> +                         (const_int 8) (const_int 10)
> +                         (const_int 12) (const_int 14)])))))]
> +  "TARGET_AVX512F && ix86_binary_operator_ok (MULT, V16SImode, operands)"

You want a "%v" for operand 1, to make the operands commutative.

> +(define_insn "*vec_widen_smult_even_v16si"
> +  [(set (match_operand:V8DI 0 "register_operand" "=x")
> +        (mult:V8DI

Similarly, plus errant "x"; a sketch with both constraint fixes applied
follows below.


r~
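A rough sketch of how the head of that last pattern would read with both
fixes applied ("v" instead of the errant "x" on operand 0, "%v" on
operand 1); the elided body is assumed to mirror the unsigned pattern,
with sign_extend in place of zero_extend:

  (define_insn "*vec_widen_smult_even_v16si"
    [(set (match_operand:V8DI 0 "register_operand" "=v")
          (mult:V8DI
            (sign_extend:V8DI
              (vec_select:V8SI
                (match_operand:V16SI 1 "nonimmediate_operand" "%v")
                ...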