On Fri, Nov 1, 2013 at 3:03 AM, Cong Hou <co...@google.com> wrote:

> According to your comments, I made the following modifications to this patch:
>
> 1. Now SAD pattern does not require the first and second operands to
> be unsigned. And two versions (signed/unsigned) of the SAD optabs are
> defined: usad_optab and ssad_optab.
>
> 2. Use expand_simple_binop instead of gen_rtx_PLUS to generate the
> plus expression in sse.md. Also change the type of the second/third
> operands to be nonimmediate_operand.
>
> 3. Add the document for SAD_EXPR.
>
> 4. Verify the operands of SAD_EXPR.
>
> 5. Create a new target: vect_usad_char, and use it in the test case.
>
> The updated patch is pasted below.

> +(define_expand "usadv16qi"
> +  [(match_operand:V4SI 0 "register_operand")
> +   (match_operand:V16QI 1 "register_operand")
> +   (match_operand:V16QI 2 "nonimmediate_operand")
> +   (match_operand:V4SI 3 "nonimmediate_operand")]
> +  "TARGET_SSE2"
> +{
> +  rtx t1 = gen_reg_rtx (V2DImode);
> +  rtx t2 = gen_reg_rtx (V4SImode);
> +  emit_insn (gen_sse2_psadbw (t1, operands[1], operands[2]));
> +  convert_move (t2, t1, 0);
> +  emit_insn (gen_rtx_SET (VOIDmode, operands[0],
> +    expand_simple_binop (V4SImode, PLUS, t2, operands[3],
> +      NULL, 0, OPTAB_DIRECT)));

It seems to me that the generic expander won't bring any benefit here:
the operands are already in the correct form, so please change the last
lines simply to:

  emit_insn (gen_addv4si3 (operands[0], t2, operands[3]));

> +  DONE;
> +})
> +
> +(define_expand "usadv32qi"
> +  [(match_operand:V8SI 0 "register_operand")
> +   (match_operand:V32QI 1 "register_operand")
> +   (match_operand:V32QI 2 "nonimmediate_operand")
> +   (match_operand:V8SI 3 "nonimmediate_operand")]
> +  "TARGET_AVX2"
> +{
> +  rtx t1 = gen_reg_rtx (V4DImode);
> +  rtx t2 = gen_reg_rtx (V8SImode);
> +  emit_insn (gen_avx2_psadbw (t1, operands[1], operands[2]));
> +  convert_move (t2, t1, 0);
> +  emit_insn (gen_rtx_SET (VOIDmode, operands[0],
> +    expand_simple_binop (V8SImode, PLUS, t2, operands[3],
> +      NULL, 0, OPTAB_DIRECT)));

Same here, using gen_addv8si3.

No need to repost the patch with this trivial change.

Sorry for the confusion,
Uros.
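
For reference, here is a rough scalar sketch in C of what the usadv16qi expander computes. It is only a model, not GCC code: the function names psadbw_model and usadv16qi_model are made up for illustration, and it assumes that convert_move between the same-sized V2DI and V4SI modes amounts to a bit reinterpretation with little-endian lane order, as on x86. The final loop stands in for the element-wise V4SI addition that the expander emits.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar model of PSADBW on a 128-bit vector: each 64-bit
   half receives the sum of absolute differences of its eight unsigned
   bytes, zero-extended to 64 bits.  */
static void
psadbw_model (const uint8_t a[16], const uint8_t b[16], uint64_t out[2])
{
  for (int half = 0; half < 2; half++)
    {
      uint64_t sum = 0;
      for (int i = 0; i < 8; i++)
        {
          int d = (int) a[half * 8 + i] - (int) b[half * 8 + i];
          sum += (uint64_t) abs (d);
        }
      out[half] = sum;
    }
}

/* Hypothetical model of the usadv16qi expander: dst = SAD (a, b) + acc.
   Assumption: the V2DI -> V4SI conversion is a plain reinterpretation,
   so the two sums land in 32-bit lanes 0 and 2 and lanes 1 and 3 get
   the (zero) upper halves.  */
static void
usadv16qi_model (uint32_t dst[4], const uint8_t a[16], const uint8_t b[16],
                 const uint32_t acc[4])
{
  uint64_t t1[2];
  uint32_t t2[4];

  psadbw_model (a, b, t1);

  /* Reinterpret two 64-bit lanes as four 32-bit lanes (little-endian).  */
  for (int i = 0; i < 2; i++)
    {
      t2[2 * i] = (uint32_t) t1[i];
      t2[2 * i + 1] = (uint32_t) (t1[i] >> 32);
    }

  /* Element-wise V4SI addition with the accumulator operand.  */
  for (int i = 0; i < 4; i++)
    dst[i] = t2[i] + acc[i];
}
```

With a[i] = i, b[i] = 15 - i and an accumulator of {1, 2, 3, 4}, each half sums to 64, so the model yields {65, 2, 67, 4}: only lanes 0 and 2 of the accumulator change.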