On Wed, Oct 30, 2013 at 4:27 AM, Richard Biener <rguent...@suse.de> wrote: > On Tue, 29 Oct 2013, Cong Hou wrote: > >> Hi >> >> SAD (Sum of Absolute Differences) is a common and important algorithm >> in image processing and other areas. SSE2 even introduced a new >> instruction PSADBW for it. A SAD loop can be greatly accelerated by >> this instruction after being vectorized. This patch introduced a new >> operation SAD_EXPR and a SAD pattern recognizer in vectorizer. >> >> The pattern of SAD is shown below: >> >> unsigned type x_t, y_t; >> signed TYPE1 diff, abs_diff; >> TYPE2 sum = init; >> loop: >> sum_0 = phi <init, sum_1> >> S1 x_t = ... >> S2 y_t = ... >> S3 x_T = (TYPE1) x_t; >> S4 y_T = (TYPE1) y_t; >> S5 diff = x_T - y_T; >> S6 abs_diff = ABS_EXPR <diff>; >> [S7 abs_diff = (TYPE2) abs_diff; #optional] >> S8 sum_1 = abs_diff + sum_0; >> >> where 'TYPE1' is at least double the size of type 'type', and 'TYPE2' is >> the >> same size of 'TYPE1' or bigger. This is a special case of a reduction >> computation. >> >> For SSE2, type is char, and TYPE1 and TYPE2 are int. >> >> >> In order to express this new operation, a new expression SAD_EXPR is >> introduced in tree.def, and the corresponding entry in optabs is >> added. The patch also added the "define_expand" for SSE2 and AVX2 >> platforms for i386. >> >> The patch is pasted below and also attached as a text file (in which >> you can see tabs). Bootstrap and make check got passed on x86. Please >> give me your comments. > > Apart from the testcase comment made earlier > > +++ b/gcc/tree-cfg.c > @@ -3797,6 +3797,7 @@ verify_gimple_assign_ternary (gimple stmt) > return false; > > case DOT_PROD_EXPR: > + case SAD_EXPR: > case REALIGN_LOAD_EXPR: > /* FIXME. */ > return false; > > please add proper verification of the operand types.
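Just to make that concrete, one possible shape for the verification, modeled on the neighboring ternary-operation checks in verify_gimple_assign_ternary, could look like the sketch below (illustrative only, not part of the posted patch; it assumes vector operands and the "at least twice the size" rule from the tree.def comment):

    case SAD_EXPR:
      /* Sketch: operands 1 and 2 share the narrow element type, operand 3
         matches the result type, and the result elements are at least
         twice as wide as the elements of operand 1.  */
      if (!useless_type_conversion_p (rhs1_type, rhs2_type)
          || !useless_type_conversion_p (lhs_type, rhs3_type)
          || TREE_CODE (rhs1_type) != VECTOR_TYPE
          || TREE_CODE (lhs_type) != VECTOR_TYPE
          || TYPE_PRECISION (TREE_TYPE (lhs_type))
             < 2 * TYPE_PRECISION (TREE_TYPE (rhs1_type)))
        {
          error ("type mismatch in sad expression");
          debug_generic_expr (lhs_type);
          debug_generic_expr (rhs1_type);
          debug_generic_expr (rhs2_type);
          return true;
        }
      break;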
OK. > > +/* Widening sad (sum of absolute differences). > + The first two arguments are of type t1 which should be unsigned > integer. > + The third argument and the result are of type t2, such that t2 is at > least > + twice the size of t1. SAD_EXPR(arg1,arg2,arg3) is equivalent to: > + tmp1 = WIDEN_MINUS_EXPR (arg1, arg2); > + tmp2 = ABS_EXPR (tmp1); > + arg3 = PLUS_EXPR (tmp2, arg3); */ > +DEFTREECODE (SAD_EXPR, "sad_expr", tcc_expression, 3) > > WIDEN_MINUS_EXPR doesn't exist so you have to explain on its > operation (it returns a signed wide difference?). Why should > the first two arguments be unsigned? I cannot see a good reason > to require that (other than that maybe the x86 target only has > support for widened unsigned difference?). So if you want to > make that restriction maybe change the name to SADU_EXPR > (sum of absolute differences of unsigned)? > > I suppose you tried introducing WIDEN_MINUS_EXPR instead and > letting combine do it's work, avoiding the very special optab? I may have used the wrong representation here. I think the widening subtraction needed for SAD behaves differently from a general WIDEN_MINUS_EXPR. SAD usually operates on unsigned integers (see http://en.wikipedia.org/wiki/Sum_of_absolute_differences), and before the difference is taken, the two unsigned operands are promoted to wider signed integers. The result of (int)(char)(1) - (int)(char)(-1) therefore differs from that of (int)(unsigned char)(1) - (int)(unsigned char)(-1), so we cannot implement SAD using WIDEN_MINUS_EXPR (a small standalone example is appended at the end of this mail). In addition, the SSE2 instruction PSADBW requires its operands to be unsigned 8-bit integers. I will remove the incorrect description you pointed out. thanks, Cong > > Thanks, > Richard. > >> >> >> thanks, >> Cong >> >> >> >> diff --git a/gcc/ChangeLog b/gcc/ChangeLog >> index 8a38316..d528307 100644 >> --- a/gcc/ChangeLog >> +++ b/gcc/ChangeLog >> @@ -1,3 +1,23 @@ >> +2013-10-29 Cong Hou <co...@google.com> >> + >> + * tree-vect-patterns.c (vect_recog_sad_pattern): New function for SAD >> + pattern recognition. >> + (type_conversion_p): PROMOTION is true if it's a type promotion >> + conversion, and false otherwise. Return true if the given expression >> + is a type conversion one. >> + * tree-vectorizer.h: Adjust the number of patterns. >> + * tree.def: Add SAD_EXPR. >> + * optabs.def: Add sad_optab. >> + * cfgexpand.c (expand_debug_expr): Add SAD_EXPR case. >> + * expr.c (expand_expr_real_2): Likewise. >> + * gimple-pretty-print.c (dump_ternary_rhs): Likewise. >> + * gimple.c (get_gimple_rhs_num_ops): Likewise. >> + * optabs.c (optab_for_tree_code): Likewise. >> + * tree-cfg.c (estimate_operator_cost): Likewise. >> + * tree-ssa-operands.c (get_expr_operands): Likewise. >> + * tree-vect-loop.c (get_initial_def_for_reduction): Likewise. >> + * config/i386/sse.md: Add SSE2 and AVX2 expand for SAD. 
>> + >> 2013-10-14 David Malcolm <dmalc...@redhat.com> >> >> * dumpfile.h (gcc::dump_manager): New class, to hold state >> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c >> index 7ed29f5..9ec761a 100644 >> --- a/gcc/cfgexpand.c >> +++ b/gcc/cfgexpand.c >> @@ -2730,6 +2730,7 @@ expand_debug_expr (tree exp) >> { >> case COND_EXPR: >> case DOT_PROD_EXPR: >> + case SAD_EXPR: >> case WIDEN_MULT_PLUS_EXPR: >> case WIDEN_MULT_MINUS_EXPR: >> case FMA_EXPR: >> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md >> index c3f6c94..ca1ab70 100644 >> --- a/gcc/config/i386/sse.md >> +++ b/gcc/config/i386/sse.md >> @@ -6052,6 +6052,40 @@ >> DONE; >> }) >> >> +(define_expand "sadv16qi" >> + [(match_operand:V4SI 0 "register_operand") >> + (match_operand:V16QI 1 "register_operand") >> + (match_operand:V16QI 2 "register_operand") >> + (match_operand:V4SI 3 "register_operand")] >> + "TARGET_SSE2" >> +{ >> + rtx t1 = gen_reg_rtx (V2DImode); >> + rtx t2 = gen_reg_rtx (V4SImode); >> + emit_insn (gen_sse2_psadbw (t1, operands[1], operands[2])); >> + convert_move (t2, t1, 0); >> + emit_insn (gen_rtx_SET (VOIDmode, operands[0], >> + gen_rtx_PLUS (V4SImode, >> + operands[3], t2))); >> + DONE; >> +}) >> + >> +(define_expand "sadv32qi" >> + [(match_operand:V8SI 0 "register_operand") >> + (match_operand:V32QI 1 "register_operand") >> + (match_operand:V32QI 2 "register_operand") >> + (match_operand:V8SI 3 "register_operand")] >> + "TARGET_AVX2" >> +{ >> + rtx t1 = gen_reg_rtx (V4DImode); >> + rtx t2 = gen_reg_rtx (V8SImode); >> + emit_insn (gen_avx2_psadbw (t1, operands[1], operands[2])); >> + convert_move (t2, t1, 0); >> + emit_insn (gen_rtx_SET (VOIDmode, operands[0], >> + gen_rtx_PLUS (V8SImode, >> + operands[3], t2))); >> + DONE; >> +}) >> + >> (define_insn "ashr<mode>3" >> [(set (match_operand:VI24_AVX2 0 "register_operand" "=x,x") >> (ashiftrt:VI24_AVX2 >> diff --git a/gcc/expr.c b/gcc/expr.c >> index 4975a64..1db8a49 100644 >> --- a/gcc/expr.c >> +++ b/gcc/expr.c >> @@ -9026,6 +9026,20 @@ expand_expr_real_2 (sepops ops, rtx target, >> enum machine_mode tmode, >> return target; >> } >> >> + case SAD_EXPR: >> + { >> + tree oprnd0 = treeop0; >> + tree oprnd1 = treeop1; >> + tree oprnd2 = treeop2; >> + rtx op2; >> + >> + expand_operands (oprnd0, oprnd1, NULL_RTX, &op0, &op1, EXPAND_NORMAL); >> + op2 = expand_normal (oprnd2); >> + target = expand_widen_pattern_expr (ops, op0, op1, op2, >> + target, unsignedp); >> + return target; >> + } >> + >> case REALIGN_LOAD_EXPR: >> { >> tree oprnd0 = treeop0; >> diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c >> index f0f8166..514ddd1 100644 >> --- a/gcc/gimple-pretty-print.c >> +++ b/gcc/gimple-pretty-print.c >> @@ -425,6 +425,16 @@ dump_ternary_rhs (pretty_printer *buffer, gimple >> gs, int spc, int flags) >> dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, >> false); >> pp_greater (buffer); >> break; >> + >> + case SAD_EXPR: >> + pp_string (buffer, "SAD_EXPR <"); >> + dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, >> false); >> + pp_string (buffer, ", "); >> + dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, >> false); >> + pp_string (buffer, ", "); >> + dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, >> false); >> + pp_greater (buffer); >> + break; >> >> case VEC_PERM_EXPR: >> pp_string (buffer, "VEC_PERM_EXPR <"); >> diff --git a/gcc/gimple.c b/gcc/gimple.c >> index a12dd67..4975959 100644 >> --- a/gcc/gimple.c >> +++ b/gcc/gimple.c >> @@ -2562,6 +2562,7 @@ get_gimple_rhs_num_ops 
(enum tree_code code) >> || (SYM) == WIDEN_MULT_PLUS_EXPR \ >> || (SYM) == WIDEN_MULT_MINUS_EXPR \ >> || (SYM) == DOT_PROD_EXPR \ >> + || (SYM) == SAD_EXPR \ >> || (SYM) == REALIGN_LOAD_EXPR \ >> || (SYM) == VEC_COND_EXPR \ >> || (SYM) == VEC_PERM_EXPR >> \ >> diff --git a/gcc/optabs.c b/gcc/optabs.c >> index 06a626c..4ddd4d9 100644 >> --- a/gcc/optabs.c >> +++ b/gcc/optabs.c >> @@ -462,6 +462,9 @@ optab_for_tree_code (enum tree_code code, const_tree >> type, >> case DOT_PROD_EXPR: >> return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab; >> >> + case SAD_EXPR: >> + return sad_optab; >> + >> case WIDEN_MULT_PLUS_EXPR: >> return (TYPE_UNSIGNED (type) >> ? (TYPE_SATURATING (type) >> diff --git a/gcc/optabs.def b/gcc/optabs.def >> index 6b924ac..e35d567 100644 >> --- a/gcc/optabs.def >> +++ b/gcc/optabs.def >> @@ -248,6 +248,7 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a") >> OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3") >> OPTAB_D (udot_prod_optab, "udot_prod$I$a") >> OPTAB_D (usum_widen_optab, "widen_usum$I$a3") >> +OPTAB_D (sad_optab, "sad$I$a") >> OPTAB_D (vec_extract_optab, "vec_extract$a") >> OPTAB_D (vec_init_optab, "vec_init$a") >> OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a") >> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog >> index 075d071..226b8d5 100644 >> --- a/gcc/testsuite/ChangeLog >> +++ b/gcc/testsuite/ChangeLog >> @@ -1,3 +1,7 @@ >> +2013-10-29 Cong Hou <co...@google.com> >> + >> + * gcc.dg/vect/vect-reduc-sad.c: New. >> + >> 2013-10-14 Tobias Burnus <bur...@net-b.de> >> >> PR fortran/58658 >> diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-sad.c >> b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad.c >> new file mode 100644 >> index 0000000..14ebb3b >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad.c >> @@ -0,0 +1,54 @@ >> +/* { dg-require-effective-target sse2 { target { i?86-*-* x86_64-*-* } } } >> */ >> + >> +#include <stdarg.h> >> +#include "tree-vect.h" >> + >> +#define N 64 >> +#define SAD N*N/2 >> + >> +unsigned char X[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__))); >> +unsigned char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__))); >> + >> +/* Sum of absolute differences between arrays of unsigned char types. >> + Detected as a sad pattern. >> + Vectorized on targets that support sad for unsigned chars. */ >> + >> +__attribute__ ((noinline)) int >> +foo (int len) >> +{ >> + int i; >> + int result = 0; >> + >> + for (i = 0; i < len; i++) >> + result += abs (X[i] - Y[i]); >> + >> + return result; >> +} >> + >> + >> +int >> +main (void) >> +{ >> + int i; >> + int sad; >> + >> + check_vect (); >> + >> + for (i = 0; i < N; i++) >> + { >> + X[i] = i; >> + Y[i] = N - i; >> + __asm__ volatile (""); >> + } >> + >> + sad = foo (N); >> + if (sad != SAD) >> + abort (); >> + >> + return 0; >> +} >> + >> +/* { dg-final { scan-tree-dump-times "vect_recog_sad_pattern: >> detected" 1 "vect" } } */ >> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ >> +/* { dg-final { cleanup-tree-dump "vect" } } */ >> + >> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c >> index 8b66791..d689cac 100644 >> --- a/gcc/tree-cfg.c >> +++ b/gcc/tree-cfg.c >> @@ -3797,6 +3797,7 @@ verify_gimple_assign_ternary (gimple stmt) >> return false; >> >> case DOT_PROD_EXPR: >> + case SAD_EXPR: >> case REALIGN_LOAD_EXPR: >> /* FIXME. 
*/ >> return false; >> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c >> index 2221b9c..44261a3 100644 >> --- a/gcc/tree-inline.c >> +++ b/gcc/tree-inline.c >> @@ -3601,6 +3601,7 @@ estimate_operator_cost (enum tree_code code, >> eni_weights *weights, >> case WIDEN_SUM_EXPR: >> case WIDEN_MULT_EXPR: >> case DOT_PROD_EXPR: >> + case SAD_EXPR: >> case WIDEN_MULT_PLUS_EXPR: >> case WIDEN_MULT_MINUS_EXPR: >> case WIDEN_LSHIFT_EXPR: >> diff --git a/gcc/tree-ssa-operands.c b/gcc/tree-ssa-operands.c >> index 603f797..393efc3 100644 >> --- a/gcc/tree-ssa-operands.c >> +++ b/gcc/tree-ssa-operands.c >> @@ -854,6 +854,7 @@ get_expr_operands (gimple stmt, tree *expr_p, int flags) >> } >> >> case DOT_PROD_EXPR: >> + case SAD_EXPR: >> case REALIGN_LOAD_EXPR: >> case WIDEN_MULT_PLUS_EXPR: >> case WIDEN_MULT_MINUS_EXPR: >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c >> index 638b981..89aa8c7 100644 >> --- a/gcc/tree-vect-loop.c >> +++ b/gcc/tree-vect-loop.c >> @@ -3607,6 +3607,7 @@ get_initial_def_for_reduction (gimple stmt, tree >> init_val, >> { >> case WIDEN_SUM_EXPR: >> case DOT_PROD_EXPR: >> + case SAD_EXPR: >> case PLUS_EXPR: >> case MINUS_EXPR: >> case BIT_IOR_EXPR: >> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c >> index 0a4e812..7919449 100644 >> --- a/gcc/tree-vect-patterns.c >> +++ b/gcc/tree-vect-patterns.c >> @@ -45,6 +45,8 @@ static gimple vect_recog_widen_mult_pattern >> (vec<gimple> *, tree *, >> tree *); >> static gimple vect_recog_dot_prod_pattern (vec<gimple> *, tree *, >> tree *); >> +static gimple vect_recog_sad_pattern (vec<gimple> *, tree *, >> + tree *); >> static gimple vect_recog_pow_pattern (vec<gimple> *, tree *, tree *); >> static gimple vect_recog_over_widening_pattern (vec<gimple> *, tree *, >> tree *); >> @@ -62,6 +64,7 @@ static vect_recog_func_ptr >> vect_vect_recog_func_ptrs[NUM_PATTERNS] = { >> vect_recog_widen_mult_pattern, >> vect_recog_widen_sum_pattern, >> vect_recog_dot_prod_pattern, >> + vect_recog_sad_pattern, >> vect_recog_pow_pattern, >> vect_recog_widen_shift_pattern, >> vect_recog_over_widening_pattern, >> @@ -140,9 +143,8 @@ vect_single_imm_use (gimple def_stmt) >> } >> >> /* Check whether NAME, an ssa-name used in USE_STMT, >> - is a result of a type promotion or demotion, such that: >> + is a result of a type promotion, such that: >> DEF_STMT: NAME = NOP (name0) >> - where the type of name0 (ORIG_TYPE) is smaller/bigger than the type of >> NAME. >> If CHECK_SIGN is TRUE, check that either both types are signed or both >> are >> unsigned. */ >> >> @@ -189,10 +191,8 @@ type_conversion_p (tree name, gimple use_stmt, >> bool check_sign, >> >> if (TYPE_PRECISION (type) >= (TYPE_PRECISION (*orig_type) * 2)) >> *promotion = true; >> - else if (TYPE_PRECISION (*orig_type) >= (TYPE_PRECISION (type) * 2)) >> - *promotion = false; >> else >> - return false; >> + *promotion = false; >> >> if (!vect_is_simple_use (oprnd0, *def_stmt, loop_vinfo, >> bb_vinfo, &dummy_gimple, &dummy, &dt)) >> @@ -433,6 +433,242 @@ vect_recog_dot_prod_pattern (vec<gimple> *stmts, >> tree *type_in, >> } >> >> >> +/* Function vect_recog_sad_pattern >> + >> + Try to find the following Sum of Absolute Difference (SAD) pattern: >> + >> + unsigned type x_t, y_t; >> + signed TYPE1 diff, abs_diff; >> + TYPE2 sum = init; >> + loop: >> + sum_0 = phi <init, sum_1> >> + S1 x_t = ... >> + S2 y_t = ... 
>> + S3 x_T = (TYPE1) x_t; >> + S4 y_T = (TYPE1) y_t; >> + S5 diff = x_T - y_T; >> + S6 abs_diff = ABS_EXPR <diff>; >> + [S7 abs_diff = (TYPE2) abs_diff; #optional] >> + S8 sum_1 = abs_diff + sum_0; >> + >> + where 'TYPE1' is at least double the size of type 'type', and 'TYPE2' is >> the >> + same size of 'TYPE1' or bigger. This is a special case of a reduction >> + computation. >> + >> + Input: >> + >> + * STMTS: Contains a stmt from which the pattern search begins. In the >> + example, when this function is called with S8, the pattern >> + {S3,S4,S5,S6,S7,S8} will be detected. >> + >> + Output: >> + >> + * TYPE_IN: The type of the input arguments to the pattern. >> + >> + * TYPE_OUT: The type of the output of this pattern. >> + >> + * Return value: A new stmt that will be used to replace the sequence of >> + stmts that constitute the pattern. In this case it will be: >> + SAD_EXPR <x_t, y_t, sum_0> >> + */ >> + >> +static gimple >> +vect_recog_sad_pattern (vec<gimple> *stmts, tree *type_in, >> + tree *type_out) >> +{ >> + gimple last_stmt = (*stmts)[0]; >> + tree sad_oprnd0, sad_oprnd1; >> + stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt); >> + tree half_type; >> + loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (stmt_vinfo); >> + struct loop *loop; >> + bool promotion; >> + >> + if (!loop_info) >> + return NULL; >> + >> + loop = LOOP_VINFO_LOOP (loop_info); >> + >> + if (!is_gimple_assign (last_stmt)) >> + return NULL; >> + >> + tree sum_type = gimple_expr_type (last_stmt); >> + >> + /* Look for the following pattern >> + DX = (TYPE1) X; >> + DY = (TYPE1) Y; >> + DDIFF = DX - DY; >> + DAD = ABS_EXPR <DDIFF>; >> + DDPROD = (TYPE2) DPROD; >> + sum_1 = DAD + sum_0; >> + In which >> + - DX is at least double the size of X >> + - DY is at least double the size of Y >> + - DX, DY, DDIFF, DAD all have the same type >> + - sum is the same size of DAD or bigger >> + - sum has been recognized as a reduction variable. >> + >> + This is equivalent to: >> + DDIFF = X w- Y; #widen sub >> + DAD = ABS_EXPR <DDIFF>; >> + sum_1 = DAD w+ sum_0; #widen summation >> + or >> + DDIFF = X w- Y; #widen sub >> + DAD = ABS_EXPR <DDIFF>; >> + sum_1 = DAD + sum_0; #summation >> + */ >> + >> + /* Starting from LAST_STMT, follow the defs of its uses in search >> + of the above pattern. */ >> + >> + if (gimple_assign_rhs_code (last_stmt) != PLUS_EXPR) >> + return NULL; >> + >> + tree plus_oprnd0, plus_oprnd1; >> + >> + if (STMT_VINFO_IN_PATTERN_P (stmt_vinfo)) >> + { >> + /* Has been detected as widening-summation? */ >> + >> + gimple stmt = STMT_VINFO_RELATED_STMT (stmt_vinfo); >> + sum_type = gimple_expr_type (stmt); >> + if (gimple_assign_rhs_code (stmt) != WIDEN_SUM_EXPR) >> + return NULL; >> + plus_oprnd0 = gimple_assign_rhs1 (stmt); >> + plus_oprnd1 = gimple_assign_rhs2 (stmt); >> + half_type = TREE_TYPE (plus_oprnd0); >> + } >> + else >> + { >> + gimple def_stmt; >> + >> + if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_reduction_def) >> + return NULL; >> + plus_oprnd0 = gimple_assign_rhs1 (last_stmt); >> + plus_oprnd1 = gimple_assign_rhs2 (last_stmt); >> + if (!types_compatible_p (TREE_TYPE (plus_oprnd0), sum_type) >> + || !types_compatible_p (TREE_TYPE (plus_oprnd1), sum_type)) >> + return NULL; >> + >> + /* The type conversion could be promotion, demotion, >> + or just signed -> unsigned. 
*/ >> + if (type_conversion_p (plus_oprnd0, last_stmt, false, >> + &half_type, &def_stmt, &promotion)) >> + plus_oprnd0 = gimple_assign_rhs1 (def_stmt); >> + else >> + half_type = sum_type; >> + } >> + >> + /* So far so good. Since last_stmt was detected as a (summation) >> reduction, >> + we know that plus_oprnd1 is the reduction variable (defined by a >> loop-header >> + phi), and plus_oprnd0 is an ssa-name defined by a stmt in the loop >> body. >> + Then check that plus_oprnd0 is defined by an abs_expr */ >> + >> + if (TREE_CODE (plus_oprnd0) != SSA_NAME) >> + return NULL; >> + >> + tree abs_type = half_type; >> + gimple abs_stmt = SSA_NAME_DEF_STMT (plus_oprnd0); >> + >> + /* It could not be the sad pattern if the abs_stmt is outside the loop. >> */ >> + if (!gimple_bb (abs_stmt) || !flow_bb_inside_loop_p (loop, >> gimple_bb (abs_stmt))) >> + return NULL; >> + >> + /* FORNOW. Can continue analyzing the def-use chain when this stmt in a >> phi >> + inside the loop (in case we are analyzing an outer-loop). */ >> + if (!is_gimple_assign (abs_stmt)) >> + return NULL; >> + >> + stmt_vec_info abs_stmt_vinfo = vinfo_for_stmt (abs_stmt); >> + gcc_assert (abs_stmt_vinfo); >> + if (STMT_VINFO_DEF_TYPE (abs_stmt_vinfo) != vect_internal_def) >> + return NULL; >> + if (gimple_assign_rhs_code (abs_stmt) != ABS_EXPR) >> + return NULL; >> + >> + tree abs_oprnd = gimple_assign_rhs1 (abs_stmt); >> + if (!types_compatible_p (TREE_TYPE (abs_oprnd), abs_type)) >> + return NULL; >> + if (TYPE_UNSIGNED (abs_type)) >> + return NULL; >> + >> + /* We then detect if the operand of abs_expr is defined by a minus_expr. >> */ >> + >> + if (TREE_CODE (abs_oprnd) != SSA_NAME) >> + return NULL; >> + >> + gimple diff_stmt = SSA_NAME_DEF_STMT (abs_oprnd); >> + >> + /* It could not be the sad pattern if the diff_stmt is outside the loop. >> */ >> + if (!gimple_bb (diff_stmt) >> + || !flow_bb_inside_loop_p (loop, gimple_bb (diff_stmt))) >> + return NULL; >> + >> + /* FORNOW. Can continue analyzing the def-use chain when this stmt in a >> phi >> + inside the loop (in case we are analyzing an outer-loop). */ >> + if (!is_gimple_assign (diff_stmt)) >> + return NULL; >> + >> + stmt_vec_info diff_stmt_vinfo = vinfo_for_stmt (diff_stmt); >> + gcc_assert (diff_stmt_vinfo); >> + if (STMT_VINFO_DEF_TYPE (diff_stmt_vinfo) != vect_internal_def) >> + return NULL; >> + if (gimple_assign_rhs_code (diff_stmt) != MINUS_EXPR) >> + return NULL; >> + >> + tree half_type0, half_type1; >> + gimple def_stmt; >> + >> + tree minus_oprnd0 = gimple_assign_rhs1 (diff_stmt); >> + tree minus_oprnd1 = gimple_assign_rhs2 (diff_stmt); >> + >> + if (!types_compatible_p (TREE_TYPE (minus_oprnd0), abs_type) >> + || !types_compatible_p (TREE_TYPE (minus_oprnd1), abs_type)) >> + return NULL; >> + if (!type_conversion_p (minus_oprnd0, diff_stmt, false, >> + &half_type0, &def_stmt, &promotion) >> + || !promotion) >> + return NULL; >> + sad_oprnd0 = gimple_assign_rhs1 (def_stmt); >> + >> + if (!type_conversion_p (minus_oprnd1, diff_stmt, false, >> + &half_type1, &def_stmt, &promotion) >> + || !promotion) >> + return NULL; >> + sad_oprnd1 = gimple_assign_rhs1 (def_stmt); >> + >> + if (!types_compatible_p (half_type0, half_type1)) >> + return NULL; >> + if (!TYPE_UNSIGNED (half_type0)) >> + return NULL; >> + if (TYPE_PRECISION (abs_type) < TYPE_PRECISION (half_type0) * 2 >> + || TYPE_PRECISION (sum_type) < TYPE_PRECISION (half_type0) * 2) >> + return NULL; >> + >> + *type_in = TREE_TYPE (sad_oprnd0); >> + *type_out = sum_type; >> + >> + /* Pattern detected. 
Create a stmt to be used to replace the pattern: */ >> + tree var = vect_recog_temp_ssa_var (sum_type, NULL); >> + gimple pattern_stmt = gimple_build_assign_with_ops >> + (SAD_EXPR, var, sad_oprnd0, sad_oprnd1, >> plus_oprnd1); >> + >> + if (dump_enabled_p ()) >> + { >> + dump_printf_loc (MSG_NOTE, vect_location, >> + "vect_recog_sad_pattern: detected: "); >> + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0); >> + dump_printf (MSG_NOTE, "\n"); >> + } >> + >> + /* We don't allow changing the order of the computation in the inner-loop >> + when doing outer-loop vectorization. */ >> + gcc_assert (!nested_in_vect_loop_p (loop, last_stmt)); >> + >> + return pattern_stmt; >> +} >> + >> + >> /* Handle widening operation by a constant. At the moment we support >> MULT_EXPR >> and LSHIFT_EXPR. >> >> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h >> index 8b7b345..0aac75b 100644 >> --- a/gcc/tree-vectorizer.h >> +++ b/gcc/tree-vectorizer.h >> @@ -1044,7 +1044,7 @@ extern void vect_slp_transform_bb (basic_block); >> Additional pattern recognition functions can (and will) be added >> in the future. */ >> typedef gimple (* vect_recog_func_ptr) (vec<gimple> *, tree *, tree *); >> -#define NUM_PATTERNS 11 >> +#define NUM_PATTERNS 12 >> void vect_pattern_recog (loop_vec_info, bb_vec_info); >> >> /* In tree-vectorizer.c. */ >> diff --git a/gcc/tree.def b/gcc/tree.def >> index 88c850a..31a3b64 100644 >> --- a/gcc/tree.def >> +++ b/gcc/tree.def >> @@ -1146,6 +1146,15 @@ DEFTREECODE (REDUC_PLUS_EXPR, >> "reduc_plus_expr", tcc_unary, 1) >> arg3 = WIDEN_SUM_EXPR (tmp, arg3); */ >> DEFTREECODE (DOT_PROD_EXPR, "dot_prod_expr", tcc_expression, 3) >> >> +/* Widening sad (sum of absolute differences). >> + The first two arguments are of type t1 which should be unsigned integer. >> + The third argument and the result are of type t2, such that t2 is at >> least >> + twice the size of t1. SAD_EXPR(arg1,arg2,arg3) is equivalent to: >> + tmp1 = WIDEN_MINUS_EXPR (arg1, arg2); >> + tmp2 = ABS_EXPR (tmp1); >> + arg3 = PLUS_EXPR (tmp2, arg3); */ >> +DEFTREECODE (SAD_EXPR, "sad_expr", tcc_expression, 3) >> + >> /* Widening summation. >> The first argument is of type t1. >> The second argument is of type t2, such that t2 is at least twice >> > > -- > Richard Biener <rguent...@suse.de> > SUSE / SUSE Labs > SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 > GF: Jeff Hawn, Jennifer Guild, Felix Imend
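
P.S. A small self-contained C example (not part of the patch) of the signedness point discussed above: the same pair of byte patterns yields an absolute difference of 2 under signed promotion but 254 under unsigned promotion, and only the latter matches what PSADBW computes.

#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
  /* The byte patterns 0x01 and 0xff, read as signed and as unsigned chars.  */
  signed char sx = 1, sy = -1;      /* 0x01, 0xff */
  unsigned char ux = 1, uy = 255;   /* 0x01, 0xff */

  /* Promote to int before subtracting, as in S3-S5 of the pattern.  */
  int sdiff = (int) sx - (int) sy;  /* 1 - (-1) = 2 */
  int udiff = (int) ux - (int) uy;  /* 1 - 255 = -254 */

  /* Prints "signed: 2  unsigned: 254".  */
  printf ("signed: %d  unsigned: %d\n", abs (sdiff), abs (udiff));
  return 0;
}

Only the unsigned (zero-extending) promotion reproduces the PSADBW result, which is why the first two operands of SAD_EXPR are required to be unsigned.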