On Thu, Oct 31, 2013 at 12:29 AM, Cong Hou <co...@google.com> wrote:
> On Tue, Oct 29, 2013 at 4:49 PM, Ramana Radhakrishnan
> <ramana....@googlemail.com> wrote:
>> Cong,
>>
>> Please don't do the following.
>>
>>> +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad.c
>>> @@ -0,0 +1,54 @@
>>> +/* { dg-require-effective-target sse2 { target { i?86-*-* x86_64-*-* } } } */
>>
>> You are adding a test to gcc.dg/vect - it is a common directory
>> containing tests that need to run on multiple architectures, and such
>> tests should be keyed on the feature they test, which can then be
>> turned on for ports that have such an instruction.
>>
>> The correct way of doing this is to key the test on the feature,
>> something like dg-require-effective-target vect_sad_char, define the
>> equivalent routine in testsuite/lib/target-supports.exp, and enable it
>> for sse2 for the x86 port. If in doubt, look at
>> check_effective_target_vect_int and the whole family of such functions
>> in testsuite/lib/target-supports.exp.
>>
>> This makes life easy for other port maintainers who want to turn on
>> this support. And for bonus points, please update the testcase-writing
>> wiki page with this information if it isn't already there.
>>
>
> OK, I will likely move the test case to gcc.target/i386, as currently
> only SSE2 provides an SAD instruction. But your suggestion also helps!
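As a concrete sketch of the keying suggested above (assuming the effective-target keyword ends up being named vect_sad_char - that name is only a proposal here and does not yet exist in target-supports.exp), the test header would drop the target triplet while the body and the scan directives stay exactly as in the posted patch:

/* { dg-require-effective-target vect_sad_char } */

#include <stdarg.h>
#include "tree-vect.h"

/* ... test body exactly as in gcc.dg/vect/vect-reduc-sad.c below ... */

/* { dg-final { scan-tree-dump-times "vect_recog_sad_pattern: detected" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */

A matching check_effective_target_vect_sad_char routine in testsuite/lib/target-supports.exp would then return 1 for i?86-*-*/x86_64-*-* with sse2 (and for any other port that implements the new sad optab), following the shape of check_effective_target_vect_int.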
Sorry, no - I really don't like that approach. If the test remains in the
common directory, keyed off as I suggested, it makes life easier when
turning this on in other ports: adding the pattern in a port takes the
test from UNSUPPORTED->XPASS, and it keeps gcc.dg/vect reasonably up to
date with respect to testing the features of the vectorizer, and in touch
with the way the tests in gcc.dg/vect have been written to date.

I think NEON has an equivalent instruction called vaba, but I will have
to check in the morning when I get back to my machine.

regards
Ramana

>>> S6 abs_diff = ABS_EXPR <diff>;
>>> [S7 abs_diff = (TYPE2) abs_diff; #optional]
>>> S8 sum_1 = abs_diff + sum_0;
>>>
>>> where 'TYPE1' is at least double the size of type 'type', and 'TYPE2' is the
>>> same size as 'TYPE1' or bigger. This is a special case of a reduction
>>> computation.
>>>
>>> For SSE2, type is char, and TYPE1 and TYPE2 are int.
>>>
>>>
>>> In order to express this new operation, a new expression SAD_EXPR is
>>> introduced in tree.def, and the corresponding entry in optabs is
>>> added. The patch also adds the "define_expand" patterns for SSE2 and
>>> AVX2 for i386.
>>>
>>> The patch is pasted below and also attached as a text file (in which
>>> you can see tabs). Bootstrap and make check passed on x86. Please
>>> give me your comments.
>>>
>>>
>>> thanks,
>>> Cong
>>>
>>>
>>>
>>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>>> index 8a38316..d528307 100644
>>> --- a/gcc/ChangeLog
>>> +++ b/gcc/ChangeLog
>>> @@ -1,3 +1,24 @@
>>> +2013-10-29  Cong Hou  <co...@google.com>
>>> +
>>> + * tree-vect-patterns.c (vect_recog_sad_pattern): New function for SAD
>>> + pattern recognition.
>>> + (type_conversion_p): PROMOTION is true if it's a type promotion
>>> + conversion, and false otherwise. Return true if the given expression
>>> + is a type conversion one.
>>> + * tree-vectorizer.h: Adjust the number of patterns.
>>> + * tree.def: Add SAD_EXPR.
>>> + * optabs.def: Add sad_optab.
>>> + * cfgexpand.c (expand_debug_expr): Add SAD_EXPR case.
>>> + * expr.c (expand_expr_real_2): Likewise.
>>> + * gimple-pretty-print.c (dump_ternary_rhs): Likewise.
>>> + * gimple.c (get_gimple_rhs_num_ops): Likewise.
>>> + * optabs.c (optab_for_tree_code): Likewise.
>>> + * tree-cfg.c (verify_gimple_assign_ternary): Likewise.
>>> + * tree-inline.c (estimate_operator_cost): Likewise.
>>> + * tree-ssa-operands.c (get_expr_operands): Likewise.
>>> + * tree-vect-loop.c (get_initial_def_for_reduction): Likewise.
>>> + * config/i386/sse.md: Add SSE2 and AVX2 expand for SAD.
>>> + >>> 2013-10-14 David Malcolm <dmalc...@redhat.com> >>> >>> * dumpfile.h (gcc::dump_manager): New class, to hold state >>> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c >>> index 7ed29f5..9ec761a 100644 >>> --- a/gcc/cfgexpand.c >>> +++ b/gcc/cfgexpand.c >>> @@ -2730,6 +2730,7 @@ expand_debug_expr (tree exp) >>> { >>> case COND_EXPR: >>> case DOT_PROD_EXPR: >>> + case SAD_EXPR: >>> case WIDEN_MULT_PLUS_EXPR: >>> case WIDEN_MULT_MINUS_EXPR: >>> case FMA_EXPR: >>> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md >>> index c3f6c94..ca1ab70 100644 >>> --- a/gcc/config/i386/sse.md >>> +++ b/gcc/config/i386/sse.md >>> @@ -6052,6 +6052,40 @@ >>> DONE; >>> }) >>> >>> +(define_expand "sadv16qi" >>> + [(match_operand:V4SI 0 "register_operand") >>> + (match_operand:V16QI 1 "register_operand") >>> + (match_operand:V16QI 2 "register_operand") >>> + (match_operand:V4SI 3 "register_operand")] >>> + "TARGET_SSE2" >>> +{ >>> + rtx t1 = gen_reg_rtx (V2DImode); >>> + rtx t2 = gen_reg_rtx (V4SImode); >>> + emit_insn (gen_sse2_psadbw (t1, operands[1], operands[2])); >>> + convert_move (t2, t1, 0); >>> + emit_insn (gen_rtx_SET (VOIDmode, operands[0], >>> + gen_rtx_PLUS (V4SImode, >>> + operands[3], t2))); >>> + DONE; >>> +}) >>> + >>> +(define_expand "sadv32qi" >>> + [(match_operand:V8SI 0 "register_operand") >>> + (match_operand:V32QI 1 "register_operand") >>> + (match_operand:V32QI 2 "register_operand") >>> + (match_operand:V8SI 3 "register_operand")] >>> + "TARGET_AVX2" >>> +{ >>> + rtx t1 = gen_reg_rtx (V4DImode); >>> + rtx t2 = gen_reg_rtx (V8SImode); >>> + emit_insn (gen_avx2_psadbw (t1, operands[1], operands[2])); >>> + convert_move (t2, t1, 0); >>> + emit_insn (gen_rtx_SET (VOIDmode, operands[0], >>> + gen_rtx_PLUS (V8SImode, >>> + operands[3], t2))); >>> + DONE; >>> +}) >>> + >>> (define_insn "ashr<mode>3" >>> [(set (match_operand:VI24_AVX2 0 "register_operand" "=x,x") >>> (ashiftrt:VI24_AVX2 >>> diff --git a/gcc/expr.c b/gcc/expr.c >>> index 4975a64..1db8a49 100644 >>> --- a/gcc/expr.c >>> +++ b/gcc/expr.c >>> @@ -9026,6 +9026,20 @@ expand_expr_real_2 (sepops ops, rtx target, >>> enum machine_mode tmode, >>> return target; >>> } >>> >>> + case SAD_EXPR: >>> + { >>> + tree oprnd0 = treeop0; >>> + tree oprnd1 = treeop1; >>> + tree oprnd2 = treeop2; >>> + rtx op2; >>> + >>> + expand_operands (oprnd0, oprnd1, NULL_RTX, &op0, &op1, EXPAND_NORMAL); >>> + op2 = expand_normal (oprnd2); >>> + target = expand_widen_pattern_expr (ops, op0, op1, op2, >>> + target, unsignedp); >>> + return target; >>> + } >>> + >>> case REALIGN_LOAD_EXPR: >>> { >>> tree oprnd0 = treeop0; >>> diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c >>> index f0f8166..514ddd1 100644 >>> --- a/gcc/gimple-pretty-print.c >>> +++ b/gcc/gimple-pretty-print.c >>> @@ -425,6 +425,16 @@ dump_ternary_rhs (pretty_printer *buffer, gimple >>> gs, int spc, int flags) >>> dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, >>> false); >>> pp_greater (buffer); >>> break; >>> + >>> + case SAD_EXPR: >>> + pp_string (buffer, "SAD_EXPR <"); >>> + dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, >>> false); >>> + pp_string (buffer, ", "); >>> + dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, >>> false); >>> + pp_string (buffer, ", "); >>> + dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, >>> false); >>> + pp_greater (buffer); >>> + break; >>> >>> case VEC_PERM_EXPR: >>> pp_string (buffer, "VEC_PERM_EXPR <"); >>> diff --git a/gcc/gimple.c b/gcc/gimple.c >>> index 
a12dd67..4975959 100644 >>> --- a/gcc/gimple.c >>> +++ b/gcc/gimple.c >>> @@ -2562,6 +2562,7 @@ get_gimple_rhs_num_ops (enum tree_code code) >>> || (SYM) == WIDEN_MULT_PLUS_EXPR \ >>> || (SYM) == WIDEN_MULT_MINUS_EXPR \ >>> || (SYM) == DOT_PROD_EXPR \ >>> + || (SYM) == SAD_EXPR \ >>> || (SYM) == REALIGN_LOAD_EXPR \ >>> || (SYM) == VEC_COND_EXPR \ >>> || (SYM) == VEC_PERM_EXPR >>> \ >>> diff --git a/gcc/optabs.c b/gcc/optabs.c >>> index 06a626c..4ddd4d9 100644 >>> --- a/gcc/optabs.c >>> +++ b/gcc/optabs.c >>> @@ -462,6 +462,9 @@ optab_for_tree_code (enum tree_code code, const_tree >>> type, >>> case DOT_PROD_EXPR: >>> return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab; >>> >>> + case SAD_EXPR: >>> + return sad_optab; >>> + >>> case WIDEN_MULT_PLUS_EXPR: >>> return (TYPE_UNSIGNED (type) >>> ? (TYPE_SATURATING (type) >>> diff --git a/gcc/optabs.def b/gcc/optabs.def >>> index 6b924ac..e35d567 100644 >>> --- a/gcc/optabs.def >>> +++ b/gcc/optabs.def >>> @@ -248,6 +248,7 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a") >>> OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3") >>> OPTAB_D (udot_prod_optab, "udot_prod$I$a") >>> OPTAB_D (usum_widen_optab, "widen_usum$I$a3") >>> +OPTAB_D (sad_optab, "sad$I$a") >>> OPTAB_D (vec_extract_optab, "vec_extract$a") >>> OPTAB_D (vec_init_optab, "vec_init$a") >>> OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a") >>> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog >>> index 075d071..226b8d5 100644 >>> --- a/gcc/testsuite/ChangeLog >>> +++ b/gcc/testsuite/ChangeLog >>> @@ -1,3 +1,7 @@ >>> +2013-10-29 Cong Hou <co...@google.com> >>> + >>> + * gcc.dg/vect/vect-reduc-sad.c: New. >>> + >>> 2013-10-14 Tobias Burnus <bur...@net-b.de> >>> >>> PR fortran/58658 >>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-sad.c >>> b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad.c >>> new file mode 100644 >>> index 0000000..14ebb3b >>> --- /dev/null >>> +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad.c >>> @@ -0,0 +1,54 @@ >>> +/* { dg-require-effective-target sse2 { target { i?86-*-* x86_64-*-* } } } >>> */ >>> + >>> +#include <stdarg.h> >>> +#include "tree-vect.h" >>> + >>> +#define N 64 >>> +#define SAD N*N/2 >>> + >>> +unsigned char X[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__))); >>> +unsigned char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__))); >>> + >>> +/* Sum of absolute differences between arrays of unsigned char types. >>> + Detected as a sad pattern. >>> + Vectorized on targets that support sad for unsigned chars. 
*/ >>> + >>> +__attribute__ ((noinline)) int >>> +foo (int len) >>> +{ >>> + int i; >>> + int result = 0; >>> + >>> + for (i = 0; i < len; i++) >>> + result += abs (X[i] - Y[i]); >>> + >>> + return result; >>> +} >>> + >>> + >>> +int >>> +main (void) >>> +{ >>> + int i; >>> + int sad; >>> + >>> + check_vect (); >>> + >>> + for (i = 0; i < N; i++) >>> + { >>> + X[i] = i; >>> + Y[i] = N - i; >>> + __asm__ volatile (""); >>> + } >>> + >>> + sad = foo (N); >>> + if (sad != SAD) >>> + abort (); >>> + >>> + return 0; >>> +} >>> + >>> +/* { dg-final { scan-tree-dump-times "vect_recog_sad_pattern: >>> detected" 1 "vect" } } */ >>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ >>> +/* { dg-final { cleanup-tree-dump "vect" } } */ >>> + >>> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c >>> index 8b66791..d689cac 100644 >>> --- a/gcc/tree-cfg.c >>> +++ b/gcc/tree-cfg.c >>> @@ -3797,6 +3797,7 @@ verify_gimple_assign_ternary (gimple stmt) >>> return false; >>> >>> case DOT_PROD_EXPR: >>> + case SAD_EXPR: >>> case REALIGN_LOAD_EXPR: >>> /* FIXME. */ >>> return false; >>> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c >>> index 2221b9c..44261a3 100644 >>> --- a/gcc/tree-inline.c >>> +++ b/gcc/tree-inline.c >>> @@ -3601,6 +3601,7 @@ estimate_operator_cost (enum tree_code code, >>> eni_weights *weights, >>> case WIDEN_SUM_EXPR: >>> case WIDEN_MULT_EXPR: >>> case DOT_PROD_EXPR: >>> + case SAD_EXPR: >>> case WIDEN_MULT_PLUS_EXPR: >>> case WIDEN_MULT_MINUS_EXPR: >>> case WIDEN_LSHIFT_EXPR: >>> diff --git a/gcc/tree-ssa-operands.c b/gcc/tree-ssa-operands.c >>> index 603f797..393efc3 100644 >>> --- a/gcc/tree-ssa-operands.c >>> +++ b/gcc/tree-ssa-operands.c >>> @@ -854,6 +854,7 @@ get_expr_operands (gimple stmt, tree *expr_p, int flags) >>> } >>> >>> case DOT_PROD_EXPR: >>> + case SAD_EXPR: >>> case REALIGN_LOAD_EXPR: >>> case WIDEN_MULT_PLUS_EXPR: >>> case WIDEN_MULT_MINUS_EXPR: >>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c >>> index 638b981..89aa8c7 100644 >>> --- a/gcc/tree-vect-loop.c >>> +++ b/gcc/tree-vect-loop.c >>> @@ -3607,6 +3607,7 @@ get_initial_def_for_reduction (gimple stmt, tree >>> init_val, >>> { >>> case WIDEN_SUM_EXPR: >>> case DOT_PROD_EXPR: >>> + case SAD_EXPR: >>> case PLUS_EXPR: >>> case MINUS_EXPR: >>> case BIT_IOR_EXPR: >>> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c >>> index 0a4e812..7919449 100644 >>> --- a/gcc/tree-vect-patterns.c >>> +++ b/gcc/tree-vect-patterns.c >>> @@ -45,6 +45,8 @@ static gimple vect_recog_widen_mult_pattern >>> (vec<gimple> *, tree *, >>> tree *); >>> static gimple vect_recog_dot_prod_pattern (vec<gimple> *, tree *, >>> tree *); >>> +static gimple vect_recog_sad_pattern (vec<gimple> *, tree *, >>> + tree *); >>> static gimple vect_recog_pow_pattern (vec<gimple> *, tree *, tree *); >>> static gimple vect_recog_over_widening_pattern (vec<gimple> *, tree *, >>> tree *); >>> @@ -62,6 +64,7 @@ static vect_recog_func_ptr >>> vect_vect_recog_func_ptrs[NUM_PATTERNS] = { >>> vect_recog_widen_mult_pattern, >>> vect_recog_widen_sum_pattern, >>> vect_recog_dot_prod_pattern, >>> + vect_recog_sad_pattern, >>> vect_recog_pow_pattern, >>> vect_recog_widen_shift_pattern, >>> vect_recog_over_widening_pattern, >>> @@ -140,9 +143,8 @@ vect_single_imm_use (gimple def_stmt) >>> } >>> >>> /* Check whether NAME, an ssa-name used in USE_STMT, >>> - is a result of a type promotion or demotion, such that: >>> + is a result of a type promotion, such that: >>> DEF_STMT: NAME = NOP (name0) >>> - where the type of name0 
(ORIG_TYPE) is smaller/bigger than the type of >>> NAME. >>> If CHECK_SIGN is TRUE, check that either both types are signed or both >>> are >>> unsigned. */ >>> >>> @@ -189,10 +191,8 @@ type_conversion_p (tree name, gimple use_stmt, >>> bool check_sign, >>> >>> if (TYPE_PRECISION (type) >= (TYPE_PRECISION (*orig_type) * 2)) >>> *promotion = true; >>> - else if (TYPE_PRECISION (*orig_type) >= (TYPE_PRECISION (type) * 2)) >>> - *promotion = false; >>> else >>> - return false; >>> + *promotion = false; >>> >>> if (!vect_is_simple_use (oprnd0, *def_stmt, loop_vinfo, >>> bb_vinfo, &dummy_gimple, &dummy, &dt)) >>> @@ -433,6 +433,242 @@ vect_recog_dot_prod_pattern (vec<gimple> *stmts, >>> tree *type_in, >>> } >>> >>> >>> +/* Function vect_recog_sad_pattern >>> + >>> + Try to find the following Sum of Absolute Difference (SAD) pattern: >>> + >>> + unsigned type x_t, y_t; >>> + signed TYPE1 diff, abs_diff; >>> + TYPE2 sum = init; >>> + loop: >>> + sum_0 = phi <init, sum_1> >>> + S1 x_t = ... >>> + S2 y_t = ... >>> + S3 x_T = (TYPE1) x_t; >>> + S4 y_T = (TYPE1) y_t; >>> + S5 diff = x_T - y_T; >>> + S6 abs_diff = ABS_EXPR <diff>; >>> + [S7 abs_diff = (TYPE2) abs_diff; #optional] >>> + S8 sum_1 = abs_diff + sum_0; >>> + >>> + where 'TYPE1' is at least double the size of type 'type', and 'TYPE2' >>> is the >>> + same size as 'TYPE1' or bigger. This is a special case of a reduction >>> + computation. >>> + >>> + Input: >>> + >>> + * STMTS: Contains a stmt from which the pattern search begins. In the >>> + example, when this function is called with S8, the pattern >>> + {S3,S4,S5,S6,S7,S8} will be detected. >>> + >>> + Output: >>> + >>> + * TYPE_IN: The type of the input arguments to the pattern. >>> + >>> + * TYPE_OUT: The type of the output of this pattern. >>> + >>> + * Return value: A new stmt that will be used to replace the sequence of >>> + stmts that constitute the pattern. In this case it will be: >>> + SAD_EXPR <x_t, y_t, sum_0> >>> + */ >>> + >>> +static gimple >>> +vect_recog_sad_pattern (vec<gimple> *stmts, tree *type_in, >>> + tree *type_out) >>> +{ >>> + gimple last_stmt = (*stmts)[0]; >>> + tree sad_oprnd0, sad_oprnd1; >>> + stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt); >>> + tree half_type; >>> + loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (stmt_vinfo); >>> + struct loop *loop; >>> + bool promotion; >>> + >>> + if (!loop_info) >>> + return NULL; >>> + >>> + loop = LOOP_VINFO_LOOP (loop_info); >>> + >>> + if (!is_gimple_assign (last_stmt)) >>> + return NULL; >>> + >>> + tree sum_type = gimple_expr_type (last_stmt); >>> + >>> + /* Look for the following pattern >>> + DX = (TYPE1) X; >>> + DY = (TYPE1) Y; >>> + DDIFF = DX - DY; >>> + DAD = ABS_EXPR <DDIFF>; >>> + DAD = (TYPE2) DAD; >>> + sum_1 = DAD + sum_0; >>> + in which >>> + - DX is at least double the size of X >>> + - DY is at least double the size of Y >>> + - DX, DY, DDIFF, DAD all have the same type >>> + - sum is the same size as DAD or bigger >>> + - sum has been recognized as a reduction variable. >>> + >>> + This is equivalent to: >>> + DDIFF = X w- Y; #widen sub >>> + DAD = ABS_EXPR <DDIFF>; >>> + sum_1 = DAD w+ sum_0; #widen summation >>> + or >>> + DDIFF = X w- Y; #widen sub >>> + DAD = ABS_EXPR <DDIFF>; >>> + sum_1 = DAD + sum_0; #summation >>> + */ >>> + >>> + /* Starting from LAST_STMT, follow the defs of its uses in search >>> + of the above pattern.
*/ >>> + >>> + if (gimple_assign_rhs_code (last_stmt) != PLUS_EXPR) >>> + return NULL; >>> + >>> + tree plus_oprnd0, plus_oprnd1; >>> + >>> + if (STMT_VINFO_IN_PATTERN_P (stmt_vinfo)) >>> + { >>> + /* Has been detected as widening-summation? */ >>> + >>> + gimple stmt = STMT_VINFO_RELATED_STMT (stmt_vinfo); >>> + sum_type = gimple_expr_type (stmt); >>> + if (gimple_assign_rhs_code (stmt) != WIDEN_SUM_EXPR) >>> + return NULL; >>> + plus_oprnd0 = gimple_assign_rhs1 (stmt); >>> + plus_oprnd1 = gimple_assign_rhs2 (stmt); >>> + half_type = TREE_TYPE (plus_oprnd0); >>> + } >>> + else >>> + { >>> + gimple def_stmt; >>> + >>> + if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_reduction_def) >>> + return NULL; >>> + plus_oprnd0 = gimple_assign_rhs1 (last_stmt); >>> + plus_oprnd1 = gimple_assign_rhs2 (last_stmt); >>> + if (!types_compatible_p (TREE_TYPE (plus_oprnd0), sum_type) >>> + || !types_compatible_p (TREE_TYPE (plus_oprnd1), sum_type)) >>> + return NULL; >>> + >>> + /* The type conversion could be promotion, demotion, >>> + or just signed -> unsigned. */ >>> + if (type_conversion_p (plus_oprnd0, last_stmt, false, >>> + &half_type, &def_stmt, &promotion)) >>> + plus_oprnd0 = gimple_assign_rhs1 (def_stmt); >>> + else >>> + half_type = sum_type; >>> + } >>> + >>> + /* So far so good. Since last_stmt was detected as a (summation) >>> reduction, >>> + we know that plus_oprnd1 is the reduction variable (defined by a >>> loop-header >>> + phi), and plus_oprnd0 is an ssa-name defined by a stmt in the loop >>> body. >>> + Then check that plus_oprnd0 is defined by an abs_expr */ >>> + >>> + if (TREE_CODE (plus_oprnd0) != SSA_NAME) >>> + return NULL; >>> + >>> + tree abs_type = half_type; >>> + gimple abs_stmt = SSA_NAME_DEF_STMT (plus_oprnd0); >>> + >>> + /* It could not be the sad pattern if the abs_stmt is outside the loop. >>> */ >>> + if (!gimple_bb (abs_stmt) || !flow_bb_inside_loop_p (loop, >>> gimple_bb (abs_stmt))) >>> + return NULL; >>> + >>> + /* FORNOW. Can continue analyzing the def-use chain when this stmt in a >>> phi >>> + inside the loop (in case we are analyzing an outer-loop). */ >>> + if (!is_gimple_assign (abs_stmt)) >>> + return NULL; >>> + >>> + stmt_vec_info abs_stmt_vinfo = vinfo_for_stmt (abs_stmt); >>> + gcc_assert (abs_stmt_vinfo); >>> + if (STMT_VINFO_DEF_TYPE (abs_stmt_vinfo) != vect_internal_def) >>> + return NULL; >>> + if (gimple_assign_rhs_code (abs_stmt) != ABS_EXPR) >>> + return NULL; >>> + >>> + tree abs_oprnd = gimple_assign_rhs1 (abs_stmt); >>> + if (!types_compatible_p (TREE_TYPE (abs_oprnd), abs_type)) >>> + return NULL; >>> + if (TYPE_UNSIGNED (abs_type)) >>> + return NULL; >>> + >>> + /* We then detect if the operand of abs_expr is defined by a minus_expr. >>> */ >>> + >>> + if (TREE_CODE (abs_oprnd) != SSA_NAME) >>> + return NULL; >>> + >>> + gimple diff_stmt = SSA_NAME_DEF_STMT (abs_oprnd); >>> + >>> + /* It could not be the sad pattern if the diff_stmt is outside the loop. >>> */ >>> + if (!gimple_bb (diff_stmt) >>> + || !flow_bb_inside_loop_p (loop, gimple_bb (diff_stmt))) >>> + return NULL; >>> + >>> + /* FORNOW. Can continue analyzing the def-use chain when this stmt in a >>> phi >>> + inside the loop (in case we are analyzing an outer-loop). 
*/ >>> + if (!is_gimple_assign (diff_stmt)) >>> + return NULL; >>> + >>> + stmt_vec_info diff_stmt_vinfo = vinfo_for_stmt (diff_stmt); >>> + gcc_assert (diff_stmt_vinfo); >>> + if (STMT_VINFO_DEF_TYPE (diff_stmt_vinfo) != vect_internal_def) >>> + return NULL; >>> + if (gimple_assign_rhs_code (diff_stmt) != MINUS_EXPR) >>> + return NULL; >>> + >>> + tree half_type0, half_type1; >>> + gimple def_stmt; >>> + >>> + tree minus_oprnd0 = gimple_assign_rhs1 (diff_stmt); >>> + tree minus_oprnd1 = gimple_assign_rhs2 (diff_stmt); >>> + >>> + if (!types_compatible_p (TREE_TYPE (minus_oprnd0), abs_type) >>> + || !types_compatible_p (TREE_TYPE (minus_oprnd1), abs_type)) >>> + return NULL; >>> + if (!type_conversion_p (minus_oprnd0, diff_stmt, false, >>> + &half_type0, &def_stmt, &promotion) >>> + || !promotion) >>> + return NULL; >>> + sad_oprnd0 = gimple_assign_rhs1 (def_stmt); >>> + >>> + if (!type_conversion_p (minus_oprnd1, diff_stmt, false, >>> + &half_type1, &def_stmt, &promotion) >>> + || !promotion) >>> + return NULL; >>> + sad_oprnd1 = gimple_assign_rhs1 (def_stmt); >>> + >>> + if (!types_compatible_p (half_type0, half_type1)) >>> + return NULL; >>> + if (!TYPE_UNSIGNED (half_type0)) >>> + return NULL; >>> + if (TYPE_PRECISION (abs_type) < TYPE_PRECISION (half_type0) * 2 >>> + || TYPE_PRECISION (sum_type) < TYPE_PRECISION (half_type0) * 2) >>> + return NULL; >>> + >>> + *type_in = TREE_TYPE (sad_oprnd0); >>> + *type_out = sum_type; >>> + >>> + /* Pattern detected. Create a stmt to be used to replace the pattern: */ >>> + tree var = vect_recog_temp_ssa_var (sum_type, NULL); >>> + gimple pattern_stmt = gimple_build_assign_with_ops >>> + (SAD_EXPR, var, sad_oprnd0, sad_oprnd1, >>> plus_oprnd1); >>> + >>> + if (dump_enabled_p ()) >>> + { >>> + dump_printf_loc (MSG_NOTE, vect_location, >>> + "vect_recog_sad_pattern: detected: "); >>> + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0); >>> + dump_printf (MSG_NOTE, "\n"); >>> + } >>> + >>> + /* We don't allow changing the order of the computation in the inner-loop >>> + when doing outer-loop vectorization. */ >>> + gcc_assert (!nested_in_vect_loop_p (loop, last_stmt)); >>> + >>> + return pattern_stmt; >>> +} >>> + >>> + >>> /* Handle widening operation by a constant. At the moment we support >>> MULT_EXPR >>> and LSHIFT_EXPR. >>> >>> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h >>> index 8b7b345..0aac75b 100644 >>> --- a/gcc/tree-vectorizer.h >>> +++ b/gcc/tree-vectorizer.h >>> @@ -1044,7 +1044,7 @@ extern void vect_slp_transform_bb (basic_block); >>> Additional pattern recognition functions can (and will) be added >>> in the future. */ >>> typedef gimple (* vect_recog_func_ptr) (vec<gimple> *, tree *, tree *); >>> -#define NUM_PATTERNS 11 >>> +#define NUM_PATTERNS 12 >>> void vect_pattern_recog (loop_vec_info, bb_vec_info); >>> >>> /* In tree-vectorizer.c. */ >>> diff --git a/gcc/tree.def b/gcc/tree.def >>> index 88c850a..31a3b64 100644 >>> --- a/gcc/tree.def >>> +++ b/gcc/tree.def >>> @@ -1146,6 +1146,15 @@ DEFTREECODE (REDUC_PLUS_EXPR, >>> "reduc_plus_expr", tcc_unary, 1) >>> arg3 = WIDEN_SUM_EXPR (tmp, arg3); */ >>> DEFTREECODE (DOT_PROD_EXPR, "dot_prod_expr", tcc_expression, 3) >>> >>> +/* Widening sad (sum of absolute differences). >>> + The first two arguments are of type t1 which should be unsigned integer. >>> + The third argument and the result are of type t2, such that t2 is at >>> least >>> + twice the size of t1. 
SAD_EXPR(arg1,arg2,arg3) is equivalent to: >>> + tmp1 = WIDEN_MINUS_EXPR (arg1, arg2); >>> + tmp2 = ABS_EXPR (tmp1); >>> + arg3 = PLUS_EXPR (tmp2, arg3); */ >>> +DEFTREECODE (SAD_EXPR, "sad_expr", tcc_expression, 3) >>> + >>> /* Widening summation. >>> The first argument is of type t1. >>> The second argument is of type t2, such that t2 is at least twice
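To make the semantics of the new tree code and the SSE2 expander easier to follow, here is a small sketch in plain C with SSE2 intrinsics. It is not part of the patch: it only mirrors the tree.def comment above and the intent of the sadv16qi expander, and the helper names are made up for illustration.

#include <emmintrin.h>

/* Scalar reading of SAD_EXPR (arg1, arg2, arg3) as described in
   tree.def: widen the two narrow unsigned operands, subtract, take
   the absolute value, and accumulate into the wider third operand.  */
static int
sad_expr_scalar (unsigned char x, unsigned char y, int sum)
{
  int diff = (int) x - (int) y;            /* WIDEN_MINUS_EXPR */
  int abs_diff = diff < 0 ? -diff : diff;  /* ABS_EXPR */
  return abs_diff + sum;                   /* PLUS_EXPR */
}

/* Vector counterpart in the spirit of the sadv16qi expander: psadbw
   (_mm_sad_epu8) sums each group of 8 absolute byte differences into
   one of the two 64-bit lanes (each sum is at most 8 * 255, so the
   upper bits of each lane are zero); viewed as V4SI, lanes 1 and 3
   are zero, so a 32-bit add folds the partial sums into the V4SI
   accumulator.  */
static __m128i
sadv16qi_sketch (__m128i x, __m128i y, __m128i sum_v4si)
{
  __m128i psad = _mm_sad_epu8 (x, y);
  return _mm_add_epi32 (sum_v4si, psad);
}

Calling the vector helper on 16-byte chunks of X and Y and then summing the four lanes of the accumulator gives the same result as the scalar loop in the testcase, which is roughly the shape the vectorizer produces once vect_recog_sad_pattern has replaced the widen/abs/add sequence with SAD_EXPR.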