Tests are still running, but I believe I've addressed all the comments.

> Like Richard said, the new patterns need to be documented in md.texi
> and the new tree codes need to be documented in generic.texi.

Done.

> While we're using tree codes, I think we need to make the naming
> consistent with other tree codes: WIDEN_PLUS_EXPR instead of
> WIDEN_ADD_EXPR and WIDEN_MINUS_EXPR instead of WIDEN_SUB_EXPR.
> Same idea for the VEC_* codes.

Fixed.

> > gcc/ChangeLog:
> >
> > 2020-11-12  Joel Hutton  <joel.hut...@arm.com>
> >
> >         * expr.c (expand_expr_real_2): add widen_add,widen_subtract cases
>
> Not that I personally care about this stuff (would love to see changelogs
> go away :-)) but some nits:
>
> Each description is supposed to start with a capital letter and end with
> a full stop (even if it's not a complete sentence).  Same for the rest

Fixed.

> >         * optabs-tree.c (optab_for_tree_code): optabs for widening
> >         adds,subtracts
>
> The line limit for changelogs is 80 characters.  The entry should say
> what changed, so "Handle ..." or "Add case for ..." or something.

Fixed.

> >         * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog
> >         ptatern
>
> typo: pattern

Fixed.

> > Add widening add, subtract patterns to tree-vect-patterns.
> > Add aarch64 tests for patterns.
> >
> > fix sad
>
> Would be good to expand on this for the final commit message.

'fix sad' was accidentally included when I squashed two commits. I've made
all the commit messages more descriptive.

> > +
> > +    case VEC_WIDEN_SUB_HI_EXPR:
> > +      return (TYPE_UNSIGNED (type)
> > +              ? vec_widen_usubl_hi_optab : vec_widen_ssubl_hi_optab);
> > +
> > +
>
> Nits: excess blank line at the end and excess space before the ":"s.

Fixed.

> > +OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
> > +OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
> > +OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
> >  OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
> >  OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
> >  OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
>
> Looks like the current code groups signed stuff together and
> unsigned stuff together, so would be good to follow that.

Fixed.

> Same comments as the previous patch about having a "+nosve" pragma
> and about the scan-assembler-times lines.  Same for the sub test.

Fixed.

> I am missing documentation in md.texi for the new patterns.  In
> particular I wonder why you need signed and unsigned variants
> for the add/subtract patterns.

Fixed. Signed and unsigned variants are needed because they correspond to
distinct signed and unsigned instructions (uaddl/uaddl2 and saddl/saddl2).

> The new functions should have comments before them.  Can probably
> just use the vect_recog_widen_mult_pattern comment as a template.

Fixed.

> > +    case VEC_WIDEN_SUB_HI_EXPR:
> > +    case VEC_WIDEN_SUB_LO_EXPR:
> > +    case VEC_WIDEN_ADD_HI_EXPR:
> > +    case VEC_WIDEN_ADD_LO_EXPR:
> > +      return false;
> > +
>
> I think these should get the same validity checking as
> VEC_WIDEN_MULT_HI_EXPR etc.

Fixed.

> > --- a/gcc/tree-vect-patterns.c
> > +++ b/gcc/tree-vect-patterns.c
> > @@ -1086,8 +1086,10 @@ vect_recog_sad_pattern (vec_info *vinfo,
> >       of the above pattern.  */
> >
> >    tree plus_oprnd0, plus_oprnd1;
> > -  if (!vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
> > -                                       &plus_oprnd0, &plus_oprnd1))
> > +  if (!(vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
> > +                                        &plus_oprnd0, &plus_oprnd1)
> > +        || vect_reassociating_reduction_p (vinfo, stmt_vinfo, WIDEN_ADD_EXPR,
> > +                                           &plus_oprnd0, &plus_oprnd1)))
> >      return NULL;
> >
> >    tree sum_type = gimple_expr_type (last_stmt);
>
> I think we should make:
>
>   /* Any non-truncating sequence of conversions is OK here, since
>      with a successful match, the result of the ABS(U) is known to fit
>      within the nonnegative range of the result type.  (It cannot be the
>      negative of the minimum signed value due to the range of the widening
>      MINUS_EXPR.)  */
>   vect_unpromoted_value unprom_abs;
>   plus_oprnd0 = vect_look_through_possible_promotion (vinfo, plus_oprnd0,
>                                                       &unprom_abs);
>
> specific to the PLUS_EXPR case.  If we look through promotions on
> the operands of a WIDEN_ADD_EXPR, we could potentially have a mixture
> of signednesses involved, one on the operands of the WIDEN_ADD_EXPR
> and one on its inputs.

Fixed.

gcc/ChangeLog:

2020-11-13  Joel Hutton  <joel.hut...@arm.com>

        * expr.c (expand_expr_real_2): Add WIDEN_PLUS_EXPR and
        WIDEN_MINUS_EXPR cases.
        * optabs-tree.c (optab_for_tree_code): Add case for widening
        adds, subtracts.
        * optabs.def (OPTAB_D): Define vectorized widen add, subtract
        optabs.
        * tree-cfg.c (verify_gimple_assign_binary): Add case for widening
        adds, subtracts.
        * tree-inline.c (estimate_operator_cost): Add case for widening
        adds, subtracts.
        * tree-vect-generic.c (expand_vector_operations_1): Add case for
        widening adds, subtracts.
        * tree-vect-patterns.c (vect_recog_widen_plus_pattern): New recog
        pattern.
        (vect_recog_widen_minus_pattern): New recog pattern.
        (vect_recog_average_pattern): Update to handle WIDEN_PLUS_EXPR.
        (vect_recog_sad_pattern): Update to handle WIDEN_MINUS_EXPR.
        * tree-vect-stmts.c (vectorizable_conversion): Add case for
        widened add, subtract.
        (supportable_widening_operation): Add case for widened add,
        subtract.
        * tree.def (WIDEN_PLUS_EXPR): New tree code.
        (WIDEN_MINUS_EXPR): New tree code.
        (VEC_WIDEN_PLUS_HI_EXPR): New tree code.
        (VEC_WIDEN_PLUS_LO_EXPR): New tree code.
        (VEC_WIDEN_MINUS_HI_EXPR): New tree code.
        (VEC_WIDEN_MINUS_LO_EXPR): New tree code.

gcc/testsuite/ChangeLog:

2020-11-13  Joel Hutton  <joel.hut...@arm.com>

        * gcc.target/aarch64/vect-widen-add.c: New test.
        * gcc.target/aarch64/vect-widen-sub.c: New test.
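To make the transformation concrete before the patch itself, here is a minimal
standalone sketch (illustrative only, not part of the patch; the function
names are made up) of the kind of scalar loop the new recog patterns target,
mirroring the tests added below.  With -O3 on aarch64, the unsigned loop
should now vectorize via WIDEN_PLUS_EXPR and select uaddl/uaddl2, and the
signed subtraction loop via WIDEN_MINUS_EXPR and ssubl/ssubl2:

#include <stdint.h>

/* Unsigned widening add: each pair of uint16_t lanes produces a uint32_t
   sum.  Expected to use uaddl for the low half and uaddl2 for the high
   half of each input vector.  */
void
widen_uadd (uint32_t *restrict res, uint16_t *restrict a,
            uint16_t *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    res[i] = a[i] + b[i];
}

/* Signed widening subtract: each pair of int16_t lanes produces an
   int32_t difference.  Expected to use ssubl/ssubl2.  */
void
widen_ssub (int32_t *restrict res, int16_t *restrict a,
            int16_t *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    res[i] = a[i] - b[i];
}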
From d5e20487bbccd69e9b5ac96fef6c9df8710d0cb0 Mon Sep 17 00:00:00 2001
From: Joel Hutton <joel.hut...@arm.com>
Date: Mon, 9 Nov 2020 15:44:18 +0000
Subject: [PATCH 2/3] [vect] Add widening add, subtract patterns

Add widening add, subtract patterns to tree-vect-patterns.  Update
patterns that detect PLUS_EXPR as the widened operation to also detect
WIDEN_PLUS_EXPR.  These patterns take 2 vectors with N elements of size
S and perform an add/subtract on the elements, storing the results as N
elements of size 2*S (in 2 result vectors).  This is implemented in the
aarch64 backend as addl, addl2 and subl, subl2 respectively.  Add
aarch64 tests for the patterns.
---
 gcc/doc/generic.texi                          | 31 +++++++
 gcc/doc/md.texi                               | 22 +++++
 gcc/expr.c                                    |  6 ++
 gcc/optabs-tree.c                             | 16 ++++
 gcc/optabs.def                                |  8 ++
 .../gcc.target/aarch64/vect-widen-add.c       | 92 +++++++++++++++++++
 .../gcc.target/aarch64/vect-widen-sub.c       | 92 +++++++++++++++++++
 gcc/tree-cfg.c                                |  6 ++
 gcc/tree-inline.c                             |  6 ++
 gcc/tree-vect-generic.c                       |  4 +
 gcc/tree-vect-patterns.c                      | 31 ++++++-
 gcc/tree-vect-stmts.c                         | 15 ++-
 gcc/tree.def                                  |  6 ++
 13 files changed, 331 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c

diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index 7373266c69f..3d7d4b0b947 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -1790,6 +1790,10 @@ a value from @code{enum annot_expr_kind}, the third is an @code{INTEGER_CST}.
 @tindex VEC_RSHIFT_EXPR
 @tindex VEC_WIDEN_MULT_HI_EXPR
 @tindex VEC_WIDEN_MULT_LO_EXPR
+@tindex VEC_WIDEN_PLUS_HI_EXPR
+@tindex VEC_WIDEN_PLUS_LO_EXPR
+@tindex VEC_WIDEN_MINUS_HI_EXPR
+@tindex VEC_WIDEN_MINUS_LO_EXPR
 @tindex VEC_UNPACK_HI_EXPR
 @tindex VEC_UNPACK_LO_EXPR
 @tindex VEC_UNPACK_FLOAT_HI_EXPR
@@ -1836,6 +1840,33 @@ vector of @code{N/2} products.  In the case of @code{VEC_WIDEN_MULT_LO_EXPR} the
 low @code{N/2} elements of the two vector are multiplied to produce the
 vector of @code{N/2} products.
 
+@item VEC_WIDEN_PLUS_HI_EXPR
+@itemx VEC_WIDEN_PLUS_LO_EXPR
+These nodes represent widening vector addition of the high and low parts of
+the two input vectors, respectively.  Their operands are vectors that contain
+the same number of elements (@code{N}) of the same integral type.  The result
+is a vector that contains half as many elements, of an integral type whose
+size is twice as wide.  In the case of @code{VEC_WIDEN_PLUS_HI_EXPR} the high
+@code{N/2} elements of the two vectors are added to produce the vector of
+@code{N/2} sums.  In the case of @code{VEC_WIDEN_PLUS_LO_EXPR} the low
+@code{N/2} elements of the two vectors are added to produce the vector of
+@code{N/2} sums.
+
+@item VEC_WIDEN_MINUS_HI_EXPR
+@itemx VEC_WIDEN_MINUS_LO_EXPR
+These nodes represent widening vector subtraction of the high and low parts of
+the two input vectors, respectively.  Their operands are vectors that contain
+the same number of elements (@code{N}) of the same integral type.  The high/low
+elements of the second vector are subtracted from the high/low elements of the
+first.  The result is a vector that contains half as many elements, of an
+integral type whose size is twice as wide.  In the case of
+@code{VEC_WIDEN_MINUS_HI_EXPR} the high @code{N/2} elements of the second
+vector are subtracted from the high @code{N/2} of the first to produce the
+vector of @code{N/2} differences.  In the case of
+@code{VEC_WIDEN_MINUS_LO_EXPR} the low @code{N/2} elements of the second
+vector are subtracted from the low @code{N/2} of the first to produce the
+vector of @code{N/2} differences.
+
 @item VEC_UNPACK_HI_EXPR
 @itemx VEC_UNPACK_LO_EXPR
 These nodes represent unpacking of the high and low parts of the input vector,
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 813875b973b..da8c9a283dd 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5626,6 +5626,28 @@ with N signed/unsigned elements of size S@.  Operand 2 is a constant.  Shift
 the high/low elements of operand 1, and put the N/2 results of size 2*S in the
 output vector (operand 0).
 
+@cindex @code{vec_widen_saddl_hi_@var{m}} instruction pattern
+@cindex @code{vec_widen_saddl_lo_@var{m}} instruction pattern
+@cindex @code{vec_widen_uaddl_hi_@var{m}} instruction pattern
+@cindex @code{vec_widen_uaddl_lo_@var{m}} instruction pattern
+@item @samp{vec_widen_uaddl_hi_@var{m}}, @samp{vec_widen_uaddl_lo_@var{m}}
+@itemx @samp{vec_widen_saddl_hi_@var{m}}, @samp{vec_widen_saddl_lo_@var{m}}
+Signed/Unsigned widening add long.  Operands 1 and 2 are vectors with N
+signed/unsigned elements of size S@.  Add the high/low elements of 1 and 2
+together, widen the resulting elements and put the N/2 results of size 2*S in
+the output vector (operand 0).
+
+@cindex @code{vec_widen_ssubl_hi_@var{m}} instruction pattern
+@cindex @code{vec_widen_ssubl_lo_@var{m}} instruction pattern
+@cindex @code{vec_widen_usubl_hi_@var{m}} instruction pattern
+@cindex @code{vec_widen_usubl_lo_@var{m}} instruction pattern
+@item @samp{vec_widen_usubl_hi_@var{m}}, @samp{vec_widen_usubl_lo_@var{m}}
+@itemx @samp{vec_widen_ssubl_hi_@var{m}}, @samp{vec_widen_ssubl_lo_@var{m}}
+Signed/Unsigned widening subtract long.  Operands 1 and 2 are vectors with N
+signed/unsigned elements of size S@.  Subtract the high/low elements of 2 from
+1 and widen the resulting elements.  Put the N/2 results of size 2*S in the
+output vector (operand 0).
+
 @cindex @code{mulhisi3} instruction pattern
 @item @samp{mulhisi3}
 Multiply operands 1 and 2, which have mode @code{HImode}, and store
diff --git a/gcc/expr.c b/gcc/expr.c
index ae16f077758..83aa63c41b5 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9034,6 +9034,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 			     target, unsignedp);
 	  return target;
 
+    case WIDEN_PLUS_EXPR:
+    case WIDEN_MINUS_EXPR:
     case WIDEN_MULT_EXPR:
       /* If first operand is constant, swap them.
 	 Thus the following special case checks need only
@@ -9754,6 +9756,10 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	  return temp;
 	}
 
+    case VEC_WIDEN_PLUS_HI_EXPR:
+    case VEC_WIDEN_PLUS_LO_EXPR:
+    case VEC_WIDEN_MINUS_HI_EXPR:
+    case VEC_WIDEN_MINUS_LO_EXPR:
     case VEC_WIDEN_MULT_HI_EXPR:
     case VEC_WIDEN_MULT_LO_EXPR:
     case VEC_WIDEN_MULT_EVEN_EXPR:
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 4dfda756932..b797d018c84 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -170,6 +170,22 @@ optab_for_tree_code (enum tree_code code, const_tree type,
       return (TYPE_UNSIGNED (type)
 	      ? vec_widen_ushiftl_lo_optab : vec_widen_sshiftl_lo_optab);
 
+    case VEC_WIDEN_PLUS_LO_EXPR:
+      return (TYPE_UNSIGNED (type)
+	      ? vec_widen_uaddl_lo_optab : vec_widen_saddl_lo_optab);
+
+    case VEC_WIDEN_PLUS_HI_EXPR:
+      return (TYPE_UNSIGNED (type)
+	      ? vec_widen_uaddl_hi_optab : vec_widen_saddl_hi_optab);
+
+    case VEC_WIDEN_MINUS_LO_EXPR:
+      return (TYPE_UNSIGNED (type)
+	      ? vec_widen_usubl_lo_optab : vec_widen_ssubl_lo_optab);
+
+    case VEC_WIDEN_MINUS_HI_EXPR:
+      return (TYPE_UNSIGNED (type)
+	      ? vec_widen_usubl_hi_optab : vec_widen_ssubl_hi_optab);
+
     case VEC_UNPACK_HI_EXPR:
       return (TYPE_UNSIGNED (type)
 	      ? vec_unpacku_hi_optab : vec_unpacks_hi_optab);
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 78409aa1453..5607f51e6b4 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -383,6 +383,10 @@ OPTAB_D (vec_widen_smult_even_optab, "vec_widen_smult_even_$a")
 OPTAB_D (vec_widen_smult_hi_optab, "vec_widen_smult_hi_$a")
 OPTAB_D (vec_widen_smult_lo_optab, "vec_widen_smult_lo_$a")
 OPTAB_D (vec_widen_smult_odd_optab, "vec_widen_smult_odd_$a")
+OPTAB_D (vec_widen_ssubl_hi_optab, "vec_widen_ssubl_hi_$a")
+OPTAB_D (vec_widen_ssubl_lo_optab, "vec_widen_ssubl_lo_$a")
+OPTAB_D (vec_widen_saddl_hi_optab, "vec_widen_saddl_hi_$a")
+OPTAB_D (vec_widen_saddl_lo_optab, "vec_widen_saddl_lo_$a")
 OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
 OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
 OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
@@ -391,6 +395,10 @@ OPTAB_D (vec_widen_umult_lo_optab, "vec_widen_umult_lo_$a")
 OPTAB_D (vec_widen_umult_odd_optab, "vec_widen_umult_odd_$a")
 OPTAB_D (vec_widen_ushiftl_hi_optab, "vec_widen_ushiftl_hi_$a")
 OPTAB_D (vec_widen_ushiftl_lo_optab, "vec_widen_ushiftl_lo_$a")
+OPTAB_D (vec_widen_usubl_hi_optab, "vec_widen_usubl_hi_$a")
+OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
+OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
+OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
 
 OPTAB_D (sync_add_optab, "sync_add$I$a")
 OPTAB_D (sync_and_optab, "sync_and$I$a")
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
new file mode 100644
index 00000000000..220bd9352a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
@@ -0,0 +1,92 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -save-temps" } */
+#include <stdint.h>
+#include <string.h>
+
+#pragma GCC target "+nosve"
+
+#define ARR_SIZE 1024
+
+/* Should produce an uaddl.  */
+void uadd_opt (uint32_t *foo, uint16_t *a, uint16_t *b)
+{
+  for (int i = 0; i < ARR_SIZE - 3; i = i + 4)
+    {
+      foo[i] = a[i] + b[i];
+      foo[i+1] = a[i+1] + b[i+1];
+      foo[i+2] = a[i+2] + b[i+2];
+      foo[i+3] = a[i+3] + b[i+3];
+    }
+}
+
+__attribute__((optimize (0)))
+void uadd_nonopt (uint32_t *foo, uint16_t *a, uint16_t *b)
+{
+  for (int i = 0; i < ARR_SIZE - 3; i = i + 4)
+    {
+      foo[i] = a[i] + b[i];
+      foo[i+1] = a[i+1] + b[i+1];
+      foo[i+2] = a[i+2] + b[i+2];
+      foo[i+3] = a[i+3] + b[i+3];
+    }
+}
+
+/* Should produce an saddl.  */
+void sadd_opt (int32_t *foo, int16_t *a, int16_t *b)
+{
+  for (int i = 0; i < ARR_SIZE - 3; i = i + 4)
+    {
+      foo[i] = a[i] + b[i];
+      foo[i+1] = a[i+1] + b[i+1];
+      foo[i+2] = a[i+2] + b[i+2];
+      foo[i+3] = a[i+3] + b[i+3];
+    }
+}
+
+__attribute__((optimize (0)))
+void sadd_nonopt (int32_t *foo, int16_t *a, int16_t *b)
+{
+  for (int i = 0; i < ARR_SIZE - 3; i = i + 4)
+    {
+      foo[i] = a[i] + b[i];
+      foo[i+1] = a[i+1] + b[i+1];
+      foo[i+2] = a[i+2] + b[i+2];
+      foo[i+3] = a[i+3] + b[i+3];
+    }
+}
+
+
+void __attribute__((optimize (0)))
+init (uint16_t *a, uint16_t *b)
+{
+  for (int i = 0; i < ARR_SIZE; i++)
+    {
+      a[i] = i;
+      b[i] = 2*i;
+    }
+}
+
+int __attribute__((optimize (0)))
+main ()
+{
+  uint32_t foo_arr[ARR_SIZE];
+  uint32_t bar_arr[ARR_SIZE];
+  uint16_t a[ARR_SIZE];
+  uint16_t b[ARR_SIZE];
+
+  init (a, b);
+  uadd_opt (foo_arr, a, b);
+  uadd_nonopt (bar_arr, a, b);
+  if (memcmp (foo_arr, bar_arr, ARR_SIZE * sizeof (uint32_t)) != 0)
+    return 1;
+  sadd_opt ((int32_t*) foo_arr, (int16_t*) a, (int16_t*) b);
+  sadd_nonopt ((int32_t*) bar_arr, (int16_t*) a, (int16_t*) b);
+  if (memcmp (foo_arr, bar_arr, ARR_SIZE * sizeof (uint32_t)) != 0)
+    return 1;
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {\tuaddl\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tuaddl2\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tsaddl\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tsaddl2\t} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
new file mode 100644
index 00000000000..a2bed63affb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
@@ -0,0 +1,92 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -save-temps" } */
+#include <stdint.h>
+#include <string.h>
+
+#pragma GCC target "+nosve"
+
+#define ARR_SIZE 1024
+
+/* Should produce an usubl.  */
+void usub_opt (uint32_t *foo, uint16_t *a, uint16_t *b)
+{
+  for (int i = 0; i < ARR_SIZE - 3; i = i + 4)
+    {
+      foo[i] = a[i] - b[i];
+      foo[i+1] = a[i+1] - b[i+1];
+      foo[i+2] = a[i+2] - b[i+2];
+      foo[i+3] = a[i+3] - b[i+3];
+    }
+}
+
+__attribute__((optimize (0)))
+void usub_nonopt (uint32_t *foo, uint16_t *a, uint16_t *b)
+{
+  for (int i = 0; i < ARR_SIZE - 3; i = i + 4)
+    {
+      foo[i] = a[i] - b[i];
+      foo[i+1] = a[i+1] - b[i+1];
+      foo[i+2] = a[i+2] - b[i+2];
+      foo[i+3] = a[i+3] - b[i+3];
+    }
+}
+
+/* Should produce an ssubl.  */
+void ssub_opt (int32_t *foo, int16_t *a, int16_t *b)
+{
+  for (int i = 0; i < ARR_SIZE - 3; i = i + 4)
+    {
+      foo[i] = a[i] - b[i];
+      foo[i+1] = a[i+1] - b[i+1];
+      foo[i+2] = a[i+2] - b[i+2];
+      foo[i+3] = a[i+3] - b[i+3];
+    }
+}
+
+__attribute__((optimize (0)))
+void ssub_nonopt (int32_t *foo, int16_t *a, int16_t *b)
+{
+  for (int i = 0; i < ARR_SIZE - 3; i = i + 4)
+    {
+      foo[i] = a[i] - b[i];
+      foo[i+1] = a[i+1] - b[i+1];
+      foo[i+2] = a[i+2] - b[i+2];
+      foo[i+3] = a[i+3] - b[i+3];
+    }
+}
+
+
+void __attribute__((optimize (0)))
+init (uint16_t *a, uint16_t *b)
+{
+  for (int i = 0; i < ARR_SIZE; i++)
+    {
+      a[i] = i;
+      b[i] = 2*i;
+    }
+}
+
+int __attribute__((optimize (0)))
+main ()
+{
+  uint32_t foo_arr[ARR_SIZE];
+  uint32_t bar_arr[ARR_SIZE];
+  uint16_t a[ARR_SIZE];
+  uint16_t b[ARR_SIZE];
+
+  init (a, b);
+  usub_opt (foo_arr, a, b);
+  usub_nonopt (bar_arr, a, b);
+  if (memcmp (foo_arr, bar_arr, ARR_SIZE * sizeof (uint32_t)) != 0)
+    return 1;
+  ssub_opt ((int32_t*) foo_arr, (int16_t*) a, (int16_t*) b);
+  ssub_nonopt ((int32_t*) bar_arr, (int16_t*) a, (int16_t*) b);
+  if (memcmp (foo_arr, bar_arr, ARR_SIZE * sizeof (uint32_t)) != 0)
+    return 1;
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {\tusubl\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tusubl2\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tssubl\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tssubl2\t} 1 } } */
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 5139f111fec..aaf390bda42 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3885,6 +3885,8 @@ verify_gimple_assign_binary (gassign *stmt)
 	return false;
       }
 
+    case WIDEN_PLUS_EXPR:
+    case WIDEN_MINUS_EXPR:
     case PLUS_EXPR:
     case MINUS_EXPR:
       {
@@ -4005,6 +4007,10 @@ verify_gimple_assign_binary (gassign *stmt)
 	return false;
       }
 
+    case VEC_WIDEN_MINUS_HI_EXPR:
+    case VEC_WIDEN_MINUS_LO_EXPR:
+    case VEC_WIDEN_PLUS_HI_EXPR:
+    case VEC_WIDEN_PLUS_LO_EXPR:
     case VEC_WIDEN_MULT_HI_EXPR:
     case VEC_WIDEN_MULT_LO_EXPR:
     case VEC_WIDEN_MULT_EVEN_EXPR:
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 32424b169c7..d9814bd10d3 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -4224,6 +4224,8 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights,
 
     case REALIGN_LOAD_EXPR:
 
+    case WIDEN_PLUS_EXPR:
+    case WIDEN_MINUS_EXPR:
     case WIDEN_SUM_EXPR:
     case WIDEN_MULT_EXPR:
     case DOT_PROD_EXPR:
@@ -4232,6 +4234,10 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights,
     case WIDEN_MULT_MINUS_EXPR:
     case WIDEN_LSHIFT_EXPR:
 
+    case VEC_WIDEN_PLUS_HI_EXPR:
+    case VEC_WIDEN_PLUS_LO_EXPR:
+    case VEC_WIDEN_MINUS_HI_EXPR:
+    case VEC_WIDEN_MINUS_LO_EXPR:
     case VEC_WIDEN_MULT_HI_EXPR:
     case VEC_WIDEN_MULT_LO_EXPR:
     case VEC_WIDEN_MULT_EVEN_EXPR:
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index d7bafa77134..23bc1cb04b7 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -2118,6 +2118,10 @@ expand_vector_operations_1 (gimple_stmt_iterator *gsi,
 	 arguments, not the widened result.  VEC_UNPACK_FLOAT_*_EXPR is
 	 calculated in the same way above.  */
       if (code == WIDEN_SUM_EXPR
+	  || code == VEC_WIDEN_PLUS_HI_EXPR
+	  || code == VEC_WIDEN_PLUS_LO_EXPR
+	  || code == VEC_WIDEN_MINUS_HI_EXPR
+	  || code == VEC_WIDEN_MINUS_LO_EXPR
 	  || code == VEC_WIDEN_MULT_HI_EXPR
 	  || code == VEC_WIDEN_MULT_LO_EXPR
 	  || code == VEC_WIDEN_MULT_EVEN_EXPR
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index f68a87e05ed..79b521aa436 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -1148,7 +1148,7 @@ vect_recog_sad_pattern (vec_info *vinfo,
   /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
      inside the loop (in case we are analyzing an outer-loop).  */
   vect_unpromoted_value unprom[2];
-  if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, MINUS_EXPR,
+  if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, WIDEN_MINUS_EXPR,
 			     false, 2, unprom, &half_type))
     return NULL;
 
@@ -1262,6 +1262,29 @@ vect_recog_widen_mult_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
 				      "vect_recog_widen_mult_pattern");
 }
 
+/* Try to detect addition on widened inputs, converting PLUS_EXPR
+   to WIDEN_PLUS_EXPR.  See vect_recog_widen_op_pattern for details.  */
+
+static gimple *
+vect_recog_widen_plus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
+			       tree *type_out)
+{
+  return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
+				      PLUS_EXPR, WIDEN_PLUS_EXPR, false,
+				      "vect_recog_widen_plus_pattern");
+}
+
+/* Try to detect subtraction on widened inputs, converting MINUS_EXPR
+   to WIDEN_MINUS_EXPR.  See vect_recog_widen_op_pattern for details.  */
+static gimple *
+vect_recog_widen_minus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
+				tree *type_out)
+{
+  return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
+				      MINUS_EXPR, WIDEN_MINUS_EXPR, false,
+				      "vect_recog_widen_minus_pattern");
+}
+
 /* Function vect_recog_pow_pattern
 
    Try to find the following pattern:
@@ -1978,7 +2001,7 @@ vect_recog_average_pattern (vec_info *vinfo,
   vect_unpromoted_value unprom[3];
   tree new_type;
   unsigned int nops = vect_widened_op_tree (vinfo, plus_stmt_info, PLUS_EXPR,
-					    PLUS_EXPR, false, 3,
+					    WIDEN_PLUS_EXPR, false, 3,
 					    unprom, &new_type);
   if (nops == 0)
     return NULL;
@@ -5249,7 +5272,9 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
      of mask conversion that are needed for gather and scatter
      internal functions.  */
   { vect_recog_gather_scatter_pattern, "gather_scatter" },
-  { vect_recog_mask_conversion_pattern, "mask_conversion" }
+  { vect_recog_mask_conversion_pattern, "mask_conversion" },
+  { vect_recog_widen_plus_pattern, "widen_plus" },
+  { vect_recog_widen_minus_pattern, "widen_minus" },
 };
 
 const unsigned int NUM_PATTERNS = ARRAY_SIZE (vect_vect_recog_func_ptrs);
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 2c7a8a70913..25a8474c774 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -4570,6 +4570,8 @@ vectorizable_conversion (vec_info *vinfo,
   if (!CONVERT_EXPR_CODE_P (code)
       && code != FIX_TRUNC_EXPR
       && code != FLOAT_EXPR
+      && code != WIDEN_PLUS_EXPR
+      && code != WIDEN_MINUS_EXPR
       && code != WIDEN_MULT_EXPR
       && code != WIDEN_LSHIFT_EXPR)
     return false;
@@ -4615,7 +4617,8 @@ vectorizable_conversion (vec_info *vinfo,
 
   if (op_type == binary_op)
     {
-      gcc_assert (code == WIDEN_MULT_EXPR || code == WIDEN_LSHIFT_EXPR);
+      gcc_assert (code == WIDEN_MULT_EXPR || code == WIDEN_LSHIFT_EXPR
+		  || code == WIDEN_PLUS_EXPR || code == WIDEN_MINUS_EXPR);
 
       op1 = gimple_assign_rhs2 (stmt);
       tree vectype1_in;
@@ -11534,6 +11537,16 @@ supportable_widening_operation (vec_info *vinfo,
       c2 = VEC_WIDEN_LSHIFT_HI_EXPR;
       break;
 
+    case WIDEN_PLUS_EXPR:
+      c1 = VEC_WIDEN_PLUS_LO_EXPR;
+      c2 = VEC_WIDEN_PLUS_HI_EXPR;
+      break;
+
+    case WIDEN_MINUS_EXPR:
+      c1 = VEC_WIDEN_MINUS_LO_EXPR;
+      c2 = VEC_WIDEN_MINUS_HI_EXPR;
+      break;
+
     CASE_CONVERT:
      c1 = VEC_UNPACK_LO_EXPR;
      c2 = VEC_UNPACK_HI_EXPR;
diff --git a/gcc/tree.def b/gcc/tree.def
index 6c53fe1bf67..ffbe00cf79f 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1359,6 +1359,8 @@ DEFTREECODE (WIDEN_MULT_MINUS_EXPR, "widen_mult_minus_expr", tcc_expression, 3)
    the first argument from type t1 to type t2, and then shifting it
    by the second argument.  */
 DEFTREECODE (WIDEN_LSHIFT_EXPR, "widen_lshift_expr", tcc_binary, 2)
+DEFTREECODE (WIDEN_PLUS_EXPR, "widen_plus_expr", tcc_binary, 2)
+DEFTREECODE (WIDEN_MINUS_EXPR, "widen_minus_expr", tcc_binary, 2)
 
 /* Widening vector multiplication.
    The two operands are vectors with N elements of size S.
    Multiplying the
@@ -1423,6 +1425,10 @@ DEFTREECODE (VEC_PACK_FLOAT_EXPR, "vec_pack_float_expr", tcc_binary, 2)
  */
 DEFTREECODE (VEC_WIDEN_LSHIFT_HI_EXPR, "widen_lshift_hi_expr", tcc_binary, 2)
 DEFTREECODE (VEC_WIDEN_LSHIFT_LO_EXPR, "widen_lshift_lo_expr", tcc_binary, 2)
+DEFTREECODE (VEC_WIDEN_PLUS_HI_EXPR, "widen_plus_hi_expr", tcc_binary, 2)
+DEFTREECODE (VEC_WIDEN_PLUS_LO_EXPR, "widen_plus_lo_expr", tcc_binary, 2)
+DEFTREECODE (VEC_WIDEN_MINUS_HI_EXPR, "widen_minus_hi_expr", tcc_binary, 2)
+DEFTREECODE (VEC_WIDEN_MINUS_LO_EXPR, "widen_minus_lo_expr", tcc_binary, 2)
 
 /* PREDICT_EXPR.  Specify hint for branch prediction.  The
    PREDICT_EXPR_PREDICTOR specify predictor and PREDICT_EXPR_OUTCOME the
-- 
2.17.1
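P.S. As a reading aid for the generic.texi hunk above, here is a scalar model
(illustrative only, not part of the patch) of the element semantics documented
for the new VEC_WIDEN_* codes, for N = 8 elements of size S = 16 bits.  Which
lanes count as the "high" half of a concrete vector register is
target-dependent; the model below just uses element indices:

#include <stdint.h>

/* VEC_WIDEN_PLUS_HI_EXPR: add the high N/2 elements of the two operands,
   giving N/2 results twice as wide.  */
void
vec_widen_plus_hi (uint32_t res[4], const uint16_t a[8], const uint16_t b[8])
{
  for (int i = 0; i < 4; i++)
    res[i] = (uint32_t) a[i + 4] + (uint32_t) b[i + 4];
}

/* VEC_WIDEN_MINUS_LO_EXPR: subtract the low N/2 elements of the second
   operand from the low N/2 elements of the first.  */
void
vec_widen_minus_lo (int32_t res[4], const int16_t a[8], const int16_t b[8])
{
  for (int i = 0; i < 4; i++)
    res[i] = (int32_t) a[i] - (int32_t) b[i];
}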