The following enables vectorization of

int bar (unsigned int *x)
{
  int sum = 0;
  for (int i = 0; i < 32; ++i)
    sum += x[i];
  return sum;
}

which is currently not done because the loop has a conversion to
unsigned int for 'sum' for doing the addition part of the reduction.
That can now easily be relaxed after the recent refactorings in
reduction vectorization support.

There's more to actually fix PR65930 (and IIRC the case occurring in
x264), namely fix the SLP reduction case.  I'm working on that right
now.

As you can see the testsuite has a few instances of the above, thus I
refrained from adding another testcase.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-10-23  Richard Biener  <rguent...@suse.de>

        PR tree-optimization/65930
        * tree-vect-loop.c (check_reduction_path): Allow conversions
        that only change the sign.
        (vectorizable_reduction): Relax latch def stmts we handle further.

        * gcc.dg/vect/vect-reduc-2char-big-array.c: Adjust.
        * gcc.dg/vect/vect-reduc-2char.c: Likewise.
        * gcc.dg/vect/vect-reduc-2short.c: Likewise.
        * gcc.dg/vect/vect-reduc-dot-s8b.c: Likewise.
        * gcc.dg/vect/vect-reduc-pattern-2c.c: Likewise.

Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c        (revision 277312)
+++ gcc/tree-vect-loop.c        (working copy)
@@ -2695,7 +2695,11 @@ pop:
               if (gimple_assign_rhs2 (use_stmt) == op)
                 neg = ! neg;
             }
-          if (*code == ERROR_MARK)
+          if (CONVERT_EXPR_CODE_P (use_code)
+              && tree_nop_conversion_p (TREE_TYPE (gimple_assign_lhs (use_stmt)),
+                                        TREE_TYPE (gimple_assign_rhs1 (use_stmt))))
+            ;
+          else if (*code == ERROR_MARK)
             *code = use_code;
           else if (use_code != *code)
             {
@@ -5692,19 +5696,6 @@ vectorizable_reduction (stmt_vec_info st
         which is defined by the loop-header-phi.  */
 
   gassign *stmt = as_a <gassign *> (stmt_info->stmt);
-  switch (get_gimple_rhs_class (gimple_assign_rhs_code (stmt)))
-    {
-    case GIMPLE_BINARY_RHS:
-    case GIMPLE_TERNARY_RHS:
-      break;
-
-    case GIMPLE_UNARY_RHS:
-    case GIMPLE_SINGLE_RHS:
-      return false;
-
-    default:
-      gcc_unreachable ();
-    }
 
   enum tree_code code = gimple_assign_rhs_code (stmt);
   int op_type = TREE_CODE_LENGTH (code);
Index: gcc/testsuite/gcc.dg/vect/vect-reduc-2char-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-reduc-2char-big-array.c      (revision 277312)
+++ gcc/testsuite/gcc.dg/vect/vect-reduc-2char-big-array.c      (working copy)
@@ -62,4 +62,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-reduc-2char.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-reduc-2char.c        (revision 277312)
+++ gcc/testsuite/gcc.dg/vect/vect-reduc-2char.c        (working copy)
@@ -46,4 +46,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-reduc-2short.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-reduc-2short.c       (revision 277312)
+++ gcc/testsuite/gcc.dg/vect/vect-reduc-2short.c       (working copy)
@@ -45,4 +45,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c      (revision 277312)
+++ gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c      (working copy)
@@ -12,12 +12,6 @@ signed char Y[N] __attribute__ ((__align
 
 /* char->short->short dot product.
    The dot-product pattern should be detected.
-   The reduction is currently not vectorized becaus of the signed->unsigned->signed
-   casts, since this patch:
-
-     2005-12-26  Kazu Hirata  <k...@codesourcery.com>
-
-        PR tree-optimization/25125
 
    When the dot-product is detected, the loop should be vectorized on vect_sdot_qi
    targets (targets that support dot-product of signed char).
@@ -60,5 +54,5 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 1 "vect" { xfail *-*-* } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-2c.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-2c.c   (revision 277312)
+++ gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-2c.c   (working copy)
@@ -21,6 +21,8 @@ foo ()
      2005-12-26  Kazu Hirata  <k...@codesourcery.com>
 
         PR tree-optimization/25125
+
+     but we still handle the reduction.
      */
 
   for (i = 0; i < N; i++)
@@ -43,5 +45,4 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_sum_pattern: detected" 1 "vect" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target { ! vect_widen_sum_qi_to_hi } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
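
For illustration only, the conversion mentioned in the description can be made visible by writing out by hand what the C usual arithmetic conversions do to 'sum += x[i]'. This is a rough sketch, not the GIMPLE the vectorizer actually sees; bar_explicit, tem and check_equivalence are made-up names that do not appear in the patch or the testsuite.

```c
#include <assert.h>

/* The loop as written in the description.  */
int bar (unsigned int *x)
{
  int sum = 0;
  for (int i = 0; i < 32; ++i)
    sum += x[i];
  return sum;
}

/* Hypothetical hand-expanded variant: the addition is carried out in
   unsigned int and the result is converted back to int, so the
   reduction cycle contains conversions that only change the sign.  */
int bar_explicit (unsigned int *x)
{
  int sum = 0;
  for (int i = 0; i < 32; ++i)
    {
      unsigned int tem = (unsigned int) sum;  /* int -> unsigned int */
      tem = tem + x[i];                       /* the actual reduction op */
      sum = (int) tem;                        /* unsigned int -> int */
    }
  return sum;
}

/* Illustrative helper: both variants are expected to agree for any
   32-element input.  */
void check_equivalence (unsigned int *x)
{
  assert (bar (x) == bar_explicit (x));
}
```

Calling both functions on the same 32-element array should give the same result; with x[i] = i for i in 0..31 both return 496.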