On Fri, Oct 16, 2015 at 5:25 PM, Alan Lawrence <alan.lawre...@arm.com> wrote:
> This lets the vectorizer handle some simple strides expressed using left-shift
> rather than mul, e.g. a[i << 1] (whereas previously only a[i * 2] would have
> been handled).
>
> This patch does *not* handle the general case of shifts - neither a[i << j]
> nor a[1 << i] will be handled; that would be a significantly bigger patch
> (probably duplicating or generalizing much of chrec_fold_multiply and
> chrec_fold_multiply_poly_poly in tree-chrec.c), and would probably also only
> be applicable to machines with gather-load support.
>
> Bootstrapped+check-gcc,g++,gfortran on x86_64, AArch64 and ARM, also Ada on
> x86_64.
>
> Is this OK for trunk?
>
> gcc/ChangeLog:
>
>         PR tree-optimization/65963
>         * tree-scalar-evolution.c (interpret_rhs_expr): Handle some
>         LSHIFT_EXPRs as equivalent MULT_EXPRs.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/vect/vect-strided-shift-1.c: New.
> ---
>  gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c | 33 ++++++++++++++++++++++++
>  gcc/tree-scalar-evolution.c                      | 18 +++++++++++++
>  2 files changed, 51 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
> new file mode 100644
> index 0000000..b1ce2ec
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
> @@ -0,0 +1,33 @@
> +/* PR tree-optimization/65963.  */
> +#include "tree-vect.h"
> +
> +#define N 512
> +
> +int in[2*N], out[N];
> +
> +__attribute__ ((noinline)) void
> +loop (void)
> +{
> +  for (int i = 0; i < N; i++)
> +    out[i] = in[i << 1] + 7;
> +}
> +
> +int
> +main (int argc, char **argv)
> +{
> +  check_vect ();
> +  for (int i = 0; i < 2*N; i++)
> +    {
> +      in[i] = i;
> +      __asm__ volatile ("" : : : "memory");
> +    }
> +  loop ();
> +  __asm__ volatile ("" : : : "memory");
> +  for (int i = 0; i < N; i++)
> +    {
> +      if (out[i] != i*2 + 7)
> +        abort ();
> +    }
> +  return 0;
> +}
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target { vect_strided2 } } } } */
> diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
> index 0753bf3..e478b0e 100644
> --- a/gcc/tree-scalar-evolution.c
> +++ b/gcc/tree-scalar-evolution.c
> @@ -1831,12 +1831,30 @@ interpret_rhs_expr (struct loop *loop, gimple *at_stmt,
>        break;
>
>      case MULT_EXPR:
> +    case LSHIFT_EXPR:
> +      /* Handle A<<B as A * (1<<B).  */
>        chrec1 = analyze_scalar_evolution (loop, rhs1);
>        chrec2 = analyze_scalar_evolution (loop, rhs2);
>        chrec1 = chrec_convert (type, chrec1, at_stmt);
>        chrec2 = chrec_convert (type, chrec2, at_stmt);
>        chrec1 = instantiate_parameters (loop, chrec1);
>        chrec2 = instantiate_parameters (loop, chrec2);
> +      if (code == LSHIFT_EXPR)
> +        {
> +          /* Do the shift in the larger size, as in e.g. (long) << (int)32,
> +             we must do 1<<32 as a long or we'd overflow.  */

Err, you should always do the shift in the type of rhs1.  You should also
avoid the chrec_convert of rhs2 above for shifts.  I think globbing
shifts and multiplies together doesn't make the code any clearer.

Richard.

> +          tree type = TREE_TYPE (chrec2);
> +          if (TYPE_PRECISION (TREE_TYPE (chrec1)) > TYPE_PRECISION (type))
> +            type = TREE_TYPE (chrec1);
> +          if (TYPE_PRECISION (type) == 0)
> +            {
> +              res = chrec_dont_know;
> +              break;
> +            }
> +          chrec2 = fold_build2 (LSHIFT_EXPR, type,
> +                                build_int_cst (type, 1),
> +                                chrec2);
> +        }
>        res = chrec_fold_multiply (type, chrec1, chrec2);
>        break;
>
> --
> 1.9.1
>
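
To illustrate the remark about the shift type: in C the result of a left shift has the (promoted) type of its left operand, so A << B and A * (1 << B) only agree if the multiplier 1 << B is built in that same type. The following is a sketch in plain C, not GCC internals. All function names are made up for illustration, and unsigned fixed-width types are assumed so that wraparound is well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Direct 32-bit shift.  The result has the type of the left operand,
   so bits shifted past bit 31 are discarded.  */
uint32_t
shl_direct_u32 (uint32_t a, unsigned shift)
{
  assert (shift < 32);	/* Shifting by >= the width is undefined in C.  */
  return (uint32_t) (a << shift);
}

/* A << B rewritten as A * (1 << B), with the multiplier built in the
   type of A (the "rhs1" operand).  Matches shl_direct_u32.  */
uint32_t
shl_as_mul_u32 (uint32_t a, unsigned shift)
{
  assert (shift < 32);
  uint32_t multiplier = (uint32_t) 1 << shift;
  return (uint32_t) (a * multiplier);
}

/* The same rewrite carried out in a type wider than A.  This keeps
   high bits that the real 32-bit shift would have dropped, so it is
   not equivalent to the original expression.  */
uint64_t
shl_as_mul_wide (uint32_t a, unsigned shift)
{
  assert (shift < 32);
  return (uint64_t) a * ((uint64_t) 1 << shift);
}

/* The (long) << (int) 32 case from the patch comment: the left operand
   is already 64 bits wide, so building 1 << 32 in its type does not
   overflow.  */
uint64_t
shl_as_mul_u64 (uint64_t a, unsigned shift)
{
  assert (shift < 64);
  return a * ((uint64_t) 1 << shift);
}
```

With a = 3 and a shift count of 31, shl_direct_u32 and shl_as_mul_u32 both return 0x80000000, whereas shl_as_mul_wide returns 0x180000000; taking the larger of the two operand types can therefore change the value whenever the shift count has the wider type.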