On Fri, Oct 16, 2015 at 5:25 PM, Alan Lawrence <alan.lawre...@arm.com> wrote:
> This lets the vectorizer handle some simple strides expressed using left-shift
> rather than mul, e.g. a[i << 1] (whereas previously only a[i * 2] would have
> been handled).
>
> This patch does *not* handle the general case of shifts - neither a[i << j]
> nor a[1 << i] will be handled; that would be a significantly bigger patch
> (probably duplicating or generalizing much of chrec_fold_multiply and
> chrec_fold_multiply_poly_poly in tree-chrec.c), and would probably also only
> be applicable to machines with gather-load support.
>
> Bootstrapped+check-gcc,g++,gfortran on x86_64, AArch64 and ARM, also Ada on x86_64.
>
> Is this OK for trunk?
>
> gcc/ChangeLog:
>
>         PR tree-optimization/65963
>         * tree-scalar-evolution.c (interpret_rhs_expr): Handle some
>         LSHIFT_EXPRs as equivalent MULT_EXPRs.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/vect/vect-strided-shift-1.c: New.
> ---
>  gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c | 33 ++++++++++++++++++++++++
>  gcc/tree-scalar-evolution.c                      | 18 +++++++++++++
>  2 files changed, 51 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
> new file mode 100644
> index 0000000..b1ce2ec
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
> @@ -0,0 +1,33 @@
> +/* PR tree-optimization/65963.  */
> +#include "tree-vect.h"
> +
> +#define N 512
> +
> +int in[2*N], out[N];
> +
> +__attribute__ ((noinline)) void
> +loop (void)
> +{
> +  for (int i = 0; i < N; i++)
> +    out[i] = in[i << 1] + 7;
> +}
> +
> +int
> +main (int argc, char **argv)
> +{
> +  check_vect ();
> +  for (int i = 0; i < 2*N; i++)
> +    {
> +      in[i] = i;
> +      __asm__ volatile ("" : : : "memory");
> +    }
> +  loop ();
> +  __asm__ volatile ("" : : : "memory");
> +  for (int i = 0; i < N; i++)
> +    {
> +      if (out[i] != i*2 + 7)
> +       abort ();
> +    }
> +  return 0;
> +}
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target { vect_strided2 } } } } */
> diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
> index 0753bf3..e478b0e 100644
> --- a/gcc/tree-scalar-evolution.c
> +++ b/gcc/tree-scalar-evolution.c
> @@ -1831,12 +1831,30 @@ interpret_rhs_expr (struct loop *loop, gimple *at_stmt,
>        break;
>
>      case MULT_EXPR:
> +    case LSHIFT_EXPR:
> +      /* Handle A<<B as A * (1<<B).  */
>        chrec1 = analyze_scalar_evolution (loop, rhs1);
>        chrec2 = analyze_scalar_evolution (loop, rhs2);
>        chrec1 = chrec_convert (type, chrec1, at_stmt);
>        chrec2 = chrec_convert (type, chrec2, at_stmt);
>        chrec1 = instantiate_parameters (loop, chrec1);
>        chrec2 = instantiate_parameters (loop, chrec2);
> +      if (code == LSHIFT_EXPR)
> +       {
> +         /* Do the shift in the larger size, as in e.g. (long) << (int)32,
> +            we must do 1<<32 as a long or we'd overflow.  */

Err, you should always do the shift in the type of rhs1.  You should also
avoid the chrec_convert of rhs2 above for shifts.  I think globbing
shifts and multiplies together doesn't make the code any clearer.

Richard.

> +         tree type = TREE_TYPE (chrec2);
> +         if (TYPE_PRECISION (TREE_TYPE (chrec1)) > TYPE_PRECISION (type))
> +           type = TREE_TYPE (chrec1);
> +         if (TYPE_PRECISION (type) == 0)
> +           {
> +             res = chrec_dont_know;
> +             break;
> +           }
> +         chrec2 = fold_build2 (LSHIFT_EXPR, type,
> +                               build_int_cst (type, 1),
> +                               chrec2);
> +       }
>        res = chrec_fold_multiply (type, chrec1, chrec2);
>        break;
>
> --
> 1.9.1
>
