On Fri, Mar 2, 2018 at 3:12 PM, Richard Sandiford <richard.sandif...@linaro.org> wrote: > Richard Biener <richard.guent...@gmail.com> writes: >> On Thu, Mar 1, 2018 at 12:38 PM, Richard Sandiford >> <richard.sandif...@linaro.org> wrote: >>> Richard Biener <richard.guent...@gmail.com> writes: >>>> On Wed, Feb 28, 2018 at 3:20 PM, Richard Sandiford >>>> <richard.sandif...@linaro.org> wrote: >>>>> GCC 6 and 7 would vectorise: >>>>> >>>>> void >>>>> f (unsigned long incx, unsigned long incy, >>>>> float *restrict dx, float *restrict dy) >>>>> { >>>>> unsigned long ix = 0, iy = 0; >>>>> for (unsigned long i = 0; i < 512; ++i) >>>>> { >>>>> dy[iy] += dx[ix]; >>>>> ix += incx; >>>>> iy += incy; >>>>> } >>>>> } >>>>> >>>>> without first proving that incy is nonzero. This is a regression from >>>>> GCC 5. It was fixed on trunk in r223486, which versioned the loop based >>>>> on whether incy is zero, but that's obviously too invasive to backport. >>>>> This patch instead bails out for non-constant steps in the place that >>>>> trunk would try a check for zeroness. >>>>> >>>>> I did wonder about trying to use range information to prove nonzeroness >>>>> for SSA_NAMEs, but that would be entirely new code and didn't seem >>>>> suitable for a release branch. >>>>> >>>>> Tested on aarch64-linux-gnu. OK for GCC 7 and 6? I'll add the testcase >>>>> to trunk too. >>>> >>>> Given dist == 0 isn't it enough to test either DR_STEP (dra) or >>>> DR_STEP (drb)? >>>> That seems what trunk is doing (just look at dr_zero_step_indicator of >>>> dra). >>>> Even when not using range-info I think that using ! >>>> tree_expr_nonzero_p (DR_STEP (dra)) >>>> is more to the point of the issue we're fixing -- that also would catch >>>> integer_zerop (DR_STEP (dra)) which trunk handles by failing as well but >>>> your >>>> patch wouldn't so it looks like a more "complete" fix. >>> >>> OK. >>> >>>> Last but not least trunk and your patch guards all this by >>>> !loop->force_vectorize. >>>> But force_vectorize doesn't give any such guarantee that step isn't >>>> zero so I wonder >>>> why we deliberately choose to possibly miscompile stuff here? >>> >>> This was based on the pre-existing: >>> >>> if (loop_vinfo && integer_zerop (step)) >>> { >>> GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL; >>> if (!nested_in_vect_loop_p (loop, stmt)) >>> return DR_IS_READ (dr); >>> /* Allow references with zero step for outer loops marked >>> with pragma omp simd only - it guarantees absence of >>> loop-carried dependencies between inner loop iterations. */ >>> if (!loop->force_vectorize) >>> { >>> if (dump_enabled_p ()) >>> dump_printf_loc (MSG_NOTE, vect_location, >>> "zero step in inner loop of nest\n"); >>> return false; >>> } >>> } >>> >>> AIUI #pragma omp simd really does guarantee that iterations of >>> the loop can be executed concurrently (up to the limit given by >>> safelen if present). So something like: >>> >>> #pragma omp simd >>> for (int i = 0; i < n; ++i) >>> a[i * step] += 1; >>> >>> would be incorrect for step==0. (#pragma ordered simd forces >>> things to be executed in order where necessary.) >> >> But then we should check safelen, not force_vectorize... > > OK. Following on from irc, this version uses expr_not_equal_to > instead of tree_expr_nonzero_p. It also adds safelen to the existing > use of force_vectorize (which seemed safer than replacing it). > > Tested on aarch64-linux-gnu. OK to install?
Ok. Richard. > Richard > > > 2018-02-28 Richard Sandiford <richard.sandif...@linaro.org> > > gcc/ > PR tree-optimization/84485 > * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Return > true for zero dependence distances if the step might be zero, > and if there is no metadata that guarantees correctness. > (vect_analyze_data_ref_access): Check safelen as well as > force_vectorize. > > gcc/testsuite/ > PR tree-optimization/84485 > * gcc.dg/vect/pr84485.c: New test. > > Index: gcc/tree-vect-data-refs.c > =================================================================== > --- gcc/tree-vect-data-refs.c 2018-02-28 14:19:22.326678337 +0000 > +++ gcc/tree-vect-data-refs.c 2018-03-02 13:28:06.339321095 +0000 > @@ -394,6 +394,16 @@ vect_analyze_data_ref_dependence (struct > } > } > > + unsigned int step_prec = TYPE_PRECISION (TREE_TYPE (DR_STEP (dra))); > + if (loop->safelen < 2 > + && !expr_not_equal_to (DR_STEP (dra), wi::zero (step_prec))) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > + "step could be zero.\n"); > + return true; > + } > + > continue; > } > > @@ -2515,7 +2525,7 @@ vect_analyze_data_ref_access (struct dat > /* Allow references with zero step for outer loops marked > with pragma omp simd only - it guarantees absence of > loop-carried dependencies between inner loop iterations. */ > - if (!loop->force_vectorize) > + if (!loop->force_vectorize || loop->safelen < 2) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_NOTE, vect_location, > Index: gcc/testsuite/gcc.dg/vect/pr84485.c > =================================================================== > --- /dev/null 2018-03-01 08:17:49.562264353 +0000 > +++ gcc/testsuite/gcc.dg/vect/pr84485.c 2018-03-02 13:28:06.338321095 +0000 > @@ -0,0 +1,34 @@ > +/* { dg-do run } */ > + > +#include "tree-vect.h" > + > +#define N 256 > + > +void __attribute__ ((noinline, noclone)) > +f (unsigned long incx, unsigned long incy, > + float *restrict dx, float *restrict dy) > +{ > + unsigned long ix = 0, iy = 0; > + for (unsigned long i = 0; i < N; ++i) > + { > + dy[iy] += dx[ix]; > + ix += incx; > + iy += incy; > + } > +} > + > +float a = 0.0; > +float b[N]; > + > +int > +main (void) > +{ > + check_vect (); > + > + for (int i = 0; i < N; ++i) > + b[i] = i; > + f (1, 0, b, &a); > + if (a != N * (N - 1) / 2) > + __builtin_abort (); > + return 0; > +}