Richard Biener <richard.guent...@gmail.com> writes: > On Wed, Feb 28, 2018 at 3:20 PM, Richard Sandiford > <richard.sandif...@linaro.org> wrote: >> GCC 6 and 7 would vectorise: >> >> void >> f (unsigned long incx, unsigned long incy, >> float *restrict dx, float *restrict dy) >> { >> unsigned long ix = 0, iy = 0; >> for (unsigned long i = 0; i < 512; ++i) >> { >> dy[iy] += dx[ix]; >> ix += incx; >> iy += incy; >> } >> } >> >> without first proving that incy is nonzero. This is a regression from >> GCC 5. It was fixed on trunk in r223486, which versioned the loop based >> on whether incy is zero, but that's obviously too invasive to backport. >> This patch instead bails out for non-constant steps in the place that >> trunk would try a check for zeroness. >> >> I did wonder about trying to use range information to prove nonzeroness >> for SSA_NAMEs, but that would be entirely new code and didn't seem >> suitable for a release branch. >> >> Tested on aarch64-linux-gnu. OK for GCC 7 and 6? I'll add the testcase >> to trunk too. > > Given dist == 0 isn't it enough to test either DR_STEP (dra) or DR_STEP (drb)? > That seems what trunk is doing (just look at dr_zero_step_indicator of dra). > Even when not using range-info I think that using ! > tree_expr_nonzero_p (DR_STEP (dra)) > is more to the point of the issue we're fixing -- that also would catch > integer_zerop (DR_STEP (dra)) which trunk handles by failing as well but your > patch wouldn't so it looks like a more "complete" fix.
OK. > Last but not least trunk and your patch guards all this by > !loop->force_vectorize. > But force_vectorize doesn't give any such guarantee that step isn't > zero so I wonder > why we deliberately choose to possibly miscompile stuff here? This was based on the pre-existing: if (loop_vinfo && integer_zerop (step)) { GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL; if (!nested_in_vect_loop_p (loop, stmt)) return DR_IS_READ (dr); /* Allow references with zero step for outer loops marked with pragma omp simd only - it guarantees absence of loop-carried dependencies between inner loop iterations. */ if (!loop->force_vectorize) { if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "zero step in inner loop of nest\n"); return false; } } AIUI #pragma omp simd really does guarantee that iterations of the loop can be executed concurrently (up to the limit given by safelen if present). So something like: #pragma omp simd for (int i = 0; i < n; ++i) a[i * step] += 1; would be incorrect for step==0. (#pragma ordered simd forces things to be executed in order where necessary.) Thanks, Richard > Thus I'd like to see a simpler > > + if (! tree_expr_nonzero_p (DR_STEP (dra))) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > + "step could be zero.\n"); > + return true; > + } > > if you think that works out and also the force_vectorize guard removed from > the > trunk version. > > OK with that change. > > Thanks, > Richard. > >> Richard >> >> >> 2018-02-28 Richard Sandiford <richard.sandif...@linaro.org> >> >> gcc/ >> PR tree-optimization/84485 >> * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Return >> true for zero dependence distances if either step is variable, and if >> there is no metadata that guarantees correctness. >> >> gcc/testsuite/ >> PR tree-optimization/84485 >> * gcc.dg/vect/pr84485.c: New test. >> >> Index: gcc/tree-vect-data-refs.c >> =================================================================== >> --- gcc/tree-vect-data-refs.c 2017-07-27 18:08:43.779978373 +0100 >> +++ gcc/tree-vect-data-refs.c 2018-02-28 14:16:36.621113244 +0000 >> @@ -394,6 +394,16 @@ vect_analyze_data_ref_dependence (struct >> } >> } >> >> + if (!loop->force_vectorize >> + && (TREE_CODE (DR_STEP (dra)) != INTEGER_CST >> + || TREE_CODE (DR_STEP (drb)) != INTEGER_CST)) >> + { >> + if (dump_enabled_p ()) >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> + "step could be zero.\n"); >> + return true; >> + } >> + >> continue; >> } >> >> Index: gcc/testsuite/gcc.dg/vect/pr84485.c >> =================================================================== >> --- /dev/null 2018-02-26 10:26:41.624847060 +0000 >> +++ gcc/testsuite/gcc.dg/vect/pr84485.c 2018-02-28 14:16:36.620112862 +0000 >> @@ -0,0 +1,34 @@ >> +/* { dg-do run } */ >> + >> +#include "tree-vect.h" >> + >> +#define N 256 >> + >> +void __attribute__ ((noinline, noclone)) >> +f (unsigned long incx, unsigned long incy, >> + float *restrict dx, float *restrict dy) >> +{ >> + unsigned long ix = 0, iy = 0; >> + for (unsigned long i = 0; i < N; ++i) >> + { >> + dy[iy] += dx[ix]; >> + ix += incx; >> + iy += incy; >> + } >> +} >> + >> +float a = 0.0; >> +float b[N]; >> + >> +int >> +main (void) >> +{ >> + check_vect (); >> + >> + for (int i = 0; i < N; ++i) >> + b[i] = i; >> + f (1, 0, b, &a); >> + if (a != N * (N - 1) / 2) >> + __builtin_abort (); >> + return 0; >> +}