[Bug middle-end/99394] s254 benchmark of TSVC is vectorized by clang and not by gcc

rguenth at gcc dot gnu.org via Gcc-bugs Fri, 05 Mar 2021 00:20:45 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99394


--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
This is a loop-carried data dependence, which we can't handle (we avoid creating
such dependences in PRE, but here it appears in the source itself).  I wonder how
LLVM handles this; it would help to see its IL before and after vectorization.

Specifically, a 'carry around variable' is something we don't handle.

Can you somehow extract a compilable testcase (with just this kernel)?
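For reference, a standalone kernel of the kind being asked for might look like the hypothetical sketch below. The loop body is reconstructed from the x[] rewrite quoted further down (x starts as b[LEN_1D-1] and then carries b[i-1] into iteration i). The value of LEN_1D, real_t being float, and the name s254_kernel are assumptions, not taken from the TSVC sources. It is compile-only, in the spirit of a vectorizer testcase, e.g. `gcc -O3 -fopt-info-vec-missed -c s254.c`.

```c
/* Hypothetical s254-style kernel with a carry-around variable.
   LEN_1D and real_t are assumed here; TSVC's harness defines its own. */
#define LEN_1D 32000
typedef float real_t;

real_t a[LEN_1D], b[LEN_1D];

void s254_kernel(void)
{
    /* x carries b[i-1] into iteration i; the first iteration
       instead sees the wrap-around value b[LEN_1D-1]. */
    real_t x = b[LEN_1D - 1];
    for (int i = 0; i < LEN_1D; i++) {
        a[i] = (b[i] + x) * (real_t).5;
        x = b[i];   /* loop-carried scalar: the next iteration reads this */
    }
}
```

The store to x at the end of iteration i feeds the load of x in iteration i+1, which is exactly the scalar loop-carried dependence described above.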

Looking at the source, peeling a single iteration (to get rid of the initial
value) and then undoing the PRE, i.e. vectorizing

        for (int i = 1; i < LEN_1D; i++) {
            a[i] = (b[i] + b[i-1]) * (real_t).5;
        }

would likely result in optimal code.  The assembly from clang doesn't look
optimal to me: LLVM likely materializes 'x' as a temporary array, vectorizing

        x[0] = b[LEN_1D-1];
        for (int i = 0; i < LEN_1D; i++) {
            a[i] = (b[i] + x[i]) * (real_t).5;
            x[i+1] = b[i];
        }

and then somehow (like we handle OMP simd lane arrays?) uses two vectors
as a sliding window over x[].  At least the standard strategy for
these kinds of dependences is to get "rid" of them by turning them into
data dependences (on an array like x[] above) and then hoping for the best.
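To make the two strategies concrete, here is a hedged sketch in plain C (no intrinsics) that computes the kernel three ways: the carry-around loop, the peeled form shown above, and an emulation of the speculated sliding window, where fixed-size arrays stand in for vector registers. VF, LEN_1D, real_t and all function names are assumptions for illustration; this is not what clang actually emits.

```c
#include <string.h>

#define LEN_1D 1024   /* assumed size */
#define VF 4          /* hypothetical vector width */
typedef float real_t;

_Static_assert(LEN_1D % VF == 0, "sketch assumes no epilogue loop");

/* Reference: the carry-around form. */
void s254_ref(real_t *a, const real_t *b)
{
    real_t x = b[LEN_1D - 1];
    for (int i = 0; i < LEN_1D; i++) {
        a[i] = (b[i] + x) * (real_t).5;
        x = b[i];
    }
}

/* Peel iteration 0 (the only one that sees the wrap-around value);
   the remaining loop reads b[i-1] directly, so no scalar is carried. */
void s254_peeled(real_t *a, const real_t *b)
{
    a[0] = (b[0] + b[LEN_1D - 1]) * (real_t).5;
    for (int i = 1; i < LEN_1D; i++)
        a[i] = (b[i] + b[i - 1]) * (real_t).5;
}

/* Sliding window over b: prev is the previously loaded chunk, cur the
   current one; the "x" vector is cur shifted up by one lane, with lane 0
   taken from the last lane of prev. */
void s254_window(real_t *a, const real_t *b)
{
    real_t prev[VF], cur[VF], x[VF];
    prev[VF - 1] = b[LEN_1D - 1];           /* seed: only this lane is read */
    for (int i = 0; i < LEN_1D; i += VF) {
        memcpy(cur, &b[i], sizeof cur);     /* "vector load" of b[i..i+VF-1] */
        x[0] = prev[VF - 1];
        for (int l = 1; l < VF; l++)
            x[l] = cur[l - 1];
        for (int l = 0; l < VF; l++)
            a[i + l] = (cur[l] + x[l]) * (real_t).5;
        memcpy(prev, cur, sizeof prev);     /* window slides by VF elements */
    }
}
```

All three perform the same float additions and multiplications on the same operands, so their results should match bit for bit. The peeled loop has no loop-carried dependence at all, which is why it is the more attractive target for a vectorizer.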
