https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117557

--- Comment #9 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <tnfch...@gcc.gnu.org>:

https://gcc.gnu.org/g:1b3bff737b2d5a7dc0d5977b77200c785fc53f01

commit r15-5745-g1b3bff737b2d5a7dc0d5977b77200c785fc53f01
Author: Tamar Christina <tamar.christ...@arm.com>
Date:   Thu Nov 28 10:23:14 2024 +0000

    middle-end: rework vectorizable_store to iterate over single index [PR117557]

    The testcase

    #include <stdint.h>
    #include <string.h>

    #define N 8
    #define L 8

    void f(const uint8_t * restrict seq1,
           const uint8_t *idx, uint8_t *seq_out) {
      for (int i = 0; i < L; ++i) {
        uint8_t h = idx[i];
        memcpy((void *)&seq_out[i * N], (const void *)&seq1[h * N / 2], N / 2);
      }
    }

    compiled at -O3 -mcpu=neoverse-n1+sve

    miscompiles to:

        ld1w    z31.s, p3/z, [x23, z29.s, sxtw]
        ld1w    z29.s, p7/z, [x23, z30.s, sxtw]
        st1w    z29.s, p7, [x24, z12.s, sxtw]
        st1w    z31.s, p7, [x24, z12.s, sxtw]

    rather than

        ld1w    z31.s, p3/z, [x23, z29.s, sxtw]
        ld1w    z29.s, p7/z, [x23, z30.s, sxtw]
        st1w    z29.s, p7, [x24, z12.s, sxtw]
        addvl   x3, x24, #2
        st1w    z31.s, p3, [x3, z12.s, sxtw]

    Two things go wrong here: the wrong mask is used, and the address pointers
    to the stores are wrong.

    This happens because the codegen loop in vectorizable_store is a nested
    loop: the outer loop iterates over ncopies and the inner loop iterates
    over vec_num.

    For SLP, ncopies == 1 and vec_num == SLP_NUM_STMS, but the loop mask is
    determined only by the outer loop index, and the pointer address is only
    updated in the outer loop.

    As such, for SLP we always use the same predicate and the same memory
    location.  This patch flattens the two loops, iterating instead over
    ncopies * vec_num, and simplifies the indexing.

    This does not fully fix the gcc_r miscompile error in SPECCPU 2017, as the
    error moves somewhere else.  I will look at that next, but this patch does
    fix some other libraries that also started failing.

    gcc/ChangeLog:

            PR tree-optimization/117557
            * tree-vect-stmts.cc (vectorizable_store): Flatten the ncopies and
            vec_num loops.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/117557
            * gcc.target/aarch64/pr117557.c: New test.
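
The loop structure described in the commit message can be illustrated with
a toy model.  This is only a sketch and not the code in tree-vect-stmts.cc:
NCOPIES, VEC_NUM, nested_indices and flattened_indices are hypothetical names
chosen for illustration, and the "mask index" and "address step" are plain
integers standing in for the loop predicate and the bumped data pointer.

```c
#include <assert.h>

/* Hypothetical stand-ins for the SLP case: one copy, two vector stores.  */
enum { NCOPIES = 1, VEC_NUM = 2 };

/* Nested form: the mask index and the address step are derived from the
   outer index j only, so every inner iteration reuses the same values.  */
static void nested_indices(int mask_idx[], int addr_step[], int len)
{
  assert(len >= NCOPIES * VEC_NUM);
  int n = 0;
  for (int j = 0; j < NCOPIES; j++)
    for (int i = 0; i < VEC_NUM; i++)
      {
        mask_idx[n] = j;   /* depends on j, ignores i */
        addr_step[n] = j;  /* pointer only advances with j */
        n++;
      }
}

/* Flattened form: a single index k over NCOPIES * VEC_NUM drives both the
   mask selection and the address step, so each store gets its own.  */
static void flattened_indices(int mask_idx[], int addr_step[], int len)
{
  assert(len >= NCOPIES * VEC_NUM);
  for (int k = 0; k < NCOPIES * VEC_NUM; k++)
    {
      mask_idx[k] = k;
      addr_step[k] = k;
    }
}
```

In this model the nested form gives both stores mask index 0 and address
step 0, which mirrors the two st1w instructions sharing p7 and x24 in the
miscompiled output, while the flattened form gives the second store its own
mask and address.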
