https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117557

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|needs-reduction             |
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org|tnfchris at gcc dot gnu.org

--- Comment #8 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Testcase:

#include <stdint.h>
#include <string.h>

#define N 8
#define L 8

void f(const uint8_t * restrict seq1,
       const uint8_t *idx, uint8_t *seq_out) {
  for (int i = 0; i < L; ++i) {
    uint8_t h = idx[i];
    memcpy((void *)&seq_out[i * N], (const void *)&seq1[h * N / 2], N / 2);
  }
}

compiled at -O3 -mcpu=neoverse-n1+sve

miscompiles to:


  vect_patt_26.9_89 = [vec_unpack_lo_expr] vect_patt_27.8_88;
  vect_patt_26.9_90 = [vec_unpack_hi_expr] vect_patt_27.8_88;
  vect_patt_25.10_94 = .MASK_GATHER_LOAD (_91, vect_patt_26.9_89, 1, { 0, ...
}, loop_mask_92, { 0, ... });
  vect_patt_25.11_95 = .MASK_GATHER_LOAD (_91, vect_patt_26.9_90, 1, { 0, ...
}, loop_mask_93, { 0, ... });
  .MASK_SCATTER_STORE (seq_out_15(D), { 0, 8, 16, ... }, 1, vect_patt_25.10_94,
loop_mask_92);
  .MASK_SCATTER_STORE (seq_out_15(D), { 0, 8, 16, ... }, 1, vect_patt_25.11_95,
loop_mask_92);

rather than

  vect_patt_26.9_90 = [vec_unpack_lo_expr] vect_patt_27.8_89;
  vect_patt_26.9_91 = [vec_unpack_hi_expr] vect_patt_27.8_89;
  vect_patt_25.10_95 = .MASK_GATHER_LOAD (_92, vect_patt_26.9_90, 1, { 0, ...
}, loop_mask_93);
  vect_patt_25.11_96 = .MASK_GATHER_LOAD (_92, vect_patt_26.9_91, 1, { 0, ...
}, loop_mask_94);
  .MASK_SCATTER_STORE (seq_out_15(D), { 0, 8, 16, ... }, 1, vect_patt_25.10_95,
loop_mask_93);
  vectp_seq_out.12_100 = seq_out_15(D) + POLY_INT_CST [32, 32];
  .MASK_SCATTER_STORE (vectp_seq_out.12_100, { 0, 8, 16, ... }, 1,
vect_patt_25.11_96, loop_mask_94);

This happens for two reasons. First, as Richi suspected, the index passed to
vect_get_loop_mask is wrong for SLP, so the second store reuses loop_mask_92
instead of getting its own mask. Second, dataref_ptr is wrong because it is
treated as a constant inside the vec_num loop, i.e. for SLP the vectorizer
thinks every store goes to the same location.

Either the bump_vector_ptr call needs to move inside the inner loop as well, or
the inner loop needs to be flattened into the outer one, which then iterates
over ncopies * vec_num.
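
To illustrate the loop structure, here is a rough standalone model in C. It is
not the vectorizer code: struct store_rec, emit_stores_buggy and
emit_stores_flat are hypothetical names made up for this sketch, the mask index
is just a counter standing in for the argument to vect_get_loop_mask, and the
fixed 32-byte step stands in for the POLY_INT_CST [32, 32] bump seen in the
expected dump. The per-copy bump in the buggy variant is an assumption as well;
with ncopies == 1 it makes no difference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one emitted vector store: which loop mask it
   uses and the byte offset of its base pointer from seq_out.  */
struct store_rec
{
  unsigned mask_index;
  size_t base_offset;
};

/* Model of the reported behaviour: the mask index and the data pointer
   only change per outer (ncopies) iteration, so all vec_num stores of
   one copy share the same mask and the same address.  */
static size_t
emit_stores_buggy (unsigned ncopies, unsigned vec_num, size_t vec_bytes,
                   struct store_rec *out, size_t cap)
{
  size_t n = 0;
  size_t dataref_off = 0;
  for (unsigned j = 0; j < ncopies; ++j)
    {
      for (unsigned i = 0; i < vec_num; ++i)
        {
          assert (n < cap);
          out[n].mask_index = j;            /* Ignores i.  */
          out[n].base_offset = dataref_off; /* Constant in this loop.  */
          ++n;
        }
      dataref_off += vec_bytes;             /* Bump only per copy.  */
    }
  return n;
}

/* Model of the flattened variant: a single loop over ncopies * vec_num,
   so every store gets its own mask index and a bumped pointer.  */
static size_t
emit_stores_flat (unsigned ncopies, unsigned vec_num, size_t vec_bytes,
                  struct store_rec *out, size_t cap)
{
  size_t n = 0;
  size_t dataref_off = 0;
  for (unsigned k = 0; k < ncopies * vec_num; ++k)
    {
      assert (n < cap);
      out[n].mask_index = k;
      out[n].base_offset = dataref_off;
      ++n;
      dataref_off += vec_bytes;             /* Bump after every store.  */
    }
  return n;
}
```

With ncopies = 1 and vec_num = 2, the buggy model emits two stores with the
same mask index and the same offset, matching the shape of the miscompiled
dump, while the flattened model gives the second store its own mask index and
an offset one vector further on.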

Testing a patch. So mine.
