https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118464

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
           Priority|P3                          |P1

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, so we have VF == 2 but vectype vector(4) int.  That's because we fail to
see the load in question is only used with a load permutation { 0 0 }.  That
said,

      /* Calculate the number of vectors read per vector iteration.  If
         it is a power of two, multiply through to get the required
         alignment in bytes.  Otherwise, fail analysis since alignment
         peeling wouldn't work in such a case.  */
      poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
      if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
        num_scalars *= DR_GROUP_SIZE (stmt_info);

does not consider SLP.  That means the logic

      auto num_vectors = vect_get_num_vectors (num_scalars, vectype);
      if (!pow2p_hwi (num_vectors))
        {

is flawed since it assumes we are doing VMAT_CONTIGUOUS vectorization
without any permute?
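
To make the arithmetic concrete, here is a minimal sketch of the quoted check
using plain integers instead of poly_uint64.  The names access_model,
check_result and model_check are hypothetical and not part of GCC; is_pow2 is
only a stand-in for pow2p_hwi.  The division is modelled explicitly so the
"does not fill a whole vector" case is visible; how the real helper treats
that case is not shown above.  A group size of 1 for the load in this PR is an
assumption made for illustration.

```cpp
#include <cstdint>

// Simplified stand-in for GCC's pow2p_hwi: true iff x is a power of two.
bool is_pow2 (std::uint64_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

// Hypothetical, constant-size model of the quantities in the quoted check.
struct access_model
{
  std::uint64_t vf;          // vectorization factor (LOOP_VINFO_VECT_FACTOR)
  std::uint64_t group_size;  // DR_GROUP_SIZE, or 1 for a non-grouped access
  std::uint64_t nunits;      // number of lanes in vectype
};

struct check_result
{
  std::uint64_t num_scalars;  // scalars assumed read per vector iteration
  bool whole_vectors;         // do they fill a whole number of vectors?
  bool passes;                // whole vectors and a power-of-two count
};

// Mirrors the shape of the quoted logic: num_scalars = VF * group size,
// then require the resulting vector count to be a power of two.
check_result model_check (access_model a)
{
  std::uint64_t num_scalars = a.vf * a.group_size;
  bool whole = a.nunits != 0 && num_scalars % a.nunits == 0;
  bool ok = whole && is_pow2 (num_scalars / a.nunits);
  return { num_scalars, whole, ok };
}
```

With VF == 2, an assumed group size of 1 and vector(4) int, model_check
({2, 1, 4}) reports 2 scalars, which is only half a vector, so the "number of
vectors" the check reasons about is not even a whole number.  Nothing in this
computation looks at a load permutation.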

It looks like it might have been better to not make alignment analysis
fail but instead mark the DR so vectorizable_load/store can fail when
the first element of each loaded vector isn't aligned and we'd load
excess elements outside of the whole alignment boundary (even if it
covers multiple vectors)?
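
A rough sketch of the kind of test vectorizable_load/store could apply to a
DR marked this way.  The function excess_crosses_boundary and all of its
parameters are hypothetical, not existing GCC API; it only illustrates the
condition described above under the assumption that everything is known in
bytes at compile time.

```cpp
#include <cstdint>

// Hypothetical helper; all sizes are in bytes.
//   misalign      offset of the first loaded byte from an address known to
//                 be aligned to target_align
//   needed_bytes  bytes the scalar code would actually have read
//   loaded_bytes  bytes the emitted vector load reads
//   target_align  alignment boundary that would be guaranteed (it may
//                 cover more than one vector)
// Returns true when the load reads excess bytes and the load extends past
// the aligned block that contains its first byte.
bool excess_crosses_boundary (std::uint64_t misalign,
                              std::uint64_t needed_bytes,
                              std::uint64_t loaded_bytes,
                              std::uint64_t target_align)
{
  if (loaded_bytes <= needed_bytes)
    return false;  // no excess elements are loaded at all
  if (target_align == 0)
    return true;   // no known boundary: be conservative
  std::uint64_t start = misalign % target_align;
  return start + loaded_bytes > target_align;
}
```

For a 16-byte vector(4) int load of which only 8 bytes are needed, a 16-byte
boundary is fine at misalignment 0 but not at misalignment 8, while a 32-byte
boundary covering two vectors also tolerates misalignment 16.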

Currently nothing ensures that vectorizable_load will, in the end, emit
the code the above check expects.

That said - the above code doesn't deal with the case of needing only "half"
of a vector, as in this case (but vectorizable_load might emit a full
vector load after all and then try to peel for gaps if the access is not
aligned appropriately).
