https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118464

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
           Priority|P3                          |P1

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, so we have VF == 2 but vectype vector(4) int.  That's because we fail
to see the load in question is only used with a load permutation { 0 0 }.

That said,

  /* Calculate the number of vectors read per vector iteration.  If
     it is a power of two, multiply through to get the required
     alignment in bytes.  Otherwise, fail analysis since alignment
     peeling wouldn't work in such a case.  */
  poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
  if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
    num_scalars *= DR_GROUP_SIZE (stmt_info);

does not consider SLP.  That means the logic

  auto num_vectors = vect_get_num_vectors (num_scalars, vectype);
  if (!pow2p_hwi (num_vectors))
    {

is flawed since it assumes we are doing VMAT_CONTIGUOUS vectorization
without any permute?

It looks like it might have been better to not make alignment analysis
fail but instead mark the DR so vectorizable_load/store can fail when
the first element of each loaded vector isn't aligned and we'd load
excess elements outside of the whole alignment boundary (even if it
covers multiple vectors)?

Currently nothing ensures that vectorizable_load will, in the end emit
code the above check expects.

That said - the above code doesn't deal with the case of needing "half"
of a vector as in this case (but vectorizable_load might emit a full
vector load after all, then trying to peel for gaps eventually, if not
aligned appropriately).
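
To make the arithmetic behind the quoted check concrete, here is a minimal standalone sketch using plain integers. It is not GCC code: `is_pow2`, `scalars_per_iteration`, `classify` and the `AlignCheck` enum are hypothetical names invented for illustration, the real code works on `poly_uint64`, and the group size of 1 used for the VF == 2 case is an assumption, since the comment does not state it. It only models the "scalars per iteration divided by vector lanes" step and shows why VF == 2 with vector(4) int lands on "half" a vector.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a power-of-two test (GCC uses pow2p_hwi).
static bool is_pow2(std::uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

// Scalars accessed per vector iteration, computed the way the quoted
// snippet does: VF, multiplied by the group size for grouped accesses.
// Like the quoted code, this ignores any SLP load permutation.
static std::uint64_t scalars_per_iteration(std::uint64_t vf,
                                           std::uint64_t group_size) {
  return vf * group_size;
}

// Illustrative outcome of the check; not a GCC type.
enum class AlignCheck {
  Ok,               // whole, power-of-two number of vectors
  NotPow2,          // whole number of vectors, but not a power of two
  FractionalVector  // fewer scalars than fill whole vectors ("half" case)
};

// nunits is the number of lanes in the vector type, e.g. 4 for vector(4) int.
static AlignCheck classify(std::uint64_t vf, std::uint64_t group_size,
                           std::uint64_t nunits) {
  assert(nunits != 0 && "a vector type has at least one lane");
  const std::uint64_t num_scalars = scalars_per_iteration(vf, group_size);
  if (num_scalars % nunits != 0)
    return AlignCheck::FractionalVector;  // division would not be exact
  const std::uint64_t num_vectors = num_scalars / nunits;
  return is_pow2(num_vectors) ? AlignCheck::Ok : AlignCheck::NotPow2;
}
```

Under these assumptions, `classify(2, 1, 4)` models the situation in the comment: two scalars per iteration against a four-lane vector, which is neither a power-of-two count of vectors nor a whole vector at all. A case such as `classify(2, 6, 4)` gives three vectors and is the one the power-of-two test was written to reject.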