https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69174
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ok, so we have an interleaving store of size 3 (and thus the SLP group size is also 3) but an interleaved load of size 4 (with gaps). Ideally we would not treat that load as interleaved, but we do. The size-3 SLP requires unrolling 8 times (8 * 3 = 24 elements). The grouped load is strided, and in this case

      ncopies = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
      if (slp_perm)
        dr_chain.create (ncopies);

is off: with a permuted SLP load the whole group has to be loaded, not only the number of vector stmts the SLP result fits in. See

      /* For SLP permutation support we need to load the whole group,
         not only the number of vector stmts the permutation result
         fits in.  */
      if (slp_perm)
        vec_num = (group_size * vf + nunits - 1) / nunits;
      else
        vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);