https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67682
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-09-23
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Interestingly it works on x86_64.  The key is of course interleaving
detection, which has to split the store group properly.  Ah, I have a local
patch:

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c   (revision 228010)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -2610,6 +2636,10 @@ vect_analyze_data_ref_accesses (loop_vec
                  != type_size_a))
            break;

+         if (!DR_IS_READ (dra)
+             && (init_b - init_a) >= 16)
+           break;
+
          /* If the step (if not zero or non-constant) is greater than the
             difference between data-refs' inits this splits groups into
             suitable sizes.  */

so yes, the key is to split the group according to the active vector size
(so the above is clearly a hack).

A better place to handle this is vect_analyze_slp_instance, which, when
vect_build_slp_tree fails, should have an idea whether splitting is
worthwhile (based on 'matches').  It would also need to split load groups
for, say

void test (int*__restrict a, int*__restrict b)
{
  a[0] = b[0];
  a[1] = b[1];
  a[2] = b[2];
  a[3] = b[3];
  a[4] = b[4] + 1;
  a[5] = b[5] + 2;
  a[6] = b[6] + 3;
  a[7] = b[7] + 4;
}

Also, the splitting is probably only a good idea for BB SLP (well, not
sure).  It would need to re-invoke itself for all the split pieces.

So the hack above is certainly easier, but we don't know the chosen vector
size yet at the point of that analysis.  And BB vectorization could easily
use different vector sizes for different SLP instances.
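
For illustration only, the 'matches'-based split could be sketched as the standalone C below.  This is not GCC code: split_index, its parameters, and the meaning given to matches[] (true when element i is isomorphic to element 0) are assumptions made for the sketch, and nunits stands for the number of lanes in whatever vector size ends up being chosen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC.  Given MATCHES[i] == true when
   element I of a group is isomorphic to element 0, and NUNITS lanes per
   vector, return the index at which to split the group into two pieces.
   Return 0 when no useful split exists.  */
size_t
split_index (const bool *matches, size_t group_size, size_t nunits)
{
  size_t i;

  /* A vector with zero lanes makes no sense.  */
  assert (nunits > 0);

  /* Find the first element that does not match element 0.  */
  for (i = 0; i < group_size; i++)
    if (!matches[i])
      break;

  /* Everything matched, so there is nothing to split.  */
  if (i == group_size)
    return 0;

  /* Round down to a whole number of vectors; a result of 0 means the
     matching prefix is too short to fill even one vector.  */
  return i - i % nunits;
}
```

For test () above, assuming a[0]..a[3] match and a[4]..a[7] do not, and assuming four int lanes per vector, the sketch returns 4, i.e. two groups of four stores.  Each piece would then be analyzed again on its own, as the comment suggests.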