Sorry for the very late comment (was out last week)… Richard Biener <rguent...@suse.de> writes: > This enables SLP store group splitting also for loop vectorization. > For the existing testcase gcc.dg/vect/vect-complex-5.c this then > generates much better code, likewise for the PR97428 testcase. > > Both of those have a splitting opportunity splitting the group > into two equal (vector-sized) halves, still the patch enables > quite arbitrary splitting since generally the interleaving scheme > results in quite awkward code for even small groups. If any > problems surface with this it's easy to restrict the splitting > to known-good cases. Is there any additional constraints for > non-constant sized vectors? Note this interacts with vector > size iteration (but comparing interleaving cost with SLP cost > of a smaller vector size doesn't reliably pick the smaller > vector size).
Not sure about the variable-sized vector aspect. For SVE it isn't really natural to split the store itself up: I think we'd instead want to keep a unified store and blend in the stored values where necessary. E.g. rather than split: a a a a b b c c into: a a a a b b c c we'd be better off having predicated groups of the form: a a a a _ _ _ _ _ _ _ _ b b _ _ _ _ _ _ _ _ c c This is one thing on the very long todo list :-/ > @@ -2323,6 +2323,36 @@ vect_analyze_slp_instance (vec_info *vinfo, > rest, max_tree_size); > return res; > } > + > + /* For loop vectorization split into arbitrary pieces of size > 1. */ > + if (is_a <loop_vec_info> (vinfo) > + && (i > 1 && i < group_size)) > + { > + gcc_assert ((const_nunits & (const_nunits - 1)) == 0); FWIW, we have pow2p_hwi for this. > + unsigned group1_size = i; > + > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_NOTE, vect_location, > + "Splitting SLP group at stmt %u\n", i); > + > + stmt_vec_info rest = vect_split_slp_store_group (stmt_info, > + group1_size); > + /* Loop vectorization cannot handle gaps in stores, make sure > + the split group appears as strided. */ > + STMT_VINFO_STRIDED_P (rest) = 1; > + DR_GROUP_GAP (rest) = 0; > + STMT_VINFO_STRIDED_P (stmt_info) = 1; > + DR_GROUP_GAP (stmt_info) = 0; Does something undo the STMT_VINFO_STRIDED_P assignments if SLP vectorisation fails? If not, mightn't this pessimise things in that case? Realise that that won't be a concern once everything is SLP, just wondering. (Sorry if this has already been dealt with by later patches or is otherwise just noise.) Thanks, Richard > + > + bool res = vect_analyze_slp_instance (vinfo, bst_map, stmt_info, > + max_tree_size); > + if (i + 1 < group_size) > + res |= vect_analyze_slp_instance (vinfo, bst_map, > + rest, max_tree_size); > + > + return res; > + } > + > /* Even though the first vector did not all match, we might be able to > SLP > (some) of the remainder. FORNOW ignore this possibility. */ > }