The gcc.target/riscv/rvv/autovec/struct/struct_vect-4.c testcase shows that we sometimes fail to use store-lanes even though it should be profitable. We currently rely on vect_slp_prefer_store_lanes_p at the point where we run into the first SLP discovery mismatch, which obviously has limited information. For the case at hand we have 3, 5 or 7 lanes of VnDImode [2, 2] vectors with the first mismatch at lane 2, so the new group size is 1. The heuristic says that might be an OK split given the rest is a multiple of the vector lanes. We then continue discovery, but in the end further mismatches result in uniformly single-lane SLP instances. We can handle those via interleaving, but they are of course prime candidates for store-lanes.

The following patch re-assesses the decision with this extra knowledge, now relying only on whether the target supports store-lanes for the given group size.
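For readers who want a concrete picture, here is a minimal sketch of the general loop shape under discussion. It is a hypothetical illustration and not the contents of struct_vect-4.c; the function name and the choice of operations are assumptions. Each iteration stores a group of three 64-bit lanes, and each lane is computed differently, so the stores are typically not isomorphic for SLP discovery. Whether the vectorizer ends up using store-lanes for such a loop depends on the target and options.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical example, not the actual testcase.  Every iteration writes a
// group of three adjacent 64-bit elements (a store group of size 3), and
// each lane of the group is produced by a different operation.
void
store_group3 (int64_t *dst, const int64_t *a, const int64_t *b, std::size_t n)
{
  assert (n == 0 || (dst && a && b));
  for (std::size_t i = 0; i < n; ++i)
    {
      dst[3 * i + 0] = a[i] + b[i];  // lane 0
      dst[3 * i + 1] = a[i] - b[i];  // lane 1
      dst[3 * i + 2] = a[i] * b[i];  // lane 2
    }
}
```

With one vector per lane, such a store group can be written either by interleaving the three vectors with permutes or, where the target provides it, with a single store-lanes (segment store) operation.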
Bootstrap and regtest running on x86_64-unknown-linux-gnu; I'll be
watching the CI for fallout.

Richard.

	PR tree-optimization/115372
	* tree-vect-slp.cc (vect_build_slp_instance): Compute the uniform,
	if any, number of lanes of the RHS sub-graphs feeding the store
	and if uniformly one, use store-lanes if the target supports that.
---
 gcc/tree-vect-slp.cc | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 9c817de18bd..fc918d47c00 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3957,6 +3957,7 @@ vect_build_slp_instance (vec_info *vinfo,
 
               /* Calculate the unrolling factor based on the smallest type.  */
               poly_uint64 unrolling_factor = 1;
+              unsigned int rhs_common_nlanes = 0;
               unsigned int start = 0, end = i;
               while (start < group_size)
                 {
@@ -3978,6 +3979,10 @@ vect_build_slp_instance (vec_info *vinfo,
                                           calculate_unrolling_factor
                                             (max_nunits, end - start));
                   rhs_nodes.safe_push (node);
+                  if (start == 0)
+                    rhs_common_nlanes = SLP_TREE_LANES (node);
+                  else if (rhs_common_nlanes != SLP_TREE_LANES (node))
+                    rhs_common_nlanes = 0;
                   start = end;
                   if (want_store_lanes || force_single_lane)
                     end = start + 1;
@@ -4015,6 +4020,15 @@ vect_build_slp_instance (vec_info *vinfo,
                     }
                 }
 
+              /* Now re-assess whether we want store lanes in case the
+                 discovery ended up producing all single-lane RHSs.  */
+              if (rhs_common_nlanes == 1
+                  && vect_store_lanes_supported (SLP_TREE_VECTYPE (rhs_nodes[0]),
+                                                 group_size,
+                                                 SLP_TREE_CHILDREN
+                                                   (rhs_nodes[0]).length () != 1))
+                want_store_lanes = true;
+
               /* Now we assume we can build the root SLP node from all stores.  */
               if (want_store_lanes)
                 {
-- 
2.43.0
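The bookkeeping added by the patch can be sketched outside of GCC roughly as follows. This is a simplified model under stated assumptions, not the vectorizer code: `RhsNode`, `common_lanes`, `target_supports_store_lanes` and `reassess_store_lanes` are hypothetical names, and the supported group-size range of 2 to 8 is an arbitrary stand-in for what vect_store_lanes_supported would report on a real target.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one discovered RHS SLP sub-graph; only the
// lane count matters for this sketch.
struct RhsNode
{
  unsigned lanes;
};

// Same idea as rhs_common_nlanes in the patch: take the lane count of the
// first node and reset to 0 as soon as any later node differs.  Since a
// node always has at least one lane, 0 can never match again.
unsigned
common_lanes (const std::vector<RhsNode> &nodes)
{
  unsigned common = 0;
  for (std::size_t k = 0; k < nodes.size (); ++k)
    {
      assert (nodes[k].lanes >= 1);
      if (k == 0)
        common = nodes[k].lanes;
      else if (common != nodes[k].lanes)
        common = 0;
    }
  return common;
}

// Hypothetical target query; the 2..8 range is an assumption made for
// this sketch only.
bool
target_supports_store_lanes (unsigned group_size)
{
  return group_size >= 2 && group_size <= 8;
}

// Re-assess after discovery: an earlier "yes" is kept, and an earlier "no"
// is upgraded when all RHS nodes are single-lane and the target can do it.
bool
reassess_store_lanes (bool want_store_lanes,
                      const std::vector<RhsNode> &nodes,
                      unsigned group_size)
{
  if (common_lanes (nodes) == 1 && target_supports_store_lanes (group_size))
    want_store_lanes = true;
  return want_store_lanes;
}
```

Note that the re-assessment only ever turns the decision on; a mixed set of lane counts leaves the earlier heuristic result untouched.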