This completes (functionality-wise) the work of bringing non-SLP "interleaving" group vectorization to SLP. I've chosen to keep [1/5] exactly the same as posted last time, though that might be confusing since [2/5] largely refactors it.
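
For readers less familiar with the terminology, here is a minimal sketch of the kind of scalar loops that form interleaving groups of size three, one on the load side and one on the store side. The function names and the loops themselves are made up for illustration and are not taken from the patches or the testsuite; whether a given target vectorizes them with this series applied is an assumption, not something verified here.

```c
#include <stddef.h>

/* Hypothetical load-side example: each iteration reads three adjacent
   elements of IN (a load group of size three) and stores one result,
   so the vectorizer has to de-interleave the loaded vectors.  */
void
sum_triples (float *restrict out, const float *restrict in, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    out[i] = in[3 * i] + in[3 * i + 1] + in[3 * i + 2];
}

/* Hypothetical store-side example: each iteration writes three adjacent
   elements of OUT (a store group of size three) from three separate
   streams, so the vectorizer has to interleave before storing.  */
void
interleave3 (float *restrict out, const float *restrict a,
             const float *restrict b, const float *restrict c, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      out[3 * i] = a[i];
      out[3 * i + 1] = b[i];
      out[3 * i + 2] = c[i];
    }
}
```

Compiling such a file with -O3 -fdump-tree-vect-details and reading the vect dump is one way to see how the groups are handled.
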
The series is ordered the way development went. [2/5] completes [1/5] by handling multi-level interleaving steps and implementing a crude fallback for the case where SLP groups lanes in a way that is not compatible with an interleaving scheme (we fail to split the SLP into multiple single-lane SLPs when the permute isn't supported).

[3/5] then handles gaps. With the prerequisite of handling NULL in SLP_TREE_SCALAR_STMTS done (pushed this morning), this is quite trivial to add.

[4/5] adds support for a group size of three on the load side.

[5/5] adds support for a group size of three (and power-of-two multiples of that) on the store side.

I've bootstrapped and tested the series (and the individual steps) on x86_64-unknown-linux-gnu. I've also built SPEC CPU 2017 with [4/5].

As far as merging goes, I will start with [5/5], which is quite independent, and squash the rest. I'm still undecided on how and whether to populate the bst_map (I think the SLP pattern matching expects the nodes to be there).

With the change we will still have load permutations in some cases. Code generation with SLP is sometimes worse than without, but that happens only when SLP discovery finds some multi-lane sub-graphs; for the all-single-lane case it should be good.

Comments welcome, especially on which parts (permutes) are unlikely to work for VLA, so I can put comments in the code.

Thanks,
Richard.