[PATCH 0/5][v2] SLP load and store permute lowering

Richard Biener Wed, 03 Jul 2024 06:23:44 -0700


This completes (functionality-wise) the work of bringing non-SLP
"interleaving" group vectorization to SLP.  I've chosen to keep
[1/5] exactly the same as posted last time, though that might be
confusing since [2/5] largely refactors it.


The series reflects how development went: [2/5] completes [1/5]
by handling multi-level interleaving steps and by implementing a
crude fallback for the case where SLP groups lanes in a way that is
not compatible with an interleaving scheme (we fail to split the SLP
into multiple single-lane SLPs when the permute isn't supported).
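
To illustrate what "multi-level interleaving steps" means, here is a
scalar model of deinterleaving a load group of size four with two
levels of even/odd extraction.  This is only a sketch and not code
from the patches: the names extract_even_odd and deinterleave4 and
the fixed VF of 4 are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

enum { VF = 4 };  /* illustrative vector length in elements (assumption) */

static_assert (VF % 2 == 0, "even/odd extraction needs an even VF");

/* Model of one permute step: view A and B as one 2*VF element vector
   and write its even-indexed elements to EVEN, its odd-indexed to ODD.  */
static void
extract_even_odd (const int *a, const int *b, int *even, int *odd)
{
  for (size_t i = 0; i < VF / 2; i++)
    {
      even[i] = a[2 * i];
      odd[i] = a[2 * i + 1];
      even[VF / 2 + i] = b[2 * i];
      odd[VF / 2 + i] = b[2 * i + 1];
    }
}

/* Deinterleave 4 * VF contiguous ints from SRC so that LANE[k][i]
   receives SRC[4 * i + k].  Two levels are needed for group size 4.  */
void
deinterleave4 (const int *src, int lane[4][VF])
{
  int e01[VF], o01[VF], e23[VF], o23[VF];

  /* Level 1: split each pair of adjacent vectors into even/odd halves.  */
  extract_even_odd (src, src + VF, e01, o01);
  extract_even_odd (src + 2 * VF, src + 3 * VF, e23, o23);

  /* Level 2: the evens of the evens are lane 0, the odds of the evens
     are lane 2, and likewise lanes 1 and 3 come from the odds.  */
  extract_even_odd (e01, e23, lane[0], lane[2]);
  extract_even_odd (o01, o23, lane[1], lane[3]);
}
```

A group whose lanes cannot be produced by such a sequence of steps is
the kind of case the fallback mentioned above is for.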

[3/5] then goes on to handle gaps; with the prerequisite of handling
NULL in SLP_TREE_SCALAR_STMTS done (pushed this morning), it's
quite trivial to add.
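
For readers unfamiliar with the term, a load group "with a gap" is one
where not every element of the group is used.  A minimal sketch of
such a loop follows; the function name sum_with_gap is hypothetical,
and the remark that the unused element corresponds to a NULL entry in
SLP_TREE_SCALAR_STMTS is an assumption based on the prerequisite
mentioned above.

```c
#include <stddef.h>

/* Each iteration reads elements 0, 1 and 3 of a group of four ints.
   Element 2 is never read, so the load group has a gap there
   (presumably a NULL lane in SLP_TREE_SCALAR_STMTS).  */
void
sum_with_gap (int *restrict out, const int *restrict in, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = in[4 * i] + in[4 * i + 1] + in[4 * i + 3];
}
```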

[4/5] adds support for group-size of three on the load side.

[5/5] adds support for group-size of three (and power-of-two multiples
of that) on the store side.
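
As an example of the kind of loop these two patches are about, the
following sketch has both a load group and a store group of size
three.  The function name rgb_to_bgr is made up for illustration and
whether this exact loop is vectorized with the series applied has not
been checked.

```c
#include <stddef.h>

/* Swap the first and third byte of each three-byte pixel.  Both SRC
   and DST are accessed in groups of three elements per iteration.  */
void
rgb_to_bgr (unsigned char *restrict dst,
            const unsigned char *restrict src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      dst[3 * i + 0] = src[3 * i + 2];
      dst[3 * i + 1] = src[3 * i + 1];
      dst[3 * i + 2] = src[3 * i + 0];
    }
}
```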

I've bootstrapped and tested the series (and individual steps)
on x86_64-unknown-linux-gnu.  I've also built SPEC CPU 2017 with [4/5].

As far as merging goes, I will start with [5/5], which is quite
independent.  I'll squash the rest; I'm still undecided on whether
and how to populate the bst_map (I think the SLP pattern
matching expects the nodes to be there).

With the change we will still have load permutations in some cases.

Code generation with SLP is sometimes worse than without, but that
happens only when SLP discovery has some multi-lane sub-graphs; for
the all-single-lane case it should be good.

Comments welcome, esp. on which parts (permutes) are unlikely
to work for VLA so I can put comments in the code.

Thanks,
Richard.
