Quoting Richard Henderson <r...@redhat.com>:
Truly variable permutation is something that's only supported by ppc
and spu.
SH64 also has variable permutation. 16 bit elements within its 64 bit
vector size can be permuted.
HOWEVER! Most of the useful permutations that I can think of for the
optimizers to generate are actually constant. And these can be
implemented everywhere (with varying degrees of efficiency).
Variable permutations could be very useful for doing vector operations on
unaligned inputs. To some degree, shifts can be used instead, but if they
only have C semantics you'll get corner cases with word-sized shifts when
the input is actually aligned.
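
To make the corner case concrete, here is a rough sketch in plain C, assuming 64-bit words and little-endian byte numbering. The function names are made up for illustration. The obvious shift formula needs a shift by 64 when the offset is 0, which C leaves undefined, whereas a byte permute driven by the offset needs no special case.

```c
#include <assert.h>
#include <stdint.h>

/* Extract 8 bytes starting at byte offset `off` (0..7) from the 16-byte
   window formed by two aligned words: `lo` at the lower address, `hi` at
   the higher one.  Little-endian byte numbering is an assumption here. */
uint64_t extract_with_shifts(uint64_t lo, uint64_t hi, unsigned off)
{
    assert(off < 8);
    unsigned sh = 8 * off;
    /* The naive form (lo >> sh) | (hi << (64 - sh)) shifts by 64 when
       off == 0, which is undefined in C.  Splitting the left shift in two
       keeps every shift count in 0..63 and yields 0 for the aligned case. */
    return (lo >> sh) | ((hi << 1) << (63 - sh));
}

/* The same extraction written as a variable byte permutation: selector j
   picks byte (off + j) of the 16-byte window.  The aligned case is just
   the identity selector, so no corner case arises. */
uint64_t extract_with_permute(uint64_t lo, uint64_t hi, unsigned off)
{
    assert(off < 8);
    uint64_t result = 0;
    for (unsigned j = 0; j < 8; j++) {
        unsigned idx = off + j;                 /* index into the window */
        uint64_t src = idx < 8 ? lo : hi;       /* which word holds it   */
        uint64_t byte = (src >> (8 * (idx & 7))) & 0xff;
        result |= byte << (8 * j);
    }
    return result;
}
```

On a target with a real variable permute, the loop in the second function would be a single instruction with the selector computed from the address.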
Anyway, I'm thinking that it might be better to add such a general
operation instead of continuing to add things like
VEC_EXTRACT_EVEN_EXPR,
VEC_EXTRACT_ODD_EXPR,
VEC_INTERLEAVE_HIGH_EXPR,
VEC_INTERLEAVE_LOW_EXPR,
and other obvious patterns like broadcast, duplicate even to odd,
duplicate odd to even, etc.
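
As a rough illustration of how one general operation could subsume the named patterns, here is a scalar C model of a two-input permute over four 16-bit elements. The name vec_perm, the selector tables, and the choice of "low" as the first half in element order are all assumptions made for this sketch; they are not taken from the GCC sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NELT = 4 };   /* four 16-bit elements, i.e. a 64-bit vector */

/* Scalar model of a general two-input permute:
   out[i] = concat(a, b)[sel[i]], with selectors in 0 .. 2*NELT-1.
   A temporary is used so that `out` may alias `a` or `b`. */
void vec_perm(int16_t out[NELT], const int16_t a[NELT],
              const int16_t b[NELT], const uint8_t sel[NELT])
{
    int16_t tmp[NELT];
    for (size_t i = 0; i < NELT; i++) {
        assert(sel[i] < 2 * NELT);
        tmp[i] = sel[i] < NELT ? a[sel[i]] : b[sel[i] - NELT];
    }
    for (size_t i = 0; i < NELT; i++)
        out[i] = tmp[i];
}

/* Each named pattern becomes a constant selector (hypothetical tables). */
const uint8_t SEL_EXTRACT_EVEN[NELT]    = {0, 2, 4, 6};
const uint8_t SEL_EXTRACT_ODD[NELT]     = {1, 3, 5, 7};
const uint8_t SEL_INTERLEAVE_LOW[NELT]  = {0, 4, 1, 5};
const uint8_t SEL_INTERLEAVE_HIGH[NELT] = {2, 6, 3, 7};
const uint8_t SEL_BROADCAST_0[NELT]     = {0, 0, 0, 0};
const uint8_t SEL_DUP_EVEN_TO_ODD[NELT] = {0, 0, 2, 2};
const uint8_t SEL_DUP_ODD_TO_EVEN[NELT] = {1, 1, 3, 3};
```

With this view, a new pattern is a new selector constant rather than a new tree code, and a target is free to recognize the selectors it can do cheaply.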
ARC mxp has a vector exchange operation that would likely not fit into
whatever scheme you are thinking of ... It is very useful for matrix
transposition, except that without a target hook to transpose a matrix,
it is not likely to be generated.
I can imagine having some sort of target hook that computed a cost
metric for a given constant permutation pattern. For instance, I'd
imagine that the interleave patterns are half as expensive as a full
permute for altivec, due to not having to load a mask. This hook would
be fairly complicated for x86, given all of the permuting insns that
were incrementally added in various ISA revisions, but such is life.
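
A cost hook of that kind might look roughly like the following sketch. Everything in it is hypothetical: the name perm_const_cost, the selector encoding (indices into the concatenation of the two inputs), and the cost units, with 2 standing for a full permute that needs a mask load and 1 for an interleave that does not. A real target would recognize many more patterns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if sel[] interleaves the two inputs starting at element `base`:
   base, nelt+base, base+1, nelt+base+1, ...
   base == 0 gives the first halves, base == nelt/2 the second halves. */
static bool is_interleave(const unsigned char *sel, size_t nelt, size_t base)
{
    for (size_t i = 0; i < nelt; i++) {
        size_t expect = base + i / 2 + (i % 2 ? nelt : 0);
        if (sel[i] != expect)
            return false;
    }
    return true;
}

/* Hypothetical cost hook for a constant permutation pattern.
   Returns 1 for the interleave patterns (no mask load needed) and
   2 for anything else (full permute plus mask load).  The 1:2 ratio
   only mirrors the "half as expensive" guess for altivec above. */
unsigned perm_const_cost(const unsigned char *sel, size_t nelt)
{
    assert(nelt >= 2 && nelt % 2 == 0);
    if (is_interleave(sel, nelt, 0) || is_interleave(sel, nelt, nelt / 2))
        return 1;
    return 2;
}
```

The optimizers would then call such a hook with a candidate selector and compare the result against alternative instruction sequences.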
There should be some way to account for the difference between the cost
in straight-line code, where a mask load is a hard cost; in a large loop,
where the load can be hoisted at the cost of some target-dependent
register pressure (e.g. being able to use inverted masks might save half
of the cost); and in a tight loop, where the constant load can be easily
amortized over the entire loop.