On Mon, 13 Aug 2012, Richard Guenther wrote:
On Mon, Aug 13, 2012 at 3:12 PM, Ramana Radhakrishnan
<ramana.radhakrish...@linaro.org> wrote:
I guess people will complain soon enough if this causes horrible performance
regressions in vectorized code.
Not having looked at your patch in great detail, surely what we don't
want is a situation where 2 constant permutations are converted into
one generic permute. Based on a quick read of your patch I couldn't
work that out. It might be that 2 constant permutes are cheaper than
a generic permute. Have you looked at any examples in that space? I
surely wouldn't like to see a sequence of interleave / transpose
change into a generic permute operation on Neon as that would be far
more expensive than this. It surely needs more testing than just
this bit before going in. The reason is that this would likely take
more registers and also require loading the new mask from the
constant pool.
What do you call constant / non-constant? The combined permutation is
still constant, although the expansion (in the back-end) might fail to
expand it efficiently and fall back to the generic permutation
expansion...
The patch does not do that. It merely assumes that the target knows
how to perform an optimal constant permute and that two constant
permutes never generate better code than a single one.
That assumption is, to be honest, false on all platforms I know,
although I did contribute some minor enhancements for x86.
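To make "the combined permutation is still constant" concrete, here is a
small Python model of folding two constant permutes into one. It is a
sketch, not the patch: vec_perm and compose_masks are hypothetical names,
the semantics are modeled on GCC's VEC_PERM_EXPR <v0, v1, mask> (lane i of
the result is lane mask[i] of the concatenation v0 ++ v1), and it only
covers the case where the outer permute uses the inner result as both
operands.

```python
def vec_perm(v0, v1, mask):
    """Model of VEC_PERM_EXPR <v0, v1, mask>: lane i of the result is
    lane mask[i] (taken modulo 2*N) of the concatenation v0 ++ v1."""
    n = len(v0)
    both = list(v0) + list(v1)
    return [both[m % (2 * n)] for m in mask]


def compose_masks(m1, m2):
    """Given x = vec_perm(a, b, m1) and y = vec_perm(x, x, m2), return the
    single constant mask m3 such that y == vec_perm(a, b, m3).
    Since both operands of the outer permute are x, index j of m2 selects
    x[j % N], which is lane m1[j % N] of a ++ b."""
    n = len(m1)
    return [m1[j % n] % (2 * n) for j in m2]


a = [10, 11, 12, 13]
b = [20, 21, 22, 23]
m1 = [0, 4, 1, 5]               # interleave low halves of a and b
m2 = [1, 0, 3, 2]               # swap adjacent lanes
x = vec_perm(a, b, m1)          # [10, 20, 11, 21]
m3 = compose_masks(m1, m2)      # [4, 0, 5, 1]
assert vec_perm(x, x, m2) == vec_perm(a, b, m3)
```

The folded mask [4, 0, 5, 1] is still a compile-time constant; whether a
given target expands it into one cheap instruction, or falls back to a
generic permute with the mask loaded from memory, is the back-end question
being debated here.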
--
Marc Glisse