On Mon, Aug 13, 2012 at 3:12 PM, Ramana Radhakrishnan
<ramana.radhakrish...@linaro.org> wrote:
>>
>> I guess people will complain soon enough if this causes horrible performance
>> regressions in vectorized code.
>
> Not having looked at your patch in great detail, surely what we don't
> want is a situation where 2 constant permutations are converted into
> one generic permute. Based on a quick read of your patch I couldn't
> work that out. It might be that 2 constant permutes are cheaper than
> a generic permute. Have you looked at any examples in that space? I
> surely wouldn't like to see a sequence of interleave / transpose
> change into a generic permute operation on Neon as that would be far
> more expensive than this. It surely needs more testing than just
> this bit before going in. The reason being that this would likely take
> more registers and indeed produce loads of a constant pool for the new
> mask.

The patch does not do that. It merely assumes that the target knows
how to perform an optimal constant permute and that two constant
permutes never generate better code than a single one.

Richard.

> regards,
> Ramana
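
For illustration only, the idea of replacing two constant permutes with a single constant permute can be sketched in plain C. This is not the code from the patch and not how GCC represents permutes internally. It assumes a simplified one-operand permute where out[i] = in[mask[i]], and the names permute, compose_masks and check_fold are hypothetical. Under that assumption, applying mask m1 and then mask m2 is the same as applying the single constant mask folded[i] = m1[m2[i]], which is still known at compile time and so is not a generic (variable) permute.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified one-operand permute (an assumption for this sketch):
   out[i] = in[mask[i]].  */
void permute(const int *in, const unsigned *mask, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[mask[i]];
}

/* Fold "permute by m1, then permute by m2" into one constant mask.
   Since t[j] = v[m1[j]] and r[i] = t[m2[i]], r[i] = v[m1[m2[i]]].  */
void compose_masks(const unsigned *m1, const unsigned *m2,
                   unsigned *folded, size_t n)
{
    for (size_t i = 0; i < n; i++)
        folded[i] = m1[m2[i]];
}

/* Self-check on 4 lanes: the folded permute must give the same
   result as the two-step sequence.  */
void check_fold(const int *v, const unsigned *m1, const unsigned *m2)
{
    int tmp[4], two_step[4], one_step[4];
    unsigned folded[4];

    permute(v, m1, tmp, 4);
    permute(tmp, m2, two_step, 4);

    compose_masks(m1, m2, folded, 4);
    permute(v, folded, one_step, 4);

    for (size_t i = 0; i < 4; i++)
        assert(one_step[i] == two_step[i]);
}
```

Whether the folded mask is actually cheaper than the original pair on a given target (for example an interleave followed by a transpose on Neon) is exactly the question raised in the quoted text; the sketch only shows that the combined mask remains a constant.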