On Mon, Aug 13, 2012 at 03:13:26PM +0200, Richard Guenther wrote:
> On Mon, Aug 13, 2012 at 3:12 PM, Ramana Radhakrishnan
> <ramana.radhakrish...@linaro.org> wrote:
> >>
> >> I guess people will complain soon enough if this causes horrible
> >> performance regressions in vectorized code.
> >
> > Not having looked at your patch in great detail, surely what we don't
> > want is a situation where 2 constant permutations are converted into
> > one generic permute.  Based on a quick read of your patch I couldn't
> > work that out.  It might be that 2 constant permutes are cheaper than
> > a generic permute.  Have you looked at any examples in that space?  I
> > surely wouldn't like to see a sequence of interleave / transpose
> > change into a generic permute operation on Neon as that would be far
> > more expensive than this.  It surely needs more testing than just
> > this bit before going in.  The reason being that this would likely take
> > more registers and indeed produce loads of a constant pool for the new
> > mask.
>
> The patch does not do that.  It merely assumes that the target knows
> how to perform an optimal constant permute and that two constant
> permutes never generate better code than a single one.

Still, the patch should test whether the combination is beneficial.
At the very least, if both original permutations pass a
can_vec_perm_p (mode, false, sel) test, the resulting permutation
should pass that test too.  Perhaps additionally, if
targetm.vectorize.vec_perm_const_ok is non-NULL and passes for both
original permutations, then it should also pass for the resulting
permutation.

	Jakub
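
To make the suggested guard concrete, here is a minimal standalone sketch
in C.  It is not GCC code: it models only single-input permutes (real
VEC_PERM_EXPRs select from two input vectors), and toy_target_ok,
compose_perm and should_combine are hypothetical names, with
toy_target_ok standing in for a target query such as can_vec_perm_p.
The rule modelled is one reading of the suggestion above: if both
original selectors pass the query, combine only when the composed
selector passes as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a target query such as can_vec_perm_p.  */
typedef bool (*perm_ok_fn) (const unsigned char *sel, size_t n);

/* Toy target: for 4-element vectors it handles only the identity,
   a full reversal, and a swap of adjacent pairs.  */
static bool
toy_target_ok (const unsigned char *sel, size_t n)
{
  static const unsigned char supported[][4] = {
    { 0, 1, 2, 3 }, { 3, 2, 1, 0 }, { 1, 0, 3, 2 }
  };
  if (n != 4)
    return false;
  for (size_t k = 0; k < sizeof supported / sizeof supported[0]; k++)
    if (memcmp (sel, supported[k], 4) == 0)
      return true;
  return false;
}

/* Applying sel1 and then sel2 to a single vector is equivalent to one
   permute with out[i] = sel1[sel2[i]].  */
static void
compose_perm (const unsigned char *sel1, const unsigned char *sel2,
              unsigned char *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      assert (sel2[i] < n);
      out[i] = sel1[sel2[i]];
    }
}

/* Return true if the two permutes should be merged into one.  When both
   originals pass the target query, the composed selector must pass it
   too; otherwise the sketch imposes no extra condition.  */
static bool
should_combine (const unsigned char *sel1, const unsigned char *sel2,
                unsigned char *out, size_t n, perm_ok_fn ok)
{
  compose_perm (sel1, sel2, out, n);
  if (ok (sel1, n) && ok (sel2, n))
    return ok (out, n);
  return true;
}
```

With this toy target, two reversals compose to the identity and are
merged, whereas a reversal followed by a pair swap composes to
{ 2, 3, 0, 1 }, which the toy target does not handle, so the two cheap
permutes are kept.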