On Mon, Aug 13, 2012 at 3:12 PM, Ramana Radhakrishnan
<ramana.radhakrish...@linaro.org> wrote:
>>
>> I guess people will complain soon enough if this causes horrible performance
>> regressions in vectorized code.
>
> Not having looked at your patch in great detail, surely what we don't
> want is a situation where 2 constant permutations are converted into
> one generic permute. Based on a quick read of your patch I couldn't
> work that out.  It might be that 2 constant permutes are cheaper than
> a generic permute. Have you looked at any examples in that space? I
> surely wouldn't like to see a sequence of interleave / transpose
> change into a generic permute operation on Neon as that would be far
> more expensive than this.  It surely needs more testing than just
> this bit before going in. The reason being that this would likely take
> more registers and indeed produce loads of a constant pool for the new
> mask.

The patch does not do that.  It merely assumes that the target knows
how to perform an optimal constant permute and that two constant
permutes never generate better code than a single one.
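
To make the point concrete, here is a minimal sketch (illustrative only, not
code from the patch) of why folding two constant permutes yields another
constant permute rather than a generic one. It assumes a simplified
single-input permute where result[i] = v[sel[i]]; the helper names `permute`
and `compose` are hypothetical.

```python
def permute(v, sel):
    """Single-input constant permute: result[i] = v[sel[i]]."""
    return [v[i] for i in sel]


def compose(sel1, sel2):
    """Selector equivalent to applying sel1 first and sel2 second.

    permute(permute(v, sel1), sel2)[i] == v[sel1[sel2[i]]], so the
    combined selector is sel1 indexed by sel2. Both inputs are
    compile-time constants, hence so is the result.
    """
    return [sel1[i] for i in sel2]


v = [10, 11, 12, 13]
sel1 = [1, 0, 3, 2]  # swap adjacent pairs
sel2 = [2, 3, 0, 1]  # swap the two halves

two_step = permute(permute(v, sel1), sel2)
combined = compose(sel1, sel2)
one_step = permute(v, combined)

assert two_step == one_step
```

Whether the combined selector is as cheap as the two original ones is then
up to the target's constant-permute expansion, which is the assumption
stated above.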

Richard.

> regards,
> Ramana
