On 12/09/2011 10:02 AM, Ramana Radhakrishnan wrote:
> For Neon, a further optimization to consider might be to use the vext
> instruction, which could handle permute masks that are monotonically
> increasing constants? While I expect the latency for a vext or vtbl
> instruction to be about the same (your mileage might vary depending on
> the core), using vext gives us the freedom of not needing a register
> for the permute mask:
> 
> a = vec_shuffle (b, c, mask) where mask is
> { n + 7, n + 6, n + 5, n + 4, n + 3, n + 2, n + 1, n }
> could just be vext.8 A, B, C, #n

Good to know.  I missed that one in my reading of the manual.
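
For reference, a rough sketch of the check this implies. It is not GCC
code: the helper name and the mask layout are assumptions for
illustration. The selector is listed lane 0 first and indexes the
2*nelt lanes of the concatenation {b, c}; the mask qualifies when each
lane is exactly one more than the previous lane.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
   sel[] holds nelt lane indices, lane 0 first, each in [0, 2*nelt),
   selecting from the concatenation {b, c} (b supplies lanes 0..nelt-1).
   Returns the vext immediate n when sel[i] == n + i for every lane,
   or -1 when the mask is not a plain extract.  */
static int
vext_imm_for_mask (const unsigned char *sel, size_t nelt)
{
  assert (nelt > 0);

  unsigned int n = sel[0];

  /* Starting at or beyond lane nelt would read only from c,
     which is a plain move rather than an extract.  */
  if (n >= nelt)
    return -1;

  for (size_t i = 1; i < nelt; i++)
    if ((size_t) sel[i] != n + i)
      return -1;

  return (int) n;
}
```

With nelt = 8, a selector of {3, 4, ..., 10} yields 3, which would map
to vext.8 with #3; any selector with a gap yields -1 and would fall
back to vtbl.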

> Additionally, can we also detect rotate rights? Unless of course
> there's a different interface:
> 
>    a = vec_shuffle (vec, {0, 7, 6, 5, 4, 3, 2, 1}) => vext.8 a, vec, vec, #1

Certainly we can.
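
A minimal sketch of the single-operand case, again with a hypothetical
helper name and not taken from any patch. It assumes the selector is
listed lane 0 first and that the quoted mask is written highest lane
first, so that example becomes {1, 2, 3, 4, 5, 6, 7, 0} here. The test
is the same consecutive-lane check, but with the index wrapping modulo
the vector length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
   sel[] holds nelt lane indices, lane 0 first, each in [0, nelt),
   selecting from a single input vector.
   Returns the rotate amount n when sel[i] == (n + i) % nelt for
   every lane, or -1 when the mask is not a rotate.  */
static int
vext_rotate_imm (const unsigned char *sel, size_t nelt)
{
  assert (nelt > 0);

  size_t n = sel[0];

  if (n >= nelt)
    return -1;

  for (size_t i = 1; i < nelt; i++)
    if ((size_t) sel[i] != (n + i) % nelt)
      return -1;

  return (int) n;
}
```

A result of n would correspond to vext.8 a, vec, vec, #n, with the
same register used for both inputs.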


r~
