On 12/09/2011 10:02 AM, Ramana Radhakrishnan wrote:
> For Neon a further optimization to consider might be to use the vext
> instruction which could achieve permute masks that are monotonically
> increasing constants? While I expect the latency for a vext or vtbl
> instruction to be about the same (your mileage might vary depending on
> the core), using vext gives us the freedom of not needing a register
> for the permute mask -
>
> a = vec_shuffle (b, c, mask) where mask is { n + 7, n + 6, n + 5, n +
> 4, n + 3, n + 2, n + 1, n } could just be vext.8 A, B, C, #n

Good to know.  I missed that one in my reading of the manual.

> Additionally, can we also detect rotate rights? unless of course
> there's a different interface -
>
> a = vec_shuffle (vec, {0, 7, 6, 5, 4, 3, 2, 1}) => vext.8 a, vec, vec, #1

Certainly we can.


r~
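
For illustration, the detection discussed above might look roughly like the following. This is only a sketch, not code from the patch: the function name mask_is_vext and its parameters are hypothetical. It assumes the mask is given in element order (index 0 first), whereas the masks quoted above are written highest element first, and it assumes mask values are reduced modulo the number of selectable elements.

```c
#include <stdbool.h>

/* Hypothetical helper, for illustration only.

   Return true if MASK (NELT entries, NELT > 0, element 0 first) selects
   NELT consecutive elements, so that a single vext with immediate *IMM
   could implement the shuffle.

   With two operands the elements are taken from the concatenation of
   the inputs (indices 0 .. 2*NELT-1) and may not wrap around.  With
   ONE_OPERAND set, the single input is used for both halves, so the
   run wraps modulo NELT, which is a rotate.  */
static bool
mask_is_vext (const unsigned *mask, unsigned nelt, bool one_operand,
              unsigned *imm)
{
  unsigned span = one_operand ? nelt : 2 * nelt;
  unsigned start = mask[0] % span;

  for (unsigned i = 1; i < nelt; i++)
    {
      unsigned expect = start + i;
      if (one_operand)
        expect %= nelt;
      if (mask[i] % span != expect)
        return false;
    }

  /* A start of 0 (or NELT with two operands) is a plain copy of one
     input; a caller would likely handle that without vext.  */
  *imm = start;
  return true;
}
```

In element order, the rotate example quoted above is the mask { 1, 2, 3, 4, 5, 6, 7, 0 } with one operand, for which this sketch reports an immediate of 1.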