http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607

--- Comment #29 from Marc Glisse <marc.glisse at normalesup dot org> 2012-04-11 20:35:00 UTC ---
Created attachment 27136
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27136
V4DF generic shuffle

A patch (independent of the others) implementing what is explained in the
last 2 comments. It is simple and works really well: all V4DF shuffles (even
with 2 vectors) take at most 3 insn (and often just 2). It only requires AVX,
but it also improves a lot on the current AVX2 code, which casts to vectors of
integers and uses up to 9 insn (although my "default case" patch also goes
down to 3 insn on AVX2).

The drawback is that it is limited to V4DF. vshufps is a different enough beast
from vshufpd that it would require different code, which wouldn't even apply
that often. For V8SF, my "default case" patch seems more interesting. Integer
vectors have different instructions again...

By the way, I tested all V4DF permutations (there are only 2^12 of them: 8
possible sources for each of the 4 elements) in the simulator. I also have a
file (400K) with the code for each permutation, which looks like the following:
0,0,0,0
        vpermilpd       $0, %ymm0, %ymm0
        vperm2f128      $0, %ymm0, %ymm0, %ymm0
[...]
1,7,6,3
        vperm2f128      $48, %ymm1, %ymm0, %ymm2
        vperm2f128      $19, %ymm1, %ymm0, %ymm0
        vshufpd         $11, %ymm0, %ymm2, %ymm0
1,7,6,4
        vperm2f128      $48, %ymm1, %ymm0, %ymm0
        vperm2f128      $33, %ymm1, %ymm1, %ymm1
        vshufpd         $3, %ymm1, %ymm0, %ymm0
[...]
If anyone wants to take a look, tell me and I'll attach it.
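
For readers who want to check the listed sequences by hand, here is a minimal sketch of an index-level model of the three instructions involved. It is not the patch and not the simulator mentioned above. The function names and the 0-7 lane numbering (0-3 for %ymm0, 4-7 for %ymm1) are assumptions made for illustration, and the immediate decoding follows my reading of the documented AVX semantics, with operands given in the AT&T order used in the listing.

```python
# Hypothetical index-level model: each register is a list of 4 lane labels.
# Lanes 0-3 label the elements of the first input, lanes 4-7 the second.

def vperm2f128(imm, src2, src1):
    """Model of: vperm2f128 $imm, src2, src1, dst (AT&T operand order)."""
    def half(sel):
        if sel & 8:                       # selector bit 3 zeroes this half
            return [None, None]
        source = src2 if sel & 2 else src1
        start = 2 * (sel & 1)             # 0 = low 128 bits, 1 = high 128 bits
        return source[start:start + 2]
    return half(imm & 0xF) + half((imm >> 4) & 0xF)

def vshufpd(imm, src2, src1):
    """Model of: vshufpd $imm, src2, src1, dst (AT&T operand order)."""
    return [
        src1[0 + (imm & 1)],
        src2[0 + ((imm >> 1) & 1)],
        src1[2 + ((imm >> 2) & 1)],
        src2[2 + ((imm >> 3) & 1)],
    ]

def vpermilpd(imm, src):
    """Model of: vpermilpd $imm, src, dst (immediate form)."""
    return [
        src[0 + (imm & 1)],
        src[0 + ((imm >> 1) & 1)],
        src[2 + ((imm >> 2) & 1)],
        src[2 + ((imm >> 3) & 1)],
    ]

def seq_0000(ymm0, ymm1):
    ymm0 = vpermilpd(0, ymm0)
    return vperm2f128(0, ymm0, ymm0)

def seq_1763(ymm0, ymm1):
    ymm2 = vperm2f128(48, ymm1, ymm0)
    ymm0 = vperm2f128(19, ymm1, ymm0)
    return vshufpd(11, ymm0, ymm2)

def seq_1764(ymm0, ymm1):
    ymm0 = vperm2f128(48, ymm1, ymm0)
    ymm1 = vperm2f128(33, ymm1, ymm1)
    return vshufpd(3, ymm1, ymm0)

A = [0, 1, 2, 3]   # labels for the lanes of the first input vector
B = [4, 5, 6, 7]   # labels for the lanes of the second input vector

# Number of two-input V4DF selectors: 8 choices for each of 4 result lanes.
NUM_SELECTORS = 8 ** 4

if __name__ == "__main__":
    for name, seq in (("0,0,0,0", seq_0000),
                      ("1,7,6,3", seq_1763),
                      ("1,7,6,4", seq_1764)):
        print(name, "->", seq(A, B))
```

Under this model, each of the three sequences shown in the listing reproduces the selector printed above it, and the count of selectors matches the 2^12 figure.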
