Hello all,

I have an issue/question about using VMX/VSX on a POWER8 processor on a little-endian system. When I use the intrinsic functions vec_vsx_ld() / vec_vsx_st(), the compiler adds a permutation after the load and another before the store, and then performs the operations (the memory is correctly aligned):

    lxvd2x …
    xxpermdi …
    operations …
    xxpermdi
    stxvd2x

If I use vec_ld() / vec_st() instead, I get:

    lvx
    operations …
    stvx

Reading the ISA, I do not see a real difference between these two instructions (or I missed it). So my three questions are:

1. Why do I get these permutations?
2. What is the cost of these permutations?
3. What is the performance difference between vec_vsx_ld and vec_ld?

Best,
Tim

Timothée Ewart, Ph. D.
http://www.linkedin.com/in/tewart
timothee.ew...@epfl.ch