Hello all,

I have an issue/question about using VMX/VSX on a POWER8 processor on a
little-endian system.
Using intrinsic functions, if I perform an operation with vec_vsx_ld(…) /
vec_vsx_st(), the compiler adds a permutation after the load and another
before the store, around the operations themselves (memory correctly aligned):

lxvd2x …
xxpermdi …
operations ….
xxpermdi
stxvd2x …

If I use vec_ld() / vec_st() instead, there is no permutation:

lvx
operations …
stvx
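For reference, here is a minimal sketch of the kind of kernel that produces the two listings above. The function names add_vsx_ld/add_vec_ld and the element-wise addition are hypothetical stand-ins for "operations"; the VSX path assumes GCC's altivec.h built for POWER8 (e.g. -mcpu=power8), and a scalar fallback is included only so the file also compiles on non-VSX hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __VSX__
#include <altivec.h>  /* built with e.g. gcc -mcpu=power8 */
#endif

/* Hypothetical kernel: c[i] = a[i] + b[i] for n doubles. */
void add_vsx_ld(const double *a, const double *b, double *c, size_t n)
{
    assert(n % 2 == 0);  /* two doubles per 128-bit vector */
#ifdef __VSX__
    for (size_t i = 0; i < n; i += 2) {
        /* The load/store pair for which lxvd2x + xxpermdi and
           xxpermdi + stxvd2x were observed on POWER8 LE. */
        vector double va = vec_vsx_ld(0, a + i);
        vector double vb = vec_vsx_ld(0, b + i);
        vec_vsx_st(vec_add(va, vb), 0, c + i);
    }
#else
    for (size_t i = 0; i < n; i++)  /* scalar fallback on non-VSX hosts */
        c[i] = a[i] + b[i];
#endif
}

/* Same kernel with vec_ld/vec_st (lvx/stvx). lvx/stvx ignore the low
   four address bits, so all pointers must be 16-byte aligned. */
void add_vec_ld(const double *a, const double *b, double *c, size_t n)
{
    assert(n % 2 == 0);
    assert(((uintptr_t)a | (uintptr_t)b | (uintptr_t)c) % 16 == 0);
#ifdef __VSX__
    for (size_t i = 0; i < n; i += 2) {
        vector double va = vec_ld(0, a + i);
        vector double vb = vec_ld(0, b + i);
        vec_st(vec_add(va, vb), 0, c + i);
    }
#else
    for (size_t i = 0; i < n; i++)  /* scalar fallback on non-VSX hosts */
        c[i] = a[i] + b[i];
#endif
}
```

Compiling each function with -O2 -S on a POWER8 little-endian target is one way to compare the generated sequences side by side.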

Reading the ISA, I do not see a real difference between these two
instructions (or I missed it).
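One way to look at where the two loads could differ is element placement. The portable model below is a sketch under stated assumptions: the register is described with the ISA's big-endian doubleword numbering, lxvd2x is assumed to put the doubleword at EA into doubleword 0 regardless of endianness, and lvx on little endian is assumed to byte-reverse the whole quadword. The type vsr2 and all model_* functions are hypothetical illustrations, not real instructions or intrinsics.

```c
#include <assert.h>

/* Hypothetical model of a 128-bit VSX register holding two doubles.
   hi/lo follow the ISA's big-endian numbering: hi is doubleword 0
   (left half), lo is doubleword 1 (right half). */
typedef struct { double hi, lo; } vsr2;

/* Assumed lxvd2x behaviour: the doubleword at EA goes to doubleword 0
   and the one at EA+8 to doubleword 1, regardless of endianness. */
static vsr2 model_lxvd2x(const double *ea)
{
    vsr2 r = { ea[0], ea[1] };
    return r;
}

/* Assumed lvx behaviour on little endian: the whole quadword is
   byte-reversed, so the element at EA lands in the right half.
   (lvx also ignores the low four address bits; not modeled here.) */
static vsr2 model_lvx_le(const double *ea)
{
    vsr2 r = { ea[1], ea[0] };
    return r;
}

/* xxpermdi with DM=2 (the "xxswapd" idiom): swap the doublewords. */
static vsr2 model_xxswapd(vsr2 v)
{
    vsr2 r = { v.lo, v.hi };
    return r;
}

/* Little-endian element numbering: element 0 is the right half. */
static double le_elem(vsr2 v, int i)
{
    return i == 0 ? v.lo : v.hi;
}

/* In this model, lxvd2x followed by one swap yields the same element
   order as lvx for a 16-byte-aligned pair. */
static void check_order(const double *pair)
{
    vsr2 a = model_xxswapd(model_lxvd2x(pair));
    vsr2 b = model_lvx_le(pair);
    assert(le_elem(a, 0) == pair[0] && le_elem(a, 1) == pair[1]);
    assert(le_elem(b, 0) == pair[0] && le_elem(b, 1) == pair[1]);
}
```

Under these assumptions, lxvd2x alone would leave the two elements in reversed order for little-endian code, which is one possible reading of why an xxpermdi follows each lxvd2x and precedes each stxvd2x.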

So my 3 questions are:
 
Why do I have permutations?
What is the cost of these permutations?
What is the difference between vec_vsx_ld and vec_ld in terms of performance?


Best

Tim



Timothée Ewart, Ph. D. 
http://www.linkedin.com/in/tewart
timothee.ew...@epfl.ch