Re: vec_ld versus vec_vsx_ld on power8

Ewart Timothée Fri, 13 Mar 2015 10:12:48 -0700

Hello,

I am super confused now.


scenario 1, what I have in my code:
machine boots in LE.

1) memory: LE
2) I load (vec_ld)
3) register : LE
4) VSU compute in LE
5) I store (vec_st)
6) memory: LE

scenario 2 (I did not test this, but it is what I would get if I told gcc to
compile for BE):
machine boots in BE

1) memory: BE
2) I load (vec_vsx_ld)
3) register : BE
4) VSU compute in BE
5) I store (vec_vsx_st)
6) memory: BE

At this point the VSU computes in both orders.

chimera scenario 3, what I understand:

machine boots in LE

1) memory: LE
2) I load (vec_vsx_ld)  (the load swaps the elements)
3) register : BE
4) swap : LE
5) VSU compute in LE
6) swap : BE
7) I store (vec_vsx_st) (the store swaps the elements)
8) memory: BE

I understand that vec_vsx_ld/vec_vsx_st load/store from LE/BE, but since the
VSU can compute in both modes, what should I swap? (To be precise, I am working
with 32- and 64-bit floats.)
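
To make the question concrete, here is a minimal sketch in portable C. It is
only a model of scenario 3, not real VSX code: the type vec4f and the functions
raw_load, raw_store, swap_halves and add4 are hypothetical names, and the
assumption is that the raw load and the raw store each exchange the two 64-bit
halves of the vector.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model type: four 32-bit floats, indexed in LE element order. */
typedef struct { float e[4]; } vec4f;

/* Exchange the two 64-bit halves (elements 0,1 <-> 2,3). */
static vec4f swap_halves(vec4f v) {
    vec4f r = {{ v.e[2], v.e[3], v.e[0], v.e[1] }};
    return r;
}

/* Assumed behaviour of the raw load: the halves arrive exchanged. */
static vec4f raw_load(const float *mem) {
    vec4f r = {{ mem[2], mem[3], mem[0], mem[1] }};
    return r;
}

/* Assumed behaviour of the raw store: the halves are exchanged on the way out. */
static void raw_store(vec4f v, float *mem) {
    mem[0] = v.e[2];
    mem[1] = v.e[3];
    mem[2] = v.e[0];
    mem[3] = v.e[1];
}

/* Element-wise addition, standing in for the VSU compute step. */
static vec4f add4(vec4f a, vec4f b) {
    vec4f r;
    for (size_t i = 0; i < 4; ++i)
        r.e[i] = a.e[i] + b.e[i];
    return r;
}

/* Scenario 3: load, swap, compute, swap, store. */
static void add_with_swaps(const float *x, const float *y, float *out) {
    vec4f a = swap_halves(raw_load(x));
    vec4f b = swap_halves(raw_load(y));
    raw_store(swap_halves(add4(a, b)), out);
}

/* The same computation with both explicit swaps left out. */
static void add_without_swaps(const float *x, const float *y, float *out) {
    raw_store(add4(raw_load(x), raw_load(y)), out);
}
```

In this model both functions write {11, 22, 33, 44} for x = {1, 2, 3, 4} and
y = {10, 20, 30, 40}, so for a purely element-wise operation the explicit swaps
cancel out. They would matter for anything that depends on element position,
which this model does not cover.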

Best,

Tim

Timothée Ewart, Ph. D. 
http://www.linkedin.com/in/tewart
timothee.ew...@epfl.ch






> On 13 Mar 2015, at 17:50, Bill Schmidt <wschm...@linux.vnet.ibm.com> wrote:
> 
> Hi Tim,
> 
> Actually, I left out another very good reason why you may want to use
> vec_vsx_ld/st.  Sorry for forgetting this.
> 
> As you saw, vec_ld translates into the lvx instruction.  This
> instruction loads a sequence of 16 bytes into a vector register.  For
> big endian, the first byte in memory is loaded into the high order byte
> of the register.  For little endian, the first byte in memory is loaded
> into the low order byte of the register.
> 
> This is fine if the data you are loading is arrays of characters, but is
> not so fine if you are loading arrays of larger items.  Suppose you are
> loading four integers {1, 2, 3, 4} into a register with lvx.  In big
> endian you will see:
> 
>  00 00 00 01  00 00 00 02  00 00 00 03  00 00 00 04
> 
> In little endian you will see:
> 
>  04 00 00 00  03 00 00 00  02 00 00 00  01 00 00 00
> 
> But for this to be interpreted as a vector of integers ordered for
> little endian, what you really want is:
> 
>  00 00 00 04  00 00 00 03  00 00 00 02  00 00 00 01
> 
> If you use vec_vsx_ld, the compiler will generate an lxvd2x instruction
> followed by an xxpermdi that swaps the doublewords.  After the lxvd2x
> you will have:
> 
>  00 00 00 02  00 00 00 01  00 00 00 04  00 00 00 03
> 
> because the two LE doublewords are loaded in BE (reversed) order.
> Swapping the two doublewords restores sanity:
> 
>  00 00 00 04  00 00 00 03  00 00 00 02  00 00 00 01
> 
> So, even if your data is properly aligned, the use of vec_ld = lvx is
> only correct if you are loading arrays of bytes.  Arrays of anything
> larger must use vec_vsx_ld to avoid errors.
> 
> Again, sorry for my previous omission!
> 
> Thanks,
> 
> Bill Schmidt, Ph.D.
> IBM Linux Technology Center
> 
> On Fri, 2015-03-13 at 15:42 +0000, Ewart Timothée wrote:
>> thank you very much for this answer.
>> I know my memory is aligned so I will use vec_ld/st only.
>> 
>> best
>> 
>> Tim
>> 
>> 
>> 
>> 
>> 
> 
>
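
The two register layouts that the quoted message gives for vec_vsx_ld (after
the doubleword load, and after the xxpermdi swap) can be reproduced with a
small byte-level model in portable C. This is a sketch under assumptions, not
real VSX code: vreg, store_le32x4, load_dw_le, swap_dw and format_reg are
hypothetical names, and the register is modelled as 16 bytes with the most
significant byte first, as in the diagrams.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model register: b[0] is the most significant (leftmost) byte. */
typedef struct { uint8_t b[16]; } vreg;

/* Lay out four 32-bit integers in memory in little-endian byte order,
   independent of the endianness of the host running this model. */
static void store_le32x4(const uint32_t v[4], uint8_t mem[16]) {
    for (int w = 0; w < 4; ++w)
        for (int i = 0; i < 4; ++i)
            mem[4 * w + i] = (uint8_t)(v[w] >> (8 * i));
}

/* Model of the doubleword load in LE mode: each 8-byte doubleword is read as
   a little-endian value, and the first doubleword in memory lands in the
   high-order half of the register. */
static vreg load_dw_le(const uint8_t mem[16]) {
    vreg r;
    for (int dw = 0; dw < 2; ++dw)
        for (int i = 0; i < 8; ++i)
            r.b[8 * dw + i] = mem[8 * dw + (7 - i)];
    return r;
}

/* Model of the doubleword swap: exchange the two 8-byte halves. */
static vreg swap_dw(vreg r) {
    vreg out;
    memcpy(out.b, r.b + 8, 8);
    memcpy(out.b + 8, r.b, 8);
    return out;
}

/* Format the register like the diagrams: "00 00 00 02  00 00 00 01  ...". */
static void format_reg(vreg r, char out[64]) {
    char *p = out;
    for (int i = 0; i < 16; ++i) {
        p += sprintf(p, "%02x", r.b[i]);
        if (i < 15)
            p += sprintf(p, "%s", (i % 4 == 3) ? "  " : " ");
    }
}
```

For the integers {1, 2, 3, 4}, formatting the result of load_dw_le gives
"00 00 00 02  00 00 00 01  00 00 00 04  00 00 00 03", and formatting the result
of swap_dw on that register gives
"00 00 00 04  00 00 00 03  00 00 00 02  00 00 00 01".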
