Re: vec_ld versus vec_vsx_ld on power8

Bill Schmidt Fri, 13 Mar 2015 10:30:43 -0700

Hi Tim,

Sorry to have confused you.  This stuff is a bit boggling the first 200
times you look at it...


For both 32-bit and 64-bit floating-point, you should use vec_vsx_ld on
both BE and LE machines, and the compiler will take care of doing the
right thing for you in both cases.  You do not have to add any swaps
yourself.
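
As a rough illustration of the point above, here is a minimal sketch of a
loop that uses the same source for BE and LE.  The helper name add_floats
is hypothetical, and the portable fallback branch is only there so the
sketch also builds on machines without VSX; it assumes GCC with -mvsx on
Power and a length that is a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__VSX__)
#include <altivec.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for n floats.
 * Assumption: n is a multiple of 4 (one vector holds four floats). */
void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    assert(n % 4 == 0);
#if defined(__VSX__)
    for (size_t i = 0; i < n; i += 4) {
        /* The first argument is a byte offset added to the pointer.
         * No explicit swaps: the compiler emits whatever the target
         * endianness needs. */
        __vector float va = vec_vsx_ld(0, a + i);
        __vector float vb = vec_vsx_ld(0, b + i);
        vec_vsx_st(vec_add(va, vb), 0, dst + i);
    }
#else
    /* Portable fallback so the sketch compiles without VSX. */
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
#endif
}
```

Calling add_floats(d, a, b, 4) with a = {1, 2, 3, 4} and
b = {10, 20, 30, 40} should leave {11, 22, 33, 44} in d on either
endianness.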

When compiling for big-endian, vec_vsx_ld will translate into either
lxvw4x (for 32-bit floating-point) or lxvd2x (for 64-bit
floating-point).  The values will be loaded into the register from
left-to-right (BE ordering).

When compiling for little-endian, vec_vsx_ld will translate into lxvd2x
followed by xxpermdi for both 32-bit and 64-bit floating-point.  This
does the right thing in both cases.  The values will be loaded into the
register from right-to-left (LE ordering).
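
To make the little-endian sequence concrete, the following is a small
portable model (not real vector code) of what lxvd2x and the doubleword
swap do to 16 bytes of memory.  It uses the four 32-bit values
{1, 2, 3, 4} from my earlier message quoted below.  All of the names
(vreg, model_lxvd2x_le, and so on) are made up for this sketch, and the
register is simply modelled as 16 bytes with the most significant byte
first.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* A 128-bit register modelled as 16 bytes; b[0] is the most
 * significant (leftmost) byte. */
typedef struct { uint8_t b[16]; } vreg;

/* Lay out four 32-bit values in memory in little-endian byte order. */
static void store_le32x4(uint8_t mem[16], const uint32_t v[4])
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            mem[4 * i + j] = (uint8_t)(v[i] >> (8 * j));
}

/* Model of lxvd2x in LE mode: each 8-byte doubleword is read as a
 * little-endian value, and the doubleword at the lower address lands
 * in the left half of the register. */
static vreg model_lxvd2x_le(const uint8_t mem[16])
{
    vreg r;
    for (int dw = 0; dw < 2; dw++)
        for (int k = 0; k < 8; k++)
            r.b[8 * dw + k] = mem[8 * dw + 7 - k];
    return r;
}

/* Model of the xxpermdi that swaps the two doublewords. */
static vreg model_swap_doublewords(vreg in)
{
    vreg r;
    memcpy(r.b, in.b + 8, 8);
    memcpy(r.b + 8, in.b, 8);
    return r;
}

/* Format the register as 16 hex bytes separated by single spaces. */
static void format_reg(vreg r, char out[48])
{
    for (int i = 0; i < 16; i++)
        sprintf(out + 3 * i, i < 15 ? "%02x " : "%02x", r.b[i]);
}
```

Feeding {1, 2, 3, 4} through store_le32x4, then model_lxvd2x_le, then
model_swap_doublewords reproduces the two register pictures shown in the
quoted message: first the doublewords in reversed order, then the fully
LE-ordered vector.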

The vector programming model is set up to allow you to usually code the
same way for both BE and LE.  This is discussed more in Chapter 6 of the
ELFv2 ABI manual, which can be obtained from the OpenPOWER Connect
website (free registration required):

https://www-03.ibm.com/technologyconnect/tgcm/TGCMServlet.wss?alias=OpenPOWER&linkid=1n0000

Bill


On Fri, 2015-03-13 at 17:11 +0000, Ewart Timothée wrote:
> Hello,
> 
> I am super confused now.
> 
> scenario 1, what I have in my code:
> machine boots in LE.
> 
> 1) memory: LE
> 2) I load (vec_ld)
> 3) register : LE
> 4) VSU compute in LE
> 5) I store (vec_st)
> 6) memory: LE
> 
> scenario 2: (I did not test this, but it is what I get if I tell gcc to
> compile for BE)
> machine boots in BE
> 
> 1) memory: BE
> 2) I load (vec_vsx_ld)
> 3) register : BE
> 4) VSU compute in BE
> 5) I store (vec_vsx_st)
> 6) memory: BE
> 
> At this point the VSU computes in both orders
> 
> chimera scenario 3, what I understand:
> 
> machine boots in LE
> 
> 1) memory: LE
> 2) I load (vec_vsx_ld)  (the load swaps the elements)
> 3) register : BE
> 4) swap : LE
> 5) VSU compute in LE
> 6) swap : BE
> 7) I store (vec_vsx_st) (the store swaps the elements)
> 8) memory: BE
> 
> I understand that vec_vsx_ld/st load/store from LE/BE, but as the VSU can
> compute in both modes, what should I swap? (To be precise, I am working with
> 32/64-bit floats.)
> 
> Best,
> 
> Tim
> 
> Timothée Ewart, Ph. D. 
> http://www.linkedin.com/in/tewart
> [email protected]
> 
> 
> 
> 
> 
> 
> > On 13 Mar 2015 at 17:50, Bill Schmidt <[email protected]> wrote:
> > 
> > Hi Tim,
> > 
> > Actually, I left out another very good reason why you may want to use
> > vec_vsx_ld/st.  Sorry for forgetting this.
> > 
> > As you saw, vec_ld translates into the lvx instruction.  This
> > instruction loads a sequence of 16 bytes into a vector register.  For
> > big endian, the first byte in memory is loaded into the high order byte
> > of the register.  For little endian, the first byte in memory is loaded
> > into the low order byte of the register.
> > 
> > This is fine if the data you are loading is arrays of characters, but is
> > not so fine if you are loading arrays of larger items.  Suppose you are
> > loading four integers {1, 2, 3, 4} into a register with lvx.  In big
> > endian you will see:
> > 
> >  00 00 00 01  00 00 00 02  00 00 00 03  00 00 00 04
> > 
> > In little endian you will see:
> > 
> >  04 00 00 00  03 00 00 00  02 00 00 00  01 00 00 00
> > 
> > But for this to be interpreted as a vector of integers ordered for
> > little endian, what you really want is:
> > 
> >  00 00 00 04  00 00 00 03  00 00 00 02  00 00 00 01
> > 
> > If you use vec_vsx_ld, the compiler will generate an lxvd2x instruction
> > followed by an xxpermdi that swaps the doublewords.  After the lxvd2x
> > you will have:
> > 
> >  00 00 00 02  00 00 00 01  00 00 00 04  00 00 00 03
> > 
> > because the two LE doublewords are loaded in BE (reversed) order.
> > Swapping the two doublewords restores sanity:
> > 
> >  00 00 00 04  00 00 00 03  00 00 00 02  00 00 00 01
> > 
> > So, even if your data is properly aligned, the use of vec_ld = lvx is
> > only correct if you are loading arrays of bytes.  Arrays of anything
> > larger must use vec_vsx_ld to avoid errors.
> > 
> > Again, sorry for my previous omission!
> > 
> > Thanks,
> > 
> > Bill Schmidt, Ph.D.
> > IBM Linux Technology Center
> > 
> > On Fri, 2015-03-13 at 15:42 +0000, Ewart Timothée wrote:
> >> thank you very much for this answer.
> >> I know my memory is aligned so I will use vec_ld/st only.
> >> 
> >> best
> >> 
> >> Tim
> >> 
> >> 
> >> 
> >> 
> >> 
> > 
> > 
>
