Hello, I am super confused now.
Scenario 1, what I have in my code: the machine boots in LE.
1) memory: LE
2) I load (ld_vec)
3) register: LE
4) the VSU computes in LE
5) I store (st_vec)
6) memory: LE

Scenario 2 (I did not test it, but it is what I get if I tell gcc to compile for BE): the machine boots in BE.
1) memory: BE
2) I load (ld_vsx_vec)
3) register: BE
4) the VSU computes in BE
5) I store (st_vsx_vec)
6) memory: BE

At this point, the VSU computes in both orders (a chimera).

Scenario 3, what I understand: the machine boots in LE.
1) memory: LE
2) I load (ld_vsx_vec) (the load swaps the elements)
3) register: BE
4) swap: LE
5) the VSU computes in LE
6) swap: BE
7) I store (st_vsx_vec) (the store swaps the elements)
8) memory: BE

I understand that ld/st_vsx_vec load/store from LE/BE, but since the VSU can compute in both modes, what should I swap? (To be precise, I am working with 32/64-bit floats.)

Best,
Tim

Timothée Ewart, Ph. D.
http://www.linkedin.com/in/tewart
timothee.ew...@epfl.ch

> On 13 Mar 2015, at 17:50, Bill Schmidt <wschm...@linux.vnet.ibm.com> wrote:
>
> Hi Tim,
>
> Actually, I left out another very good reason why you may want to use
> vec_vsx_ld/st. Sorry for forgetting this.
>
> As you saw, vec_ld translates into the lvx instruction. This
> instruction loads a sequence of 16 bytes into a vector register. For
> big endian, the first byte in memory is loaded into the high-order byte
> of the register. For little endian, the first byte in memory is loaded
> into the low-order byte of the register.
>
> This is fine if you are loading arrays of characters, but is
> not so fine if you are loading arrays of larger items. Suppose you are
> loading four integers {1, 2, 3, 4} into a register with lvx. In big
> endian you will see:
>
>   00 00 00 01 00 00 00 02 00 00 00 03 00 00 00 04
>
> In little endian you will see:
>
>   04 00 00 00 03 00 00 00 02 00 00 00 01 00 00 00
>
> But for this to be interpreted as a vector of integers ordered for
> little endian, what you really want is:
>
>   00 00 00 04 00 00 00 03 00 00 00 02 00 00 00 01
>
> If you use vec_vsx_ld, the compiler will generate an lxvd2x instruction
> followed by an xxpermdi that swaps the doublewords. After the lxvd2x
> you will have:
>
>   00 00 00 02 00 00 00 01 00 00 00 04 00 00 00 03
>
> because the two LE doublewords are loaded in BE (reversed) order.
> Swapping the two doublewords restores sanity:
>
>   00 00 00 04 00 00 00 03 00 00 00 02 00 00 00 01
>
> So, even if your data is properly aligned, the use of vec_ld = lvx is
> only correct if you are loading arrays of bytes. Arrays of anything
> larger must use vec_vsx_ld to avoid errors.
>
> Again, sorry for my previous omission!
>
> Thanks,
>
> Bill Schmidt, Ph.D.
> IBM Linux Technology Center
>
> On Fri, 2015-03-13 at 15:42 +0000, Ewart Timothée wrote:
>> Thank you very much for this answer.
>> I know my memory is aligned, so I will use vec_ld/st only.
>>
>> Best,
>>
>> Tim