On Thu, Jan 30, 2014 at 9:42 PM, Bill Schmidt <wschm...@linux.vnet.ibm.com> wrote:
> Hi,
>
> This patch adds logic for -maltivec=be with a little endian target when
> generating code for the vec_sums builtin.  This implements the vsumsws
> instruction, which adds the four elements in the first input vector
> operand to element 3 of the second input vector operand, placing the
> result in element 3 of the destination vector operand.
>
> For little endian, element 3 is the leftmost (most significant) word in
> the vector register, while the instruction treats element 3 as the
> rightmost (least significant) word.  Since there is no vector
> shift-immediate or rotate-immediate instruction in VMX, we use a splat
> instruction to get LE element 3 (BE element 0) into BE element 3 of a
> scratch register for input to the vsumsws instruction.  Similarly, the
> result of the vsumsws instruction is then splatted from BE element 3
> into BE element 0 (LE element 3), where it is expected to be by any
> builtin that consumes that value.  The destination register is reused
> for this purpose.
>
> As with other patches in this series, an altivec_vsumsws_direct pattern
> is added for uses of vsumsws internal to GCC.
>
> Two new test cases are added that demonstrate how the vec_sums builtin
> is expected to behave for BE, LE, and LE with -maltivec=be.
>
> Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no
> regressions.  Is this ok for trunk?
>
> Thanks,
> Bill
>
>
> gcc:
>
> 2014-01-30  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
>
>         * config/rs6000/altivec.md (UNSPEC_VSUMSWS_DIRECT): New unspec.
>         (altivec_vsumsws): Add handling for -maltivec=be with a little
>         endian target.
>         (altivec_vsumsws_direct): New.
>         (reduc_splus_<mode>): Call gen_altivec_vsumsws_direct instead of
>         gen_altivec_vsumsws.
>
> gcc/testsuite:
>
> 2014-01-30  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
>
>         * gcc.dg/vmx/vsums.c: New.
>         * gcc.dg/vmx/vsums-be-order.c: New.
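For readers following along, here is a rough, self-contained illustration of
the vec_sums semantics described above.  This is only a sketch, not the
contents of the new gcc.dg/vmx test files; it assumes a PowerPC compiler
invoked with -maltivec and the standard <altivec.h> interface.

/* Sketch only: vec_sums adds all four words of the first operand to
   word 3 of the second operand, placing the saturated sum in word 3 of
   the result (BE numbering) and zeroing the other words.  Which word
   of the result holds the sum as seen from C depends on the element
   numbering in effect, so this check just scans all four words.  */
#include <altivec.h>
#include <stdlib.h>

int
main (void)
{
  vector signed int a = { 1, 2, 3, 4 };
  vector signed int b = { 0, 0, 0, 100 };
  vector signed int r = vec_sums (a, b);

  union { vector signed int v; signed int w[4]; } u;
  u.v = r;

  /* 1 + 2 + 3 + 4 + 100 = 110 must appear in exactly one word;
     the remaining words must be zero.  */
  int i, found = 0;
  for (i = 0; i < 4; i++)
    {
      if (u.w[i] == 110)
        found++;
      else if (u.w[i] != 0)
        abort ();
    }
  if (found != 1)
    abort ();
  return 0;
}

For BE, and for LE with -maltivec=be, the sum lands in BE word 3 of the
result; scanning all four words keeps the check independent of which
element numbering the test is compiled under.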
Okay. Thanks, David