After discussing this for Richard S at some length today, I want to withdraw this for now and re-examine the issue. I don't feel I understand this as well as I thought I did... ;)
Thanks, Bill On Thu, 2013-10-31 at 21:06 -0500, Bill Schmidt wrote: > Hi maintainers, > > I agree with David that duplicating this code is a bad approach. What > he and I would both prefer is to add a target hook to account for an > anomaly in the PowerPC architecture. > > Background: For historical reasons, our vperm instruction (which is > produced for gen_vec_perm) has some peculiar semantics in little endian > mode. The permute control vector is interpreted to contain elements > indexed in big-endian order no matter which endian mode the processor is > set to. We can work around this in little endian mode by inverting the > permute control vector and swapping the order of the other two input > vectors, thus obtaining the same semantics as we would get for big > endian. > > This behavior works when the two vectors are being treated as a single > double-wide vector; the swapping is needed to make the long array appear > as [8, 7, 6, 5, 4, 3, 2, 1] instead of [4, 3, 2, 1, 8, 7, 6, 5]. > > In the specific case of expand_mult_highpart (), the two vectors are not > a single double-wide vector, but instead contain the odd and even > results of widening multiplies. In this case, we still need to invert > the permute control vector, but we don't want to swap the operands, > because that causes us to mix up the odd and even results. This is the > only such case we've run into where the swap is not what we need to > obtain the right semantics. > > What we would like to do is swap the operands an extra time in > expand_mult_highpart (), so that our common code will then swap the > operands back to their original position. But since this is in > target-independent code, we would need a target hook to do this. > Something like: > > if (targetm.swap_vperm_inputs ()) > > { > > rtx tmp = m1; > > m1 = m2; > > m2 = tmp; > > } > > > For PowerPC, the target hook would return !BYTES_BIG_ENDIAN. The > default implementation for all other targets would return false. > > Would you find such an approach tolerable? > > Thanks, > Bill > email message attachment (Re: [PATCH, rs6000] Correct handling of > multiply high-part for little endian), "Forwarded message - Re: > [PATCH, rs6000] Correct handling of multiply high-part for little > endian" > > -------- Forwarded Message -------- > > From: David Edelsohn <dje....@gmail.com> > > To: Bill Schmidt <wschm...@linux.vnet.ibm.com> > > Cc: GCC Patches <gcc-patches@gcc.gnu.org> > > Subject: Re: [PATCH, rs6000] Correct handling of multiply high-part > > for little endian > > Date: Wed, 30 Oct 2013 20:06:37 -0400 > > > > On Wed, Oct 30, 2013 at 6:55 PM, Bill Schmidt > > <wschm...@linux.vnet.ibm.com> wrote: > > > Hi, > > > > > > When working around the peculiar little-endian semantics of the vperm > > > instruction, our usual fix is to complement the permute control vector > > > and swap the order of the two vector input operands, so that we get a > > > double-wide vector in the proper order. We don't want to swap the > > > operands when we are expanding a mult_highpart operation, however, as > > > the two input operands are not to be interpreted as a double-wide > > > vector. Instead they represent odd and even elements, and swapping the > > > operands gets the odd and even elements reversed in the final result. > > > > > > The permute for this case is generated by target-neutral code in > > > optabs.c: expand_mult_highpart (). We obviously can't change that code > > > directly. However, we can redirect the logic from the "case 2" method > > > to target-specific code by implementing expansions for the > > > umul<mode>3_highpart and smul<mode>3_highpart operations. I've done > > > this, with the expansions acting exactly as expand_mult_highpart does > > > today, with the exception that it swaps the input operands to the call > > > to expand_vec_perm when we are generating little-endian code. We will > > > later swap them back to their original position in the code in rs6000.c: > > > altivec_expand_vec_perm_const_le (). > > > > > > The change has no intended effect when generating big-endian code. > > > > > > Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no new > > > regressions. This fixes the gcc.dg/vect/pr51581-4.c test failure for > > > little endian. Ok for trunk? > > > > > > Thanks, > > > Bill > > > > > > > > > 2013-10-30 Bill Schmidt <wschm...@linux.vnet.ibm.com> > > > > > > * config/rs6000/rs6000-protos.h (altivec_expand_mul_highpart): New > > > prototype. > > > * config/rs6000/rs6000.c (altivec_expand_mul_highpart): New. > > > * config/rs6000/altivec.md (umul<mode>3_highpart): New. > > > (smul_<mode>3_highpart): New. > > > > I really do not like duplicating this code. I think that you need to > > explore with the community the possibility of including a hook in the > > general code to handle the strangeness of PPC LE vector semantics. > > > > This is asking for problems if the generic code is updated / modified / > > fixed. > > > > - David > >