Re: [Fwd: Re: [PATCH, rs6000] Correct handling of multiply high-part for little endian]

Bill Schmidt Fri, 01 Nov 2013 12:42:49 -0700

After discussing this for Richard S at some length today, I want to
withdraw this for now and re-examine the issue.  I don't feel I
understand this as well as I thought I did... ;)


Thanks,
Bill

On Thu, 2013-10-31 at 21:06 -0500, Bill Schmidt wrote:
> Hi maintainers,
> 
> I agree with David that duplicating this code is a bad approach.  What
> he and I would both prefer is to add a target hook to account for an
> anomaly in the PowerPC architecture.
> 
> Background: For historical reasons, our vperm instruction (which is
> produced for gen_vec_perm) has some peculiar semantics in little endian
> mode.  The permute control vector is interpreted to contain elements
> indexed in big-endian order no matter which endian mode the processor is
> set to.  We can work around this in little endian mode by inverting the
> permute control vector and swapping the order of the other two input
> vectors, thus obtaining the same semantics as we would get for big
> endian.
> 
> This behavior works when the two vectors are being treated as a single
> double-wide vector; the swapping is needed to make the long array appear
> as [8, 7, 6, 5, 4, 3, 2, 1] instead of [4, 3, 2, 1, 8, 7, 6, 5].
> 
> In the specific case of expand_mult_highpart (), the two vectors are not
> a single double-wide vector, but instead contain the odd and even
> results of widening multiplies.  In this case, we still need to invert
> the permute control vector, but we don't want to swap the operands,
> because that causes us to mix up the odd and even results.  This is the
> only such case we've run into where the swap is not what we need to
> obtain the right semantics.
> 
> What we would like to do is swap the operands an extra time in
> expand_mult_highpart (), so that our common code will then swap the
> operands back to their original position.  But since this is in
> target-independent code, we would need a target hook to do this.
> Something like:
> 
>   if (targetm.swap_vperm_inputs ())                                           
>   
>     {                                                                         
>   
>       rtx tmp = m1;                                                           
>   
>       m1 = m2;                                                                
>   
>       m2 = tmp;                                                               
>   
>     }                                                                         
>   
> 
> For PowerPC, the target hook would return !BYTES_BIG_ENDIAN.  The
> default implementation for all other targets would return false.
> 
> Would you find such an approach tolerable?
> 
> Thanks,
> Bill
> email message attachment (Re: [PATCH, rs6000] Correct handling of
> multiply high-part for little endian), "Forwarded message - Re:
> [PATCH, rs6000] Correct handling of multiply high-part for little
> endian"
> > -------- Forwarded Message --------
> > From: David Edelsohn <dje....@gmail.com>
> > To: Bill Schmidt <wschm...@linux.vnet.ibm.com>
> > Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> > Subject: Re: [PATCH, rs6000] Correct handling of multiply high-part
> > for little endian
> > Date: Wed, 30 Oct 2013 20:06:37 -0400
> > 
> > On Wed, Oct 30, 2013 at 6:55 PM, Bill Schmidt
> > <wschm...@linux.vnet.ibm.com> wrote:
> > > Hi,
> > >
> > > When working around the peculiar little-endian semantics of the vperm
> > > instruction, our usual fix is to complement the permute control vector
> > > and swap the order of the two vector input operands, so that we get a
> > > double-wide vector in the proper order.  We don't want to swap the
> > > operands when we are expanding a mult_highpart operation, however, as
> > > the two input operands are not to be interpreted as a double-wide
> > > vector.  Instead they represent odd and even elements, and swapping the
> > > operands gets the odd and even elements reversed in the final result.
> > >
> > > The permute for this case is generated by target-neutral code in
> > > optabs.c: expand_mult_highpart ().  We obviously can't change that code
> > > directly.  However, we can redirect the logic from the "case 2" method
> > > to target-specific code by implementing expansions for the
> > > umul<mode>3_highpart and smul<mode>3_highpart operations.  I've done
> > > this, with the expansions acting exactly as expand_mult_highpart does
> > > today, with the exception that it swaps the input operands to the call
> > > to expand_vec_perm when we are generating little-endian code.  We will
> > > later swap them back to their original position in the code in rs6000.c:
> > > altivec_expand_vec_perm_const_le ().
> > >
> > > The change has no intended effect when generating big-endian code.
> > >
> > > Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no new
> > > regressions.  This fixes the gcc.dg/vect/pr51581-4.c test failure for
> > > little endian.  Ok for trunk?
> > >
> > > Thanks,
> > > Bill
> > >
> > >
> > > 2013-10-30  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
> > >
> > >         * config/rs6000/rs6000-protos.h (altivec_expand_mul_highpart): New
> > >         prototype.
> > >         * config/rs6000/rs6000.c (altivec_expand_mul_highpart): New.
> > >         * config/rs6000/altivec.md (umul<mode>3_highpart): New.
> > >         (smul_<mode>3_highpart): New.
> > 
> > I really do not like duplicating this code.  I think that you need to
> > explore with the community the possibility of including a hook in the
> > general code to handle the strangeness of PPC LE vector semantics.
> > 
> > This is asking for problems if the generic code is updated / modified / 
> > fixed.
> > 
> > - David
> >

Re: [Fwd: Re: [PATCH, rs6000] Correct handling of multiply high-part for little endian]

Reply via email to