Hi maintainers,

I agree with David that duplicating this code is a bad approach.  What
he and I would both prefer is to add a target hook to account for an
anomaly in the PowerPC architecture.

Background: For historical reasons, our vperm instruction (which is
produced for gen_vec_perm) has some peculiar semantics in little endian
mode.  The permute control vector is interpreted to contain elements
indexed in big-endian order no matter which endian mode the processor is
set to.  We can work around this in little endian mode by inverting the
permute control vector and swapping the order of the other two input
vectors, thus obtaining the same semantics as we would get for big
endian.

This behavior works when the two vectors are being treated as a single
double-wide vector; the swapping is needed to make the long array appear
as [8, 7, 6, 5, 4, 3, 2, 1] instead of [4, 3, 2, 1, 8, 7, 6, 5].

In the specific case of expand_mult_highpart (), the two vectors are not
a single double-wide vector, but instead contain the odd and even
results of widening multiplies.  In this case, we still need to invert
the permute control vector, but we don't want to swap the operands,
because that causes us to mix up the odd and even results.  This is the
only such case we've run into where the swap is not what we need to
obtain the right semantics.

What we would like to do is swap the operands an extra time in
expand_mult_highpart (), so that our common code will then swap the
operands back to their original position.  But since this is in
target-independent code, we would need a target hook to do this.
Something like:

  if (targetm.swap_vperm_inputs ())                                             
    {                                                                           
      rtx tmp = m1;                                                             
      m1 = m2;                                                                  
      m2 = tmp;                                                                 
    }                                                                           

For PowerPC, the target hook would return !BYTES_BIG_ENDIAN.  The
default implementation for all other targets would return false.

Would you find such an approach tolerable?

Thanks,
Bill
--- Begin Message ---
On Wed, Oct 30, 2013 at 6:55 PM, Bill Schmidt
<wschm...@linux.vnet.ibm.com> wrote:
> Hi,
>
> When working around the peculiar little-endian semantics of the vperm
> instruction, our usual fix is to complement the permute control vector
> and swap the order of the two vector input operands, so that we get a
> double-wide vector in the proper order.  We don't want to swap the
> operands when we are expanding a mult_highpart operation, however, as
> the two input operands are not to be interpreted as a double-wide
> vector.  Instead they represent odd and even elements, and swapping the
> operands gets the odd and even elements reversed in the final result.
>
> The permute for this case is generated by target-neutral code in
> optabs.c: expand_mult_highpart ().  We obviously can't change that code
> directly.  However, we can redirect the logic from the "case 2" method
> to target-specific code by implementing expansions for the
> umul<mode>3_highpart and smul<mode>3_highpart operations.  I've done
> this, with the expansions acting exactly as expand_mult_highpart does
> today, with the exception that it swaps the input operands to the call
> to expand_vec_perm when we are generating little-endian code.  We will
> later swap them back to their original position in the code in rs6000.c:
> altivec_expand_vec_perm_const_le ().
>
> The change has no intended effect when generating big-endian code.
>
> Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no new
> regressions.  This fixes the gcc.dg/vect/pr51581-4.c test failure for
> little endian.  Ok for trunk?
>
> Thanks,
> Bill
>
>
> 2013-10-30  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000-protos.h (altivec_expand_mul_highpart): New
>         prototype.
>         * config/rs6000/rs6000.c (altivec_expand_mul_highpart): New.
>         * config/rs6000/altivec.md (umul<mode>3_highpart): New.
>         (smul_<mode>3_highpart): New.

I really do not like duplicating this code.  I think that you need to
explore with the community the possibility of including a hook in the
general code to handle the strangeness of PPC LE vector semantics.

This is asking for problems if the generic code is updated / modified / fixed.

- David


--- End Message ---

Reply via email to