On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner
<meiss...@linux.vnet.ibm.com> wrote:
> This patch optimizes the PowerPC vector set operation for 64-bit doubles and
> longs where the elements in the vector set may have been extracted from another
> vector (PR target/81593):
>
> Here is an example:
>
>         vector double
>         test_vpasted (vector double high, vector double low)
>         {
>           vector double res;
>           res[1] = high[1];
>           res[0] = low[0];
>           return res;
>         }

Interesting.  We expand from

  <bb 2> [100.00%] [count: INV]:
  _1 = BIT_FIELD_REF <high_4(D), 64, 64>;
  res_6 = BIT_INSERT_EXPR <res_5(D), _1, 64 (64 bits)>;
  _2 = BIT_FIELD_REF <low_7(D), 64, 0>;
  res_8 = BIT_INSERT_EXPR <res_6, _2, 0 (64 bits)>;
  return res_8;

but ideally we'd pattern-match that to a VEC_PERM_EXPR.  The bswap
pass looks like the canonical place for this, even though it would be
quite awkward to fit this in there.

So a match.pd rule would work as well here - your ppc backend patterns
are v2df specific, right?
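
For reference, the permute form of the example can be sketched at the
source level with GCC's generic vector extensions.  This is only an
illustration of the equivalence, not what the patch or a match.pd rule
would contain: the v2df/v2di typedefs stand in for the PowerPC
"vector double" spelling, and the function names and the self-check are
hypothetical.  It assumes GCC, since it relies on __builtin_shuffle.

```c
#include <assert.h>

/* Generic GCC vector types, standing in for PowerPC "vector double".  */
typedef double v2df __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

/* Element-wise form, as in the test case quoted above.  */
v2df
paste_by_element (v2df high, v2df low)
{
  v2df res;
  res[1] = high[1];
  res[0] = low[0];
  return res;
}

/* The same result as a single two-input permute.  Mask indices 0-1
   select from the first operand, 2-3 from the second.  */
v2df
paste_by_shuffle (v2df high, v2df low)
{
  const v2di mask = { 0, 3 };	/* low[0], high[1] */
  return __builtin_shuffle (low, high, mask);
}

/* Hypothetical self-check; not part of the patch or its testsuite.  */
void
check_paste (void)
{
  v2df high = { 1.0, 2.0 };
  v2df low = { 3.0, 4.0 };
  v2df a = paste_by_element (high, low);
  v2df b = paste_by_shuffle (high, low);
  assert (a[0] == 3.0 && a[1] == 2.0);
  assert (b[0] == a[0] && b[1] == a[1]);
}
```

Both functions select low[0] and high[1], so a two-input permute with a
constant mask describes the whole operation in one step.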

> Previously it would generate:
>
>         xxpermdi 12,34,34,2
>         vspltisw 2,0
>         xxlor 0,35,35
>         xxpermdi 34,34,12,0
>         xxpermdi 34,0,34,1
>
> and with these patches, it now generates:
>
>         xxpermdi 34,35,34,1
>
> I have tested it on a little endian power8 system and a big endian power7
> system with the usual bootstrap and make checks with no regressions.  Can I
> check this into the trunk?
>
> I also built SPEC 2006 with the compiler, and saw no changes in the code
> generated.  This isn't surprising because it isn't something that auto
> vectorization might generate by default.
>
> [gcc]
> 2017-07-27  Michael Meissner  <meiss...@linux.vnet.ibm.com>
>
>         PR target/81593
>         * config/rs6000/rs6000-protos.h (rs6000_emit_xxpermdi): New
>         declaration.
>         * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
>         emit XXPERMDI accessing either double word in either vector
>         register inputs.
>         * config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D iterator):
>         Rewrite VEC_CONCAT insn to call rs6000_emit_xxpermdi.  Simplify
>         the constraints with the removal of the -mupper-regs-* switches.
>         (vsx_concat_<mode>_1): New combiner insns to optimize CONCATs
>         where either register might have come from VEC_SELECT.
>         (vsx_concat_<mode>_2): Likewise.
>         (vsx_concat_<mode>_3): Likewise.
>         (vsx_set_<mode>, VSX_D iterator): Rewrite insn to generate a
>         VEC_CONCAT rather than use an UNSPEC to specify the option.
>
> [gcc/testsuite]
> 2017-07-27  Michael Meissner  <meiss...@linux.vnet.ibm.com>
>
>         PR target/81593
>         * gcc.target/powerpc/vsx-extract-6.c: New test.
>         * gcc.target/powerpc/vsx-extract-7.c: Likewise.
>
> --
> Michael Meissner, IBM
> IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
> email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
