On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner
<meiss...@linux.vnet.ibm.com> wrote:
> This patch optimizes the PowerPC vector set operation for 64-bit doubles and
> longs where the elements in the vector set may have been extracted from
> another vector (PR target/81593):
>
> Here is an example:
>
>         vector double
>         test_vpasted (vector double high, vector double low)
>         {
>           vector double res;
>           res[1] = high[1];
>           res[0] = low[0];
>           return res;
>         }

Interesting.  We expand from

  <bb 2> [100.00%] [count: INV]:
  _1 = BIT_FIELD_REF <high_4(D), 64, 64>;
  res_6 = BIT_INSERT_EXPR <res_5(D), _1, 64 (64 bits)>;
  _2 = BIT_FIELD_REF <low_7(D), 64, 0>;
  res_8 = BIT_INSERT_EXPR <res_6, _2, 0 (64 bits)>;
  return res_8;

but ideally we'd pattern-match that to a VEC_PERM_EXPR.  The bswap pass
looks like the canonical pass for this, even though it would be quite
awkward to fit this in there.  So a match.pd rule would work as well here.
Your ppc backend patterns are v2df specific, right?

> Previously it would generate:
>
>         xxpermdi 12,34,34,2
>         vspltisw 2,0
>         xxlor 0,35,35
>         xxpermdi 34,34,12,0
>         xxpermdi 34,0,34,1
>
> and with these patches, it now generates:
>
>         xxpermdi 34,35,34,1
>
> I have tested it on a little endian power8 system and a big endian power7
> system with the usual bootstrap and make checks with no regressions.  Can I
> check this into the trunk?
>
> I also built Spec 2006 with the compiler, and saw no changes in the code
> generated.  This isn't surprising because it isn't something that auto
> vectorization might generate by default.
>
> [gcc]
> 2017-07-27  Michael Meissner  <meiss...@linux.vnet.ibm.com>
>
>         PR target/81593
>         * config/rs6000/rs6000-protos.h (rs6000_emit_xxpermdi): New
>         declaration.
>         * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
>         emit XXPERMDI accessing either double word in either vector
>         register inputs.
>         * config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D iterator):
>         Rewrite VEC_CONCAT insn to call rs6000_emit_xxpermdi.  Simplify
>         the constraints with the removal of the -mupper-regs-* switches.
>         (vsx_concat_<mode>_1): New combiner insns to optimize CONCATs
>         where either register might have come from VEC_SELECT.
>         (vsx_concat_<mode>_2): Likewise.
>         (vsx_concat_<mode>_3): Likewise.
>         (vsx_set_<mode>, VSX_D iterator): Rewrite insn to generate a
>         VEC_CONCAT rather than use an UNSPEC to specify the option.
>
> [gcc/testsuite]
> 2017-07-27  Michael Meissner  <meiss...@linux.vnet.ibm.com>
>
>         PR target/81593
>         * gcc.target/powerpc/vsx-extract-6.c: New test.
>         * gcc.target/powerpc/vsx-extract-7.c: Likewise.
>
> --
> Michael Meissner, IBM
> IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
> email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
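
To make the VEC_PERM_EXPR remark concrete, here is a small sketch (not part of the patch) that writes the same element pasting by hand as a single two-input permute. It uses the generic vector extension rather than altivec.h so it builds on any target; the typedef names v2df/v2di and both function names are made up for this illustration. With a constant selector, GCC represents __builtin_shuffle as a VEC_PERM_EXPR in GIMPLE, which is the form a match.pd rule would presumably aim to produce from the BIT_FIELD_REF/BIT_INSERT_EXPR chain. No claim is made here about the instructions any particular target emits for it.

```c
/* Generic vector extension types; the names v2df and v2di are chosen
   for this sketch only.  */
typedef double v2df __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

/* Element-wise form, mirroring test_vpasted from the patch description.  */
v2df
paste_by_element (v2df high, v2df low)
{
  v2df res;
  res[1] = high[1];
  res[0] = low[0];
  return res;
}

/* The same result written as one two-input permute.  Selector indices
   0-1 name the elements of the first operand (low) and 2-3 those of the
   second (high), so { 0, 3 } picks low[0] and high[1].  */
v2df
paste_by_shuffle (v2df high, v2df low)
{
#if defined (__clang__)
  /* Clang spells the constant-index shuffle differently.  */
  return __builtin_shufflevector (low, high, 0, 3);
#else
  const v2di sel = { 0, 3 };
  return __builtin_shuffle (low, high, sel);
#endif
}
```

Both functions should return the same vector for any inputs; the second one just states the permutation up front instead of leaving it to be recovered from two extract/insert pairs.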