On Fri, Jul 28, 2017 at 10:02 AM, Andrew Pinski <pins...@gmail.com> wrote: > On Fri, Jul 28, 2017 at 12:51 AM, Richard Biener > <richard.guent...@gmail.com> wrote: >> On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner >> <meiss...@linux.vnet.ibm.com> wrote: >>> This patches optimizes the PowerPC vector set operation for 64-bit doubles >>> and >>> longs where the elements in the vector set may have been extracted from >>> another >>> vector (PR target/81593): >>> >>> Here an an example: >>> >>> vector double >>> test_vpasted (vector double high, vector double low) >>> { >>> vector double res; >>> res[1] = high[1]; >>> res[0] = low[0]; >>> return res; >>> } >> >> Interesting. We expand from >> >> <bb 2> [100.00%] [count: INV]: >> _1 = BIT_FIELD_REF <high_4(D), 64, 64>; >> res_6 = BIT_INSERT_EXPR <res_5(D), _1, 64 (64 bits)>; >> _2 = BIT_FIELD_REF <low_7(D), 64, 0>; >> res_8 = BIT_INSERT_EXPR <res_6, _2, 0 (64 bits)>; >> return res_8; >> >> but ideally we'd pattern-match that to a VEC_PERM_EXPR. The bswap >> pass looks like the canonical pass for this even though it's quite awkward >> to fill this in. > > I was thinking about this exactly. Though for the scale use of > BIT_INSERT_EXPR/BIT_FIELD_REF. > I have a case where someone writes (this shows up in GCC too): > a->b = c->b; > a->d = c->d; > a->e = c->e; > a->f = c->f; > a->g = c->g; > a->h = c->h; > > Where b,d,e,f,g,h are adjacent bit-fields after I lowered the bit-fields I > have: > _1 = BIT_FIELD_REF <a.1_3, 2, 0>; > _8 = BIT_INSERT_EXPR <c.1_4, _1, 0 (2 bits)>; > _2 = BIT_FIELD_REF <a.1_3, 4, 2>; > _9 = BIT_INSERT_EXPR <_8, _2, 2 (4 bits)>; > .... > > For the vector case, can't we write it as: > _1 = BIT_FIELD_REF <high_4(D), 64, 64>; > _2 = BIT_FIELD_REF <low_7(D), 64, 0>; > res_8 = {_1, _2}; > > And then have some match.pd patterns (which might get complex), to > rewrite that into VEC_PERM_EXPR? > The reason why I ask that is because say someone who wrote: > vector double > test_vpasted (vector double high, vector double low) > { > vector double res = { high[1], low[0] }; > return res; > }
I still believe a proper pass is better than match.pd patterns (which are awkward when dealing with "variable operand number" cases). I believe in the end we want to "unify" SRA, bswap and store-merging at least. Analyze memory/component accesses, their flow and then pattern-match the result. bswap is good with the flow stuff but its memory/component access analysis is too ad-hoc. "unify" in the sense of using common infrastructure. Richard. > > Thanks, > Andrew Pinski > >> >> So a match.pd rule would work as well here - your ppc backend patterns >> are v2df specific, right? >> >>> Previously it would generate: >>> >>> xxpermdi 12,34,34,2 >>> vspltisw 2,0 >>> xxlor 0,35,35 >>> xxpermdi 34,34,12,0 >>> xxpermdi 34,0,34,1 >>> >>> and with these patches, it now generates: >>> >>> xxpermdi 34,35,34,1 >>> >>> I have tested it on a little endian power8 system and a big endian power7 >>> system with the usual bootstrap and make checks with no regressions. Can I >>> check this into the trunk? >>> >>> I also built Spec 2006 with the compiler, and saw no changes in the code >>> generated. This isn't surprising because it isn't something that auto >>> vectorization might generate by default. >>> >>> [gcc] >>> 2017-07-27 Michael Meissner <meiss...@linux.vnet.ibm.com> >>> >>> PR target/81593 >>> * config/rs6000/rs6000-protos.h (rs6000_emit_xxpermdi): New >>> declaration. >>> * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to >>> emit XXPERMDI accessing either double word in either vector >>> register inputs. >>> * config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D iterator): >>> Rewrite VEC_CONCAT insn to call rs6000_emit_xxpermdi. Simplify >>> the constraints with the removal of the -mupper-regs-* switches. >>> (vsx_concat_<mode>_1): New combiner insns to optimize CONCATs >>> where either register might have come from VEC_SELECT. >>> (vsx_concat_<mode>_2): Likewise. >>> (vsx_concat_<mode>_3): Likewise. >>> (vsx_set_<mode>, VSX_D iterator): Rewrite insn to generate a >>> VEC_CONCAT rather than use an UNSPEC to specify the option. >>> >>> [gcc/testsuite] >>> 2017-07-27 Michael Meissner <meiss...@linux.vnet.ibm.com> >>> >>> PR target/81593 >>> * gcc.target/powerpc/vsx-extract-6.c: New test. >>> * gcc.target/powerpc/vsx-extract-7.c: Likewise. >>> >>> -- >>> Michael Meissner, IBM >>> IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA >>> email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797