On Fri, Jul 28, 2017 at 10:02 AM, Andrew Pinski <pins...@gmail.com> wrote:
> On Fri, Jul 28, 2017 at 12:51 AM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner
>> <meiss...@linux.vnet.ibm.com> wrote:
>>> This patch optimizes the PowerPC vector set operation for 64-bit
>>> doubles and longs where the elements in the vector set may have been
>>> extracted from another vector (PR target/81593):
>>>
>>> Here is an example:
>>>
>>>         vector double
>>>         test_vpasted (vector double high, vector double low)
>>>         {
>>>           vector double res;
>>>           res[1] = high[1];
>>>           res[0] = low[0];
>>>           return res;
>>>         }
>>
>> Interesting.  We expand from
>>
>>   <bb 2> [100.00%] [count: INV]:
>>   _1 = BIT_FIELD_REF <high_4(D), 64, 64>;
>>   res_6 = BIT_INSERT_EXPR <res_5(D), _1, 64 (64 bits)>;
>>   _2 = BIT_FIELD_REF <low_7(D), 64, 0>;
>>   res_8 = BIT_INSERT_EXPR <res_6, _2, 0 (64 bits)>;
>>   return res_8;
>>
>> but ideally we'd pattern-match that to a VEC_PERM_EXPR.  The bswap
>> pass looks like the canonical pass for this even though it's quite awkward
>> to fill this in.
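>>
>> For this testcase the matched form would be something like (SSA names
>> illustrative):
>>
>>   _3 = VEC_PERM_EXPR <low_7(D), high_4(D), { 0, 3 }>;
>>   return _3;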
>
> I was thinking about exactly this, though for the scalar use of
> BIT_INSERT_EXPR/BIT_FIELD_REF.
> I have a case where someone writes (this shows up in GCC too):
> a->b = c->b;
> a->d = c->d;
> a->e = c->e;
> a->f = c->f;
> a->g = c->g;
> a->h = c->h;
>
> where b, d, e, f, g, and h are adjacent bit-fields.  After I lowered
> the bit-fields I have:
> _1 = BIT_FIELD_REF <c.1_4, 2, 0>;
> _8 = BIT_INSERT_EXPR <a.1_3, _1, 0 (2 bits)>;
> _2 = BIT_FIELD_REF <c.1_4, 4, 2>;
> _9 = BIT_INSERT_EXPR <_8, _2, 2 (4 bits)>;
> ....
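>
> (A struct along these lines would produce that lowering; the widths of
> b and d come from the dump above, the other widths are illustrative:)
>
> struct S {
>   unsigned int b : 2;
>   unsigned int d : 4;
>   unsigned int e : 3;
>   unsigned int f : 3;
>   unsigned int g : 2;
>   unsigned int h : 2;
> };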
>
> For the vector case, can't we write it as:
> _1 = BIT_FIELD_REF <high_4(D), 64, 64>;
> _2 = BIT_FIELD_REF <low_7(D), 64, 0>;
> res_8 = {_2, _1};
>
> And then have some match.pd patterns (which might get complex), to
> rewrite that into VEC_PERM_EXPR?
> The reason I ask is that someone might write:
> vector double
> test_vpasted (vector double high, vector double low)
> {
>   vector double res = { low[0], high[1] };
>   return res;
> }
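>
> which presumably expands to the same constructor form (SSA names
> illustrative):
>
>   _1 = BIT_FIELD_REF <low_7(D), 64, 0>;
>   _2 = BIT_FIELD_REF <high_4(D), 64, 64>;
>   res_5 = {_1, _2};
>   return res_5;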

I still believe a proper pass is better than match.pd patterns (which are
awkward when dealing with "variable operand number" cases).

I believe in the end we want to "unify" SRA, bswap and store-merging,
at least: analyze memory/component accesses and their flow, and then
pattern-match the result.  bswap handles the flow side well, but its
memory/component access analysis is too ad hoc.

"unify" in the sense of using common infrastructure.

Richard.

>
> Thanks,
> Andrew Pinski
>
>>
>> So a match.pd rule would work as well here - your ppc backend patterns
>> are v2df specific, right?
>>
>>> Previously it would generate:
>>>
>>>         xxpermdi 12,34,34,2
>>>         vspltisw 2,0
>>>         xxlor 0,35,35
>>>         xxpermdi 34,34,12,0
>>>         xxpermdi 34,0,34,1
>>>
>>> and with these patches, it now generates:
>>>
>>>         xxpermdi 34,35,34,1
>>>
>>> I have tested it on a little-endian power8 system and a big-endian
>>> power7 system with the usual bootstrap and make check with no
>>> regressions.  Can I check this into the trunk?
>>>
>>> I also built Spec 2006 with the compiler and saw no changes in the
>>> generated code.  This isn't surprising, because this isn't a pattern
>>> the auto-vectorizer generates by default.
>>>
>>> [gcc]
>>> 2017-07-27  Michael Meissner  <meiss...@linux.vnet.ibm.com>
>>>
>>>         PR target/81593
>>>         * config/rs6000/rs6000-protos.h (rs6000_emit_xxpermdi): New
>>>         declaration.
>>>         * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
>>>         emit XXPERMDI accessing either double word in either vector
>>>         register inputs.
>>>         * config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D iterator):
>>>         Rewrite VEC_CONCAT insn to call rs6000_emit_xxpermdi.  Simplify
>>>         the constraints with the removal of the -mupper-regs-* switches.
>>>         (vsx_concat_<mode>_1): New combiner insns to optimize CONCATs
>>>         where either register might have come from VEC_SELECT.
>>>         (vsx_concat_<mode>_2): Likewise.
>>>         (vsx_concat_<mode>_3): Likewise.
>>>         (vsx_set_<mode>, VSX_D iterator): Rewrite insn to generate a
>>>         VEC_CONCAT rather than use an UNSPEC to specify the option.
>>>
>>> [gcc/testsuite]
>>> 2017-07-27  Michael Meissner  <meiss...@linux.vnet.ibm.com>
>>>
>>>         PR target/81593
>>>         * gcc.target/powerpc/vsx-extract-6.c: New test.
>>>         * gcc.target/powerpc/vsx-extract-7.c: Likewise.
>>>
>>> --
>>> Michael Meissner, IBM
>>> IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
>>> email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
