Re: [patch AArch64] Do not perform a vector splat for vector initialisation if it is not useful

Christophe Lyon Wed, 03 Jan 2018 01:55:57 -0800

On 19 December 2017 at 00:36, Jeff Law <l...@redhat.com> wrote:
> On 12/11/2017 08:44 AM, James Greenhalgh wrote:
>> Hi,
>>
>> In the testcase in this patch we create an SLP vector with only two
>> elements. Our current vector initialisation code will first duplicate
>> the first element to both lanes, then overwrite the top lane with a new
>> value.
>>
>> This duplication can be clunky and wasteful.
>>
>> Better would be to simply use the fact that we will always be overwriting
>> the remaining bits, and simply move the first element to the corrcet place
>> (implicitly zeroing all other bits).
>>
>> This reduces the code generation for this case, and can allow more
>> efficient addressing modes, and other second order benefits for AArch64
>> code which has been vectorized to V2DI mode.
>>
>> Note that the change is generic enough to catch the case for any vector
>> mode, but is expected to be most useful for 2x64-bit vectorization.
>>
>> Unfortunately, on its own, this would cause failures in
>> gcc.target/aarch64/load_v2vec_lanes_1.c and
>> gcc.target/aarch64/store_v2vec_lanes.c , which expect to see many more
>> vec_merge and vec_duplicate for their simplifications to apply. To fix this,
>> add a special case to the AArch64 code if we are loading from two memory
>> addresses, and use the load_pair_lanes patterns directly.
>>
>> We also need a new pattern in simplify-rtx.c:simplify_ternary_operation , to
>> catch:
>>
>>   (vec_merge:OUTER
>>      (vec_duplicate:OUTER x:INNER)
>>      (subreg:OUTER y:INNER 0)
>>      (const_int N))
>>
>> And simplify it to:
>>
>>   (vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x)
>>
>> This is similar to the existing patterns which are tested in this function,
>> without requiring the second operand to also be a vec_duplicate.
>>
>> Bootstrapped and tested on aarch64-none-linux-gnu and tested on
>> aarch64-none-elf.
>>
>> Note that this requires 
>> https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
>> if we don't want to ICE creating broken vector zero extends.
>>
>> Are the non-AArch64 parts OK?
>>
>> Thanks,
>> James
>>
>> ---
>> 2017-12-11  James Greenhalgh  <james.greenha...@arm.com>
>>
>>       * config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify code
>>       generation for cases where splatting a value is not useful.
>>       * simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge
>>       across a vec_duplicate and a paradoxical subreg forming a vector
>>       mode to a vec_concat.
>>
>> 2017-12-11  James Greenhalgh  <james.greenha...@arm.com>
>>
>>       * gcc.target/aarch64/vect-slp-dup.c: New.
>>
>>
>> 0001-patch-AArch64-Do-not-perform-a-vector-splat-for-vect.patch
>>
>>
>
>> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
>> index 806c309..ed16f70 100644
>> --- a/gcc/simplify-rtx.c
>> +++ b/gcc/simplify-rtx.c
>> @@ -5785,6 +5785,36 @@ simplify_ternary_operation (enum rtx_code code, 
>> machine_mode mode,
>>               return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
>>           }
>>
>> +       /* Replace:
>> +
>> +           (vec_merge:outer (vec_duplicate:outer x:inner)
>> +                            (subreg:outer y:inner 0)
>> +                            (const_int N))
>> +
>> +          with (vec_concat:outer x:inner y:inner) if N == 1,
>> +          or (vec_concat:outer y:inner x:inner) if N == 2.
>> +
>> +          Implicitly, this means we have a paradoxical subreg, but such
>> +          a check is cheap, so make it anyway.
> I'm going to assume that N == 1 and N == 3 are handled elsewhere and do
> not show up here in practice.
>
>
> So is it advisable to handle the case where the VEC_DUPLICATE and SUBREG
> show up in the opposite order?  Or is there some canonicalization that
> prevents that?
>
> simplify-rtx bits are OK as-is if we're certain we're not going to get
> the alternate ordering of the VEC_MERGE operands.  ALso OK if you either
> generalize this chunk of code or duplicate & twiddle it to handle the
> alternate order.
>
> I didn't look at the aarch64 specific bits.
>


Hi James,

Your patch (r255946) introduced regressions on aarch64_be:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255946/report-build-info.html

I filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83663
to track this.

Thanks,

Christophe

> jeff
>

Re: [patch AArch64] Do not perform a vector splat for vector initialisation if it is not useful

Reply via email to