Richard Sandiford <richard.sandif...@arm.com> writes:
> Tamar Christina <tamar.christ...@arm.com> writes:
>> Hi All,
>>
>> This adds an RTL pattern for the case where two NARROWB instructions are
>> combined with a PACK.  The second NARROWB is then transformed into a NARROWT.
>>
>> For the example:
>>
>> void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
>> {
>>   for (int i = 0; i < (n & -16); i+=1)
>>     pixel[i] += (pixel[i] * level) / 0xff;
>> }
>>
>> we generate:
>>
>>         addhnb  z6.b, z0.h, z4.h
>>         addhnb  z5.b, z1.h, z4.h
>>         addhnb  z0.b, z0.h, z6.h
>>         addhnt  z0.b, z1.h, z5.h
>>         add     z0.b, z0.b, z2.b
>>
>> instead of:
>>
>>         addhnb  z6.b, z1.h, z4.h
>>         addhnb  z5.b, z0.h, z4.h
>>         addhnb  z1.b, z1.h, z6.h
>>         addhnb  z0.b, z0.h, z5.h
>>         uzp1    z0.b, z0.b, z1.b
>>         add     z0.b, z0.b, z2.b
>>
>> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>>
>> Ok for master?
>>
>> Thanks,
>> Tamar
>>
>> gcc/ChangeLog:
>>
>>      * config/aarch64/aarch64-sve2.md (*aarch64_sve_pack_<sve_int_op><mode>):
>>      New.
>>      * config/aarch64/iterators.md (binary_top): New.
>>
>> gcc/testsuite/ChangeLog:
>>
>>      * gcc.dg/vect/vect-div-bitmask-4.c: New test.
>>      * gcc.target/aarch64/sve2/div-by-bitmask_2.c: New test.
>>
>> --- inline copy of patch -- 
>> diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
>> index ab5dcc369481311e5bd68a1581265e1ce99b4b0f..0ee46c8b0d43467da4a6b98ad3c41e5d05d8cf38 100644
>> --- a/gcc/config/aarch64/aarch64-sve2.md
>> +++ b/gcc/config/aarch64/aarch64-sve2.md
>> @@ -1600,6 +1600,25 @@ (define_insn "@aarch64_sve_<sve_int_op><mode>"
>>    "<sve_int_op>\t%0.<Ventype>, %2.<Vetype>, %3.<Vetype>"
>>  )
>>  
>> +(define_insn_and_split "*aarch64_sve_pack_<sve_int_op><mode>"
>> +  [(set (match_operand:<VNARROW> 0 "register_operand" "=w")
>> +    (unspec:<VNARROW>
>> +      [(match_operand:SVE_FULL_HSDI 1 "register_operand" "w")
>
> "0" would be safer, in case the instruction is only split after RA.
>
>> +       (subreg:SVE_FULL_HSDI (unspec:<VNARROW>
>> +         [(match_operand:SVE_FULL_HSDI 2 "register_operand" "w")
>> +          (match_operand:SVE_FULL_HSDI 3 "register_operand" "w")]
>> +         SVE2_INT_BINARY_NARROWB) 0)]
>> +      UNSPEC_PACK))]
>
> I think ideally this would be the canonical pattern, so that we can
> drop the separate top unspecs.  That's more work though, and would
> probably make sense to do once we have a generic way of representing
> the pack.
>
> So OK with the "0" change above.

Hmm, actually, I take that back.  Is this transform really correct?
I think the blend corresponds to a TRN1 rather than a UZP1.
The bottom operations populate the lower half of each wider element
and the top operations populate the upper half.

Thanks,
Richard
