> On 18 Oct 2024, at 10:46, Tamar Christina <tamar.christ...@arm.com> wrote:
> 
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandif...@arm.com>
>> Sent: Thursday, October 17, 2024 6:05 PM
>> To: Jennifer Schmitz <jschm...@nvidia.com>
>> Cc: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <ktkac...@nvidia.com>; Tamar
>> Christina <tamar.christ...@arm.com>
>> Subject: Re: [PATCH] SVE intrinsics: Add fold_active_lanes_to method to
>> refactor svmul and svdiv.
>> 
>> Jennifer Schmitz <jschm...@nvidia.com> writes:
>>>> On 16 Oct 2024, at 21:16, Richard Sandiford <richard.sandif...@arm.com>
>>>> wrote:
>>>> 
>>>> Jennifer Schmitz <jschm...@nvidia.com> writes:
>>>>> As suggested in
>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663275.html,
>>>>> this patch adds the method gimple_folder::fold_active_lanes_to (tree X).
>>>>> This method folds active lanes to X and sets inactive lanes according to
>>>>> the predication, returning a new gimple statement. That makes folding of
>>>>> SVE intrinsics easier and reduces code duplication in the
>>>>> svxxx_impl::fold implementations.
>>>>> Using this new method, svdiv_impl::fold and svmul_impl::fold were 
>>>>> refactored.
>>>>> Additionally, the method was used for two optimizations:
>>>>> 1) fold svdiv to the dividend if the divisor is all ones, and
>>>>> 2) fold svmul to the other operand if one of the operands is all ones.
>>>>> Both optimizations were previously applied to _x and _m predication at
>>>>> the RTL level, but not for _z, where svdiv/svmul were still being used.
>>>>> For both optimizations, this patch improves codegen, for example by
>>>>> skipping sel instructions with all-same operands and replacing sel
>>>>> instructions with mov instructions.
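For illustration, a minimal source-level sketch (not part of the patch
itself; the function name div_by_one is made up):

  #include <arm_sve.h>

  /* With the new fold, dividing by an all-ones constant under _z
     predication no longer emits an sdiv: active lanes keep x, inactive
     lanes are zeroed, so the call lowers to a predicated select against
     a zeroed register, as in the div_1_s32_z_tied1 test below.  */
  svint32_t
  div_by_one (svbool_t pg, svint32_t x)
  {
    return svdiv_n_s32_z (pg, x, 1);
  }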
>>>>> 
>>>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no
>>>>> regression.
>>>>> OK for mainline?
>>>>> 
>>>>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>>>>> 
>>>>> gcc/
>>>>>     * config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
>>>>>     Refactor using fold_active_lanes_to and fold to the dividend, if
>>>>>     the divisor is all ones.
>>>>>     (svmul_impl::fold): Refactor using fold_active_lanes_to and fold
>>>>>     to the other operand, if one of the operands is all ones.
>>>>>     * config/aarch64/aarch64-sve-builtins.h: Declare
>>>>>     gimple_folder::fold_active_lanes_to (tree).
>>>>>     * config/aarch64/aarch64-sve-builtins.cc
>>>>>     (gimple_folder::fold_active_lanes_to): Add new method to fold
>>>>>     active lanes to the given argument and set inactive lanes
>>>>>     according to the predication.
>>>>> 
>>>>> gcc/testsuite/
>>>>>     * gcc.target/aarch64/sve/acle/asm/div_s32.c: Adjust expected outcome.
>>>>>     * gcc.target/aarch64/sve/acle/asm/div_s64.c: Likewise.
>>>>>     * gcc.target/aarch64/sve/acle/asm/div_u32.c: Likewise.
>>>>>     * gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
>>>>>     * gcc.target/aarch64/sve/fold_div_zero.c: Likewise.
>>>>>     * gcc.target/aarch64/sve/acle/asm/mul_s16.c: New test.
>>>>>     * gcc.target/aarch64/sve/acle/asm/mul_s32.c: Likewise.
>>>>>     * gcc.target/aarch64/sve/acle/asm/mul_s64.c: Likewise.
>>>>>     * gcc.target/aarch64/sve/acle/asm/mul_s8.c: Likewise.
>>>>>     * gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
>>>>>     * gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
>>>>>     * gcc.target/aarch64/sve/acle/asm/mul_u64.c: Likewise.
>>>>>     * gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
>>>>>     * gcc.target/aarch64/sve/mul_const_run.c: Likewise.
>>>> 
>>>> Thanks, this looks great.  Just one comment on the tests:
>>>> 
>>>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
>>>>> b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
>>>>> index d5a23bf0726..521f8bb4758 100644
>>>>> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
>>>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
>>>>> @@ -57,7 +57,6 @@ TEST_UNIFORM_ZX (div_w0_s32_m_untied, svint32_t, int32_t,
>>>>> 
>>>>> /*
>>>>> ** div_1_s32_m_tied1:
>>>>> -**   sel     z0\.s, p0, z0\.s, z0\.s
>>>>> **   ret
>>>>> */
>>>>> TEST_UNIFORM_Z (div_1_s32_m_tied1, svint32_t,
>>>>> @@ -66,7 +65,7 @@ TEST_UNIFORM_Z (div_1_s32_m_tied1, svint32_t,
>>>>> 
>>>>> /*
>>>>> ** div_1_s32_m_untied:
>>>>> -**   sel     z0\.s, p0, z1\.s, z1\.s
>>>>> +**   mov     z0\.d, z1\.d
>>>>> **   ret
>>>>> */
>>>>> TEST_UNIFORM_Z (div_1_s32_m_untied, svint32_t,
>>>>> @@ -217,9 +216,8 @@ TEST_UNIFORM_ZX (div_w0_s32_z_untied, svint32_t, int32_t,
>>>>> 
>>>>> /*
>>>>> ** div_1_s32_z_tied1:
>>>>> -**   mov     (z[0-9]+\.s), #1
>>>>> -**   movprfx z0\.s, p0/z, z0\.s
>>>>> -**   sdiv    z0\.s, p0/m, z0\.s, \1
>>>>> +**   mov     (z[0-9]+)\.b, #0
>>>>> +**   sel     z0\.s, p0, z0\.s, \1\.s
>>>>> **   ret
>>>>> */
>>>>> TEST_UNIFORM_Z (div_1_s32_z_tied1, svint32_t,
>>>> 
>>>> Tamar will soon push a patch to change how we generate zeros.
>>>> Part of that will involve rewriting existing patterns to be more
>>>> forgiving about the exact instruction that is used to zero a register.
>>>> 
>>>> The new preferred way of matching zeros is:
>>>> 
>>>> **      movi?   [vdz]([0-9]+)\.(?:[0-9]*[bhsd])?, #?0
>>>> 
>>>> (yeah, it's a bit of a mouthful).  Could you change all the tests
>>>> to use that?  The regexp only captures the register number, so uses
>>>> of \1 etc. will need to become z\1.
>>>> 
>>>> OK with that change.  But would you mind waiting until Tamar pushes
>>>> his patch ("AArch64: use movi d0, #0 to clear SVE registers instead
>>>> of mov z0.d, #0"), just to make sure that the tests work with that?
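As an illustrative sketch (not the committed test), the div_1_s32_z_tied1
pattern above would then read something like:

  /*
  ** div_1_s32_z_tied1:
  **	movi?	[vdz]([0-9]+)\.(?:[0-9]*[bhsd])?, #?0
  **	sel	z0\.s, p0, z0\.s, z\1\.s
  **	ret
  */

(the placement of the “\.” in that regexp is refined further down the
thread).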
>>>> 
> 
> Thanks, I've committed the changes now.
Thanks, I adjusted the tests, rebased and revalidated.
Committed to trunk: e69c2e212011f2bfa6f8c3748d902690b7a3639a
Have a good weekend,
Jennifer
> 
> Cheers,
> Tamar
> 
>>> Thanks for the review. Sure, I can make the changes, wait for Tamar’s
>>> patch, and re-validate after rebasing.
>>> One question about the regexp pattern:
>>> The “\.” is outside the second captured group and therefore non-optional,
>>> i.e. it would match something like
>>> “movi	d0.,#0”,
>>> but not
>>> “movi	d0, #0”.
>> 
>> Argh!  Yes.  I clearly didn't test what I thought I'd tested. :(
>> 
>> The “\.” should be inside the group, like you say.
>> 
>> Richard
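With that fix, the preferred pattern would presumably become

  **	movi?	[vdz]([0-9]+)(?:\.[0-9]*[bhsd])?, #?0

accepting both “movi	d0, #0” and “mov	z0.b, #0”.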

