> On 18 Oct 2024, at 10:46, Tamar Christina <tamar.christ...@arm.com> wrote:
>
> External email: Use caution opening links or attachments
>
>
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandif...@arm.com>
>> Sent: Thursday, October 17, 2024 6:05 PM
>> To: Jennifer Schmitz <jschm...@nvidia.com>
>> Cc: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <ktkac...@nvidia.com>; Tamar Christina <tamar.christ...@arm.com>
>> Subject: Re: [PATCH] SVE intrinsics: Add fold_active_lanes_to method to refactor svmul and svdiv.
>>
>> Jennifer Schmitz <jschm...@nvidia.com> writes:
>>>> On 16 Oct 2024, at 21:16, Richard Sandiford <richard.sandif...@arm.com> wrote:
>>>>
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> Jennifer Schmitz <jschm...@nvidia.com> writes:
>>>>> As suggested in
>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663275.html,
>>>>> this patch adds the method gimple_folder::fold_active_lanes_to (tree X).
>>>>> This method folds active lanes to X and sets inactive lanes according to
>>>>> the predication, returning a new gimple statement. That makes folding of
>>>>> SVE intrinsics easier and reduces code duplication in the
>>>>> svxxx_impl::fold implementations.
>>>>> Using this new method, svdiv_impl::fold and svmul_impl::fold were
>>>>> refactored.
>>>>> Additionally, the method was used for two optimizations:
>>>>> 1) fold svdiv to the dividend, if the divisor is all ones, and
>>>>> 2) for svmul, if one of the operands is all ones, fold to the other
>>>>> operand.
>>>>> Both optimizations were previously applied to _x and _m predication on
>>>>> the RTL level, but not for _z, where svdiv/svmul were still being used.
>>>>> For both optimizations, codegen was improved by this patch, for example by
>>>>> skipping sel instructions with all-same operands and replacing sel
>>>>> instructions by mov instructions.
>>>>>
>>>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>>>>> OK for mainline?
>>>>>
>>>>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>>>>>
>>>>> gcc/
>>>>> 	* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
>>>>> 	Refactor using fold_active_lanes_to and fold to the dividend, if
>>>>> 	the divisor is all ones.
>>>>> 	(svmul_impl::fold): Refactor using fold_active_lanes_to and fold
>>>>> 	to the other operand, if one of the operands is all ones.
>>>>> 	* config/aarch64/aarch64-sve-builtins.h: Declare
>>>>> 	gimple_folder::fold_active_lanes_to (tree).
>>>>> 	* config/aarch64/aarch64-sve-builtins.cc
>>>>> 	(gimple_folder::fold_active_lanes_to): Add new method to fold
>>>>> 	active lanes to the given argument, setting inactive lanes
>>>>> 	according to the predication.
>>>>>
>>>>> gcc/testsuite/
>>>>> 	* gcc.target/aarch64/sve/acle/asm/div_s32.c: Adjust expected outcome.
>>>>> 	* gcc.target/aarch64/sve/acle/asm/div_s64.c: Likewise.
>>>>> 	* gcc.target/aarch64/sve/acle/asm/div_u32.c: Likewise.
>>>>> 	* gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
>>>>> 	* gcc.target/aarch64/sve/fold_div_zero.c: Likewise.
>>>>> 	* gcc.target/aarch64/sve/acle/asm/mul_s16.c: New test.
>>>>> 	* gcc.target/aarch64/sve/acle/asm/mul_s32.c: Likewise.
>>>>> 	* gcc.target/aarch64/sve/acle/asm/mul_s64.c: Likewise.
>>>>> 	* gcc.target/aarch64/sve/acle/asm/mul_s8.c: Likewise.
>>>>> 	* gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
>>>>> 	* gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
>>>>> 	* gcc.target/aarch64/sve/acle/asm/mul_u64.c: Likewise.
>>>>> 	* gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
>>>>> 	* gcc.target/aarch64/sve/mul_const_run.c: Likewise.
>>>>
>>>> Thanks, this looks great.
>>>> Just one comment on the tests:
>>>>
>>>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
>>>>> index d5a23bf0726..521f8bb4758 100644
>>>>> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
>>>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
>>>>> @@ -57,7 +57,6 @@ TEST_UNIFORM_ZX (div_w0_s32_m_untied, svint32_t, int32_t,
>>>>>
>>>>>  /*
>>>>>  ** div_1_s32_m_tied1:
>>>>> -**	sel	z0\.s, p0, z0\.s, z0\.s
>>>>>  **	ret
>>>>>  */
>>>>>  TEST_UNIFORM_Z (div_1_s32_m_tied1, svint32_t,
>>>>> @@ -66,7 +65,7 @@ TEST_UNIFORM_Z (div_1_s32_m_tied1, svint32_t,
>>>>>
>>>>>  /*
>>>>>  ** div_1_s32_m_untied:
>>>>> -**	sel	z0\.s, p0, z1\.s, z1\.s
>>>>> +**	mov	z0\.d, z1\.d
>>>>>  **	ret
>>>>>  */
>>>>>  TEST_UNIFORM_Z (div_1_s32_m_untied, svint32_t,
>>>>> @@ -217,9 +216,8 @@ TEST_UNIFORM_ZX (div_w0_s32_z_untied, svint32_t, int32_t,
>>>>>
>>>>>  /*
>>>>>  ** div_1_s32_z_tied1:
>>>>> -**	mov	(z[0-9]+\.s), #1
>>>>> -**	movprfx	z0\.s, p0/z, z0\.s
>>>>> -**	sdiv	z0\.s, p0/m, z0\.s, \1
>>>>> +**	mov	(z[0-9]+)\.b, #0
>>>>> +**	sel	z0\.s, p0, z0\.s, \1\.s
>>>>>  **	ret
>>>>>  */
>>>>>  TEST_UNIFORM_Z (div_1_s32_z_tied1, svint32_t,
>>>>
>>>> Tamar will soon push a patch to change how we generate zeros.
>>>> Part of that will involve rewriting existing patterns to be more
>>>> forgiving about the exact instruction that is used to zero a register.
>>>>
>>>> The new preferred way of matching zeros is:
>>>>
>>>> **	movi?	[vdz]([0-9]+)\.(?:[0-9]*[bhsd])?, #?0
>>>>
>>>> (yeah, it's a bit of a mouthful). Could you change all the tests
>>>> to use that? The regexp only captures the register number, so uses
>>>> of \1 etc. will need to become z\1.
>>>>
>>>> OK with that change. But would you mind waiting until Tamar pushes
>>>> his patch ("AArch64: use movi d0, #0 to clear SVE registers instead
>>>> of mov z0.d, #0"), just to make sure that the tests work with that?
>>>>
>
> Thanks, I've committed the changes now.

Thanks, I adjusted the tests, rebased and revalidated.
Committed to trunk: e69c2e212011f2bfa6f8c3748d902690b7a3639a
Have a good weekend,
Jennifer

>
> Cheers,
> Tamar
>
>>> Thanks for the review. Sure, I can make the changes, wait for Tamar’s patch, and
>>> re-validate after rebasing.
>>> One question about the regexp pattern:
>>> The “\.” is outside the second captured group and therefore non-optional, i.e. it
>>> would match something like
>>> “movi d0.,#0”,
>>> but not
>>> “movi d0, #0”.
>>
>> Argh! Yes. I clearly didn't test what I thought I'd tested. :(
>>
>> The “\.” should be inside the group, like you say.
>>
>> Richard
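[Editor's note: the regexp fix agreed above can be sanity-checked outside the testsuite. The sketch below is not part of the patch; it simplifies the tab separators in the real test patterns to single spaces and compares the original pattern (with `\.` outside the optional group) to the corrected one (with `\.` inside), showing that only the latter accepts the suffix-less `movi d0, #0` form.]

```python
import re

# Original pattern: "\." sits outside the optional size-suffix group,
# so a dot after the register number is mandatory.
BROKEN = re.compile(r"movi? [vdz]([0-9]+)\.(?:[0-9]*[bhsd])?, #?0")

# Corrected pattern: "\." moved inside the group, so the whole ".<size>"
# suffix is optional.  Only the register number is captured.
FIXED = re.compile(r"movi? [vdz]([0-9]+)(?:\.[0-9]*[bhsd])?, #?0")

print(BROKEN.fullmatch("movi d0, #0"))              # None: dot required
print(BROKEN.fullmatch("movi d0., #0") is not None) # True: bogus spelling
print(FIXED.fullmatch("movi d0, #0").group(1))      # '0'
print(FIXED.fullmatch("mov z0.d, #0").group(1))     # '0'
print(FIXED.fullmatch("movi v31.4s, #0").group(1))  # '31'
```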
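[Editor's note: for readers unfamiliar with SVE predication, the lane semantics that the patch description attributes to fold_active_lanes_to can be modeled in a few lines. This is a hypothetical illustration, not GCC code: active lanes take the folded value, and inactive lanes follow the predication suffix (_m merges the first operand, _z zeroes, _x leaves them unspecified).]

```python
# Hypothetical model of fold_active_lanes_to(x): per-lane selection between
# the folded value x and the predication-defined inactive value.
def fold_active_lanes_to(pred, pg, merge, x):
    if pred == "x":                      # inactive lanes are "don't care"
        return list(x)
    inactive = merge if pred == "m" else [0] * len(x)
    return [xi if p else ii for p, xi, ii in zip(pg, x, inactive)]

pg = [1, 0, 1, 0]
a = [5, 6, 7, 8]
# svdiv_m(pg, a, all-ones): active lanes are a/1 == a and inactive lanes
# keep a, so the call folds to a itself (no sel needed).
print(fold_active_lanes_to("m", pg, a, a))  # [5, 6, 7, 8]
# svdiv_z(pg, a, all-ones): active lanes a, inactive lanes zero, i.e. a
# single sel of a against a zero vector, as in the adjusted tests.
print(fold_active_lanes_to("z", pg, a, a))  # [5, 0, 7, 0]
```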