> On 1 Oct 2024, at 09:48, Tamar Christina <tamar.christ...@arm.com> wrote:
>
> External email: Use caution opening links or attachments
>
>
> Hi Soumya,
>
> Nice patch!
>
>> -----Original Message-----
>> From: Kyrylo Tkachov <ktkac...@nvidia.com>
>> Sent: Tuesday, October 1, 2024 7:55 AM
>> To: Soumya AR <soum...@nvidia.com>
>> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford <richard.sandif...@arm.com>
>> Subject: Re: [PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE
>> instruction
>>
>> Hi Soumya
>>
>>> On 30 Sep 2024, at 18:26, Soumya AR <soum...@nvidia.com> wrote:
>>>
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> This patch uses the FSCALE instruction provided by SVE to implement the
>>> standard ldexp family of functions.
>>>
>>> Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the
>>> following code:
>>>
>>> float
>>> test_ldexpf (float x, int i)
>>> {
>>>   return __builtin_ldexpf (x, i);
>>> }
>>>
>>> double
>>> test_ldexp (double x, int i)
>>> {
>>>   return __builtin_ldexp (x, i);
>>> }
>>>
>>> GCC Output:
>>>
>>> test_ldexpf:
>>> 	b	ldexpf
>>>
>>> test_ldexp:
>>> 	b	ldexp
>>>
>>> Since SVE provides an FSCALE instruction, we can use it to process
>>> scalar floats by moving them to a vector register and performing an
>>> fscale call, similar to how LLVM handles the ldexp builtin.
>>>
>>> New Output:
>>>
>>> test_ldexpf:
>>> 	fmov	s31, w0
>>> 	ptrue	p7.b, all
>>> 	fscale	z0.s, p7/m, z0.s, z31.s
>>> 	ret
>>>
>>> test_ldexp:
>>> 	sxtw	x0, w0
>>> 	ptrue	p7.b, all
>>> 	fmov	d31, x0
>>> 	fscale	z0.d, p7/m, z0.d, z31.d
>>> 	ret
>>>
>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no
>>> regression.
>>> OK for mainline?
>>>
>>> Signed-off-by: Soumya AR <soum...@nvidia.com>
>>>
>>> gcc/ChangeLog:
>>>
>>> 	* config/aarch64/aarch64-sve.md (ldexp<mode>3): Added a new
>>> 	pattern to match ldexp calls with scalar floating modes and
>>> 	expand to the existing pattern for FSCALE.
>>> 	(@aarch64_pred_<optab><mode>): Extended the pattern to accept
>>> 	SVE operands as well as scalar floating modes.
>>> 	* config/aarch64/iterators.md (SVE_FULL_F_SCALAR): Added an
>>> 	iterator to match all FP SVE modes as well as SF and DF.
>>> 	(VPRED): Extended the attribute to handle GPF modes.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/aarch64/sve/fscale.c: New test.
>>
>> This patch fixes the bugzilla report at
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111733
>> So it should be referenced in the ChangeLog entries like so:
>>
>> PR target/111733
>> * config/aarch64/aarch64-sve.md <rest of ChangeLog>
>>
>> That way the commit hooks will pick it up and update the bug tracker
>> accordingly.
>>
>>>
>>> <0001-aarch64-Optimise-calls-to-ldexp-with-SVE-FSCALE-inst.patch>
>>
>> +(define_expand "ldexp<mode>3"
>> +  [(set (match_operand:GPF 0 "register_operand" "=w")
>> +	(unspec:GPF
>> +	  [(match_operand:GPF 1 "register_operand" "w")
>> +	   (match_operand:<V_INT_EQUIV> 2 "register_operand" "w")]
>> +	  UNSPEC_COND_FSCALE))]
>> +  "TARGET_SVE"
>> +  {
>> +    rtx ptrue = aarch64_ptrue_reg (<VPRED>mode);
>> +    rtx strictness = gen_int_mode (SVE_RELAXED_GP, SImode);
>> +    emit_insn (gen_aarch64_pred_fscale<mode> (operands[0], ptrue, operands[1], operands[2], strictness));
>> +    DONE;
>> +  }
>>
>> Lines should not exceed 80 columns; the emit_insn call should be wrapped.
>
> And Nit: perhaps slightly more idiomatic to the other patterns in SVE is this:
>
> (define_expand "ldexp<mode>3"
>   [(set (match_operand:GPF 0 "register_operand")
> 	(unspec:GPF
> 	  [(match_dup 3)
> 	   (const_int SVE_RELAXED_GP)
> 	   (match_operand:GPF 1 "register_operand")
> 	   (match_operand:<V_INT_EQUIV> 2 "register_operand")]
> 	  UNSPEC_COND_FSCALE))]
>   "TARGET_SVE"
>   {
>     operands[3] = aarch64_ptrue_reg (<VPRED>mode);
>   }
> )
>
> It removes the dependency on the exact name of the pattern.
> Also note the dropping of the constraints: expand patterns don't use
> constraints; only the predicates are checked.
>
Yeah, I agree. It would be more compact that way and the constraints are indeed
superfluous on the expanders.
As an aside, this could be extended to vector modes as well, but that can be
done as a separate patch with new tests.
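For illustration only (this loop is not from the patch or its testsuite), the
kind of code a vector-mode extension could eventually help is an element-wise
ldexpf over arrays, which currently lowers to one libcall per element:

```c
#include <math.h>

/* Hypothetical example, not part of the patch: element-wise ldexpf.
   ldexpf (x, e) computes x * 2^e by adjusting only the exponent,
   which is what SVE FSCALE does per lane.  If the ldexp optab were
   also implemented for vector modes, the vectorizer could turn this
   loop into predicated FSCALE instructions instead of emitting a
   libcall for every element.  */
void
scale_all (float *restrict out, const float *restrict x,
	   const int *restrict e, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = ldexpf (x[i], e[i]);
}
```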
Thanks,
Kyrill
> Cheers,
> Tamar
>
>>
>> The patch looks good to me otherwise.
>> Thanks,
>> Kyrill