Hi Soumya,

Nice patch!
> -----Original Message-----
> From: Kyrylo Tkachov <ktkac...@nvidia.com>
> Sent: Tuesday, October 1, 2024 7:55 AM
> To: Soumya AR <soum...@nvidia.com>
> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford <richard.sandif...@arm.com>
> Subject: Re: [PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE
> instruction
>
> Hi Soumya
>
> > On 30 Sep 2024, at 18:26, Soumya AR <soum...@nvidia.com> wrote:
> >
> > External email: Use caution opening links or attachments
> >
> >
> > This patch uses the FSCALE instruction provided by SVE to implement the
> > standard ldexp family of functions.
> >
> > Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for
> > the following code:
> >
> > float
> > test_ldexpf (float x, int i)
> > {
> >   return __builtin_ldexpf (x, i);
> > }
> >
> > double
> > test_ldexp (double x, int i)
> > {
> >   return __builtin_ldexp (x, i);
> > }
> >
> > GCC Output:
> >
> > test_ldexpf:
> >         b ldexpf
> >
> > test_ldexp:
> >         b ldexp
> >
> > Since SVE provides an FSCALE instruction, we can use it to process
> > scalar floats by moving them into vector registers and performing an
> > FSCALE, similar to how LLVM handles the ldexp builtin.
> >
> > New Output:
> >
> > test_ldexpf:
> >         fmov s31, w0
> >         ptrue p7.b, all
> >         fscale z0.s, p7/m, z0.s, z31.s
> >         ret
> >
> > test_ldexp:
> >         sxtw x0, w0
> >         ptrue p7.b, all
> >         fmov d31, x0
> >         fscale z0.d, p7/m, z0.d, z31.d
> >         ret
> >
> > The patch was bootstrapped and regtested on aarch64-linux-gnu with no
> > regressions.
> > OK for mainline?
> >
> > Signed-off-by: Soumya AR <soum...@nvidia.com>
> >
> > gcc/ChangeLog:
> >
> >	* config/aarch64/aarch64-sve.md (ldexp<mode>3): Added a new pattern
> >	to match ldexp calls with scalar floating modes and expand to the
> >	existing pattern for FSCALE.
> >	(@aarch64_pred_<optab><mode>): Extended the pattern to accept SVE
> >	operands as well as scalar floating modes.
> >	* config/aarch64/iterators.md (SVE_FULL_F_SCALAR): Added an iterator
> >	to match all FP SVE modes as well as SF and DF.
> >	(VPRED): Extended the attribute to handle GPF modes.
> >
> > gcc/testsuite/ChangeLog:
> >
> >	* gcc.target/aarch64/sve/fscale.c: New test.
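
A side note for anyone reading along: ldexp (x, i) scales x by 2^i by
adjusting the floating-point exponent directly, and FSCALE performs the same
operation on each active lane, so moving the scalar into a Z register is all
the expansion needs. A tiny standalone C illustration of the semantics being
optimised; this is not part of the patch, purely illustrative:

#include <math.h>
#include <stdio.h>

int
main (void)
{
  /* ldexp scales by a power of two; for in-range results this is an
     exact exponent adjustment, not a rounded multiply.  */
  printf ("%a\n", ldexp (1.5, 10));  /* 0x1.8p+10, i.e. 1536.0 */
  printf ("%a\n", 1.5 * 1024.0);     /* same value, via a multiply */
  return 0;
}

Both lines print the identical bit pattern, which is why a single FSCALE can
replace the libcall.
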
>
> This patch fixes the bugzilla report at
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111733
> So it should be referenced in the ChangeLog entries like so:
>
> 	PR target/111733
> 	* config/aarch64/aarch64-sve.md
> 	<rest of ChangeLog>
>
> That way the commit hooks will pick it up and update the bug tracker
> accordingly.
>
> > <0001-aarch64-Optimise-calls-to-ldexp-with-SVE-FSCALE-inst.patch>
>
> +(define_expand "ldexp<mode>3"
> +  [(set (match_operand:GPF 0 "register_operand" "=w")
> +	(unspec:GPF
> +	  [(match_operand:GPF 1 "register_operand" "w")
> +	   (match_operand:<V_INT_EQUIV> 2 "register_operand" "w")]
> +	  UNSPEC_COND_FSCALE))]
> +  "TARGET_SVE"
> +  {
> +    rtx ptrue = aarch64_ptrue_reg (<VPRED>mode);
> +    rtx strictness = gen_int_mode (SVE_RELAXED_GP, SImode);
> +    emit_insn (gen_aarch64_pred_fscale<mode> (operands[0], ptrue,
> operands[1], operands[2], strictness));
> +    DONE;
> +  }

Lines should not exceed 80 columns; the emit_insn call here should be
wrapped.

And a nit: perhaps slightly more idiomatic relative to the other SVE
patterns is this:

(define_expand "ldexp<mode>3"
  [(set (match_operand:GPF 0 "register_operand")
	(unspec:GPF
	  [(match_dup 3)
	   (const_int SVE_RELAXED_GP)
	   (match_operand:GPF 1 "register_operand")
	   (match_operand:<V_INT_EQUIV> 2 "register_operand")]
	  UNSPEC_COND_FSCALE))]
  "TARGET_SVE"
  {
    operands[3] = aarch64_ptrue_reg (<VPRED>mode);
  }
)

It removes the dependency on the exact name of the pattern. Also note the
dropping of the constraints: expand patterns don't use constraints, only
the predicates are checked.

Cheers,
Tamar

> The patch looks good to me otherwise.
> Thanks,
> Kyrill
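
P.S. In case it helps when polishing the new test, here is roughly the shape
I would expect gcc.target/aarch64/sve/fscale.c to take. Only a sketch: the
dg-options and scan patterns are my guesses rather than what the patch
actually contains, and the sve.exp harness may already supply the SVE -march
flags. (-Ofast is needed because the ldexp builtin is only expanded through
the optab under unsafe-math rules.)

/* { dg-do compile } */
/* { dg-options "-Ofast" } */

float
test_ldexpf (float x, int i)
{
  return __builtin_ldexpf (x, i);
}

double
test_ldexp (double x, int i)
{
  return __builtin_ldexp (x, i);
}

/* Expect one FSCALE per function, .s for float and .d for double.  */
/* { dg-final { scan-assembler-times {\tfscale\tz[0-9]+\.s} 1 } } */
/* { dg-final { scan-assembler-times {\tfscale\tz[0-9]+\.d} 1 } } */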