Hi Soumya,

Nice patch!

> -----Original Message-----
> From: Kyrylo Tkachov <ktkac...@nvidia.com>
> Sent: Tuesday, October 1, 2024 7:55 AM
> To: Soumya AR <soum...@nvidia.com>
> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford <richard.sandif...@arm.com>
> Subject: Re: [PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE instruction
> 
> Hi Soumya
> 
> > On 30 Sep 2024, at 18:26, Soumya AR <soum...@nvidia.com> wrote:
> >
> > This patch uses the FSCALE instruction provided by SVE to implement the
> > standard ldexp family of functions.
> >
> > Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the
> > following code:
> >
> > float
> > test_ldexpf (float x, int i)
> > {
> >        return __builtin_ldexpf (x, i);
> > }
> >
> > double
> > test_ldexp (double x, int i)
> > {
> >        return __builtin_ldexp(x, i);
> > }
> >
> > GCC Output:
> >
> > test_ldexpf:
> >        b ldexpf
> >
> > test_ldexp:
> >        b ldexp
> >
> > Since SVE has support for an FSCALE instruction, we can use this to process
> > scalar floats by moving them to a vector register and performing an fscale
> > call, similar to how LLVM handles the ldexp builtin.
> >
> > New Output:
> >
> > test_ldexpf:
> >        fmov s31, w0
> >        ptrue p7.b, all
> >        fscale z0.s, p7/m, z0.s, z31.s
> >        ret
> >
> > test_ldexp:
> >        sxtw x0, w0
> >        ptrue p7.b, all
> >        fmov d31, x0
> >        fscale z0.d, p7/m, z0.d, z31.d
> >        ret
> >
> > The patch was bootstrapped and regtested on aarch64-linux-gnu with no
> > regressions.
> > OK for mainline?
> >
> > Signed-off-by: Soumya AR <soum...@nvidia.com>
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-sve.md
> > (ldexp<mode>3): Added a new pattern to match ldexp calls with scalar
> > floating-point modes and expand to the existing pattern for FSCALE.
> > (@aarch64_pred_<optab><mode>): Extended the pattern to accept SVE
> > operands as well as scalar floating-point modes.
> >
> > * config/aarch64/iterators.md
> > (SVE_FULL_F_SCALAR): Added an iterator to match all FP SVE modes as well
> > as SF and DF.
> > (VPRED): Extended the attribute to handle GPF modes.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/sve/fscale.c: New test.
> 
> This patch fixes the bugzilla report at
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111733
> So it should be referenced in the ChangeLog entries like so:
> 
>       PR target/111733
>       * config/aarch64/aarch64-sve.md <rest of ChangeLog>
> 
> That way the commit hooks will pick it up and update the bug tracker
> accordingly.
> 
> >
> > <0001-aarch64-Optimise-calls-to-ldexp-with-SVE-FSCALE-inst.patch>
> 
> +(define_expand "ldexp<mode>3"
> +  [(set (match_operand:GPF 0 "register_operand" "=w")
> +     (unspec:GPF
> +       [(match_operand:GPF 1 "register_operand" "w")
> +        (match_operand:<V_INT_EQUIV> 2 "register_operand" "w")]
> +       UNSPEC_COND_FSCALE))]
> +  "TARGET_SVE"
> +  {
> +    rtx ptrue = aarch64_ptrue_reg (<VPRED>mode);
> +    rtx strictness = gen_int_mode (SVE_RELAXED_GP, SImode);
> +    emit_insn (gen_aarch64_pred_fscale<mode> (operands[0], ptrue, operands[1], operands[2], strictness));
> +    DONE;
> +  }
> 
> Lines should not exceed 80 columns; this should be wrapped.
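
For example, the emit_insn call could be wrapped along these lines (just a
formatting sketch, aligning the continuation under the opening parenthesis):

    emit_insn (gen_aarch64_pred_fscale<mode> (operands[0], ptrue,
                                               operands[1], operands[2],
                                               strictness));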

And a nit: perhaps slightly more idiomatic, matching the other SVE patterns, is this:

(define_expand "ldexp<mode>3"
  [(set (match_operand:GPF 0 "register_operand")
        (unspec:GPF
          [(match_dup 3)
           (const_int SVE_RELAXED_GP)
           (match_operand:GPF 1 "register_operand")
           (match_operand:<V_INT_EQUIV> 2 "register_operand")]
          UNSPEC_COND_FSCALE))]
  "TARGET_SVE"
  {
    operands[3] = aarch64_ptrue_reg (<VPRED>mode);
  }
)

It removes the dependency on the exact name of the pattern.
Also note the dropping of the constraints: expand patterns don't use
constraints, only the predicates are checked.

Cheers,
Tamar

> 
> The patch looks good to me otherwise.
> Thanks,
> Kyrill
