Re: RISC-V: Use convert instructions instead of calling library functions

Palmer Dabbelt Tue, 19 Mar 2024 13:19:37 -0700

On Tue, 19 Mar 2024 12:58:41 PDT (-0700), Andrew Waterman wrote:

On Tue, Mar 19, 2024 at 9:23 AM Palmer Dabbelt <pal...@dabbelt.com> wrote:

On Mon, 18 Mar 2024 20:50:14 PDT (-0700), jeffreya...@gmail.com wrote:
>
>
> On 3/18/24 3:09 AM, Jivan Hakobyan wrote:
>> As RV has round instructions it is reasonable to use them instead of
>> calling the library functions.
>>
>> With my patch for the following C code:
>> double foo(double a) {
>>      return ceil(a);
>> }
>>
>> GCC generates the following ASM code (before it was tail call)
>> foo:
>>          fabs.d  fa4,fa0
>>          lui     a5,%hi(.LC0)
>>          fld     fa3,%lo(.LC0)(a5)
>>          flt.d   a5,fa4,fa3
>>          beq     a5,zero,.L3
>>          fcvt.l.d a5,fa0,rup

I'm not sure exactly what context this is in, but my reading of
"according to the current rounding mode" means we'd usually use the
dynamic rounding mode.


ceil doesn't depend on the current rounding mode; rup is correct.  For
rint, you'd be correct.

Ya, right, that's pretty obvious -- I must have just been falling asleepthis morning.

>>          fcvt.d.l        fa4,a5
>>          fsgnj.d fa0,fa4,fa0
>> .L3:
>>          ret
>>
>> .LC0:
>>          .word   0
>>          .word   1127219200     // 0x4330000000000000
>>
>>
>> The patch I have evaluated on SPEC2017.
>> Counted dynamic instructions counts and got the following improvements
>>
>> 510.parest_r       262 m      -
>> 511.povray_r      2.1  b        0.04%
>> 521.wrt_r            269 m       -
>> 526.blender_r    3 b             0.1%
>> 527.cam4_r       15 b           0.6%
>> 538.imagick_r    365 b         7.6%
>>
>> Overall executed 385 billion fewer instructions which is 0.5%.
> A few more notes.
>
> The sequence Jivan is using is derived from LLVM.  The condition in the
> generated code tests for those values were are supposed to pass through
> unaltered.  The condition in the pattern ensures we do something
> sensible WRT FE_INEXACT and mirrors how other ports handle these insns.

My only worry here is that when we were doing the other patterns we
decided not to do rint.  I can't remember exactly why, but from reading
the docs we probably just skipped it due to the inexact handling and Zfa
having an instruction that just does this.  FP stuff is always a bit of
a time sink, so at that point it probably just fell off the priority
list.

I'm not really an FP guy, so I usually just poke around what the other
ports generate and try to figure out what's going on.  arm64 has the Zfa
instruction and x86 FP is complicated, so I'm not sure exactly who else
to look at for this sort of stuff.  From just looking at the code,
though, I think there's two issues -- I'm not really an FP person,
though, so take this with a grain of salt:

IIUC what we've got here doesn't actually set the inexact flag for the
bounds check failure case



I think the original code was correct.  If you exceed the bounds, then by
construction you know that the input was already an integer, and so not
setting NX is correct.  If you didn't exceed the bounds, then fcvt.l.d will
set NX when appropriate.

, as we're just loading up an exact constant

and doing the bounds check.  We're also not clamping to INT_MAX-type
values, but not sure if we're supposed to.  I think we could fix both of
those by adjusting the expansion to something like

              fabs.d  fa4,fa0
              lui     a5,%hi(.LC0)
              fld     fa3,%lo(.LC0)(a5)
              flt.d   a5,fa4,fa3
              bne     a5,zero,.L3
              mv      fa0, fa3
     .L3:
              fcvt.l.d a5,fa0,rup
              fcvt.d.l        fa4,a5
              fsgnj.d fa0,fa4,fa0
              ret

and then adjusting the constant to be an epsilon larger than INT_MAX so
it'll still trigger the clamping but also inexact.


ceil/rint/etc. are supposed to work for the whole range of their
floating-point type; the range of the integral types isn't supposed to
affect the result.  The original code was correct in this regard, too.

Ya, sorry, I think I just misunderstood what was going on here -- it'snot INT_MAX, it's just the largest FP values for which mantissas mightproduce non-integral values.

There's also a pair of changes to the ISA in 2020 that added the
conversion inexact handling requirement, it was a grey area before.  I
don't remember exactly what happened there, but I did remember it
happening.  I don't think anyone cares all that much about the
performance of systems that target the older ISAs, so maybe we just
restrict the non-libcall expansion to ISAs that contain the new wording?

    commit 5890a1a702abf4157d5879717a39d8ecdae0de68 (tag:
draft-20201229-5890a1a)
    Author: Andrew Waterman <and...@sifive.com>
    Date:   Mon Dec 28 18:19:44 2020 -0800

        Clarify when FP conversions raise the Inexact flag

    diff --git a/src/f.tex b/src/f.tex
    index 8649d75..97c253b 100644
    --- a/src/f.tex
    +++ b/src/f.tex
    @@ -594,9 +594,9 @@ instructions round according to the {\em rm}
field.  A floating-point register
     can be initialized to floating-point positive zero using FCVT.S.W
{\em rd},
     {\tt x0}, which will never set any exception flags.

    -All floating-point conversion instructions set the Inexact exception
flag if the
    -result differs from its operand value, yet is representable in the
destination
    -format.
    +All floating-point conversion instructions set the Inexact exception
flag if
    +the rounded result differs from the operand value and the Invalid
exception
    +flag is not set.

     \vspace{-0.2in}
     \begin{center}

    commit 27b40fbc798357fcb4b1deaba4553646fe677576 (tag:
draft-20200229-27b40fb)
    Author: Andrew Waterman <and...@sifive.com>
    Date:   Fri Feb 28 18:12:47 2020 -0800

        Clarify that FCVT instructions signal inexact

    diff --git a/src/f.tex b/src/f.tex
    index 7680347..a9022c4 100644
    --- a/src/f.tex
    +++ b/src/f.tex
    @@ -582,6 +582,10 @@ instructions round according to the {\em rm}
field.  A floating-point register
     can be initialized to floating-point positive zero using FCVT.S.W
{\em rd},
     {\tt x0}, which will never set any exception flags.

    +All floating-point conversion instructions raise the Inexact
exception if the
    +result differs from its operand value, yet is representable in the
destination
    +format.
    +
     \vspace{-0.2in}
     \begin{center}
     \begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O}

> Our internal testing shows a benefit well beyond the 7% reduction in
> icounts.  Presumably due to fewer calls, fewer transfers across the
> register files, better scheduling around the call site, etc.
>
> Obviously for Zfa we'll use the more efficient instructions for that
> extension.  But there's no reason to not go forward with this change for
> gcc-15.

Ya, I think those are all pretty minor issues (If the exist at all).
So it seems like the right way to go in general.

>
> Jeff

Re: RISC-V: Use convert instructions instead of calling library functions

Reply via email to