Re: RISC-V: Use convert instructions instead of calling library functions

Palmer Dabbelt Tue, 19 Mar 2024 09:23:19 -0700

On Mon, 18 Mar 2024 20:50:14 PDT (-0700), jeffreya...@gmail.com wrote:



On 3/18/24 3:09 AM, Jivan Hakobyan wrote:

As RV has round instructions it is reasonable to use them instead of
calling the library functions.

With my patch for the following C code:
double foo(double a) {
     return ceil(a);
}

GCC generates the following ASM code (before it was tail call)
foo:
         fabs.d  fa4,fa0
         lui     a5,%hi(.LC0)
         fld     fa3,%lo(.LC0)(a5)
         flt.d   a5,fa4,fa3
         beq     a5,zero,.L3
         fcvt.l.d a5,fa0,rup

I'm not sure exactly what context this is in, but my reading of"according to the current rounding mode" means we'd usually use thedynamic rounding mode.

         fcvt.d.l        fa4,a5
         fsgnj.d fa0,fa4,fa0
.L3:
         ret

.LC0:
         .word   0
         .word   1127219200     // 0x4330000000000000


The patch I have evaluated on SPEC2017.
Counted dynamic instructions counts and got the following improvements

510.parest_r       262 m      -
511.povray_r      2.1  b        0.04%
521.wrt_r            269 m       -
526.blender_r    3 b             0.1%
527.cam4_r       15 b           0.6%
538.imagick_r    365 b         7.6%

Overall executed 385 billion fewer instructions which is 0.5%.

A few more notes.

The sequence Jivan is using is derived from LLVM.  The condition in the
generated code tests for those values were are supposed to pass through
unaltered.  The condition in the pattern ensures we do something
sensible WRT FE_INEXACT and mirrors how other ports handle these insns.

My only worry here is that when we were doing the other patterns wedecided not to do rint. I can't remember exactly why, but from readingthe docs we probably just skipped it due to the inexact handling and Zfahaving an instruction that just does this. FP stuff is always a bit ofa time sink, so at that point it probably just fell off the prioritylist.

I'm not really an FP guy, so I usually just poke around what the otherports generate and try to figure out what's going on. arm64 has the Zfainstruction and x86 FP is complicated, so I'm not sure exactly who elseto look at for this sort of stuff. From just looking at the code,though, I think there's two issues -- I'm not really an FP person,though, so take this with a grain of salt:

IIUC what we've got here doesn't actually set the inexact flag for thebounds check failure case, as we're just loading up an exact constantand doing the bounds check. We're also not clamping to INT_MAX-typevalues, but not sure if we're supposed to. I think we could fix both ofthose by adjusting the expansion to something like


             fabs.d  fa4,fa0
             lui     a5,%hi(.LC0)
             fld     fa3,%lo(.LC0)(a5)
             flt.d   a5,fa4,fa3
             bne     a5,zero,.L3
             mv      fa0, fa3
    .L3:
             fcvt.l.d a5,fa0,rup
             fcvt.d.l        fa4,a5
             fsgnj.d fa0,fa4,fa0
             ret

and then adjusting the constant to be an epsilon larger than INT_MAX soit'll still trigger the clamping but also inexact.

There's also a pair of changes to the ISA in 2020 that added theconversion inexact handling requirement, it was a grey area before. Idon't remember exactly what happened there, but I did remember ithappening. I don't think anyone cares all that much about theperformance of systems that target the older ISAs, so maybe we justrestrict the non-libcall expansion to ISAs that contain the new wording?


   commit 5890a1a702abf4157d5879717a39d8ecdae0de68 (tag: draft-20201229-5890a1a)
   Author: Andrew Waterman <and...@sifive.com>
   Date:   Mon Dec 28 18:19:44 2020 -0800

Clarify when FP conversions raise the Inexact flagdiff --git a/src/f.tex b/src/f.tex

   index 8649d75..97c253b 100644
   --- a/src/f.tex
   +++ b/src/f.tex
   @@ -594,9 +594,9 @@ instructions round according to the {\em rm} field.  A 
floating-point register
    can be initialized to floating-point positive zero using FCVT.S.W {\em rd},
    {\tt x0}, which will never set any exception flags.

-All floating-point conversion instructions set the Inexact exception flag if the

   -result differs from its operand value, yet is representable in the 
destination
   -format.
   +All floating-point conversion instructions set the Inexact exception flag if
   +the rounded result differs from the operand value and the Invalid exception
   +flag is not set.

\vspace{-0.2in}

    \begin{center}

commit 27b40fbc798357fcb4b1deaba4553646fe677576 (tag: draft-20200229-27b40fb)

   Author: Andrew Waterman <and...@sifive.com>
   Date:   Fri Feb 28 18:12:47 2020 -0800

Clarify that FCVT instructions signal inexactdiff --git a/src/f.tex b/src/f.tex

   index 7680347..a9022c4 100644
   --- a/src/f.tex
   +++ b/src/f.tex
   @@ -582,6 +582,10 @@ instructions round according to the {\em rm} field.  A 
floating-point register
    can be initialized to floating-point positive zero using FCVT.S.W {\em rd},
    {\tt x0}, which will never set any exception flags.

+All floating-point conversion instructions raise the Inexact exception if the

   +result differs from its operand value, yet is representable in the 
destination
   +format.
   +
    \vspace{-0.2in}
    \begin{center}
    \begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O}

Our internal testing shows a benefit well beyond the 7% reduction in
icounts.  Presumably due to fewer calls, fewer transfers across the
register files, better scheduling around the call site, etc.

Obviously for Zfa we'll use the more efficient instructions for that
extension.  But there's no reason to not go forward with this change for
gcc-15.

Ya, I think those are all pretty minor issues (If the exist at all).So it seems like the right way to go in general.


Jeff

Re: RISC-V: Use convert instructions instead of calling library functions

Reply via email to