On 3/19/24 10:23 AM, Palmer Dabbelt wrote:
On Mon, 18 Mar 2024 20:50:14 PDT (-0700), jeffreya...@gmail.com wrote:

On 3/18/24 3:09 AM, Jivan Hakobyan wrote:
As RV has round instructions it is reasonable to use them instead of
calling the library functions.

With my patch for the following C code:
double foo(double a) {
     return ceil(a);
}

GCC generates the following ASM code (before it was tail call)
foo:
         fabs.d  fa4,fa0
         lui     a5,%hi(.LC0)
         fld     fa3,%lo(.LC0)(a5)
         flt.d   a5,fa4,fa3
         beq     a5,zero,.L3
         fcvt.l.d a5,fa0,rup
I'm not sure exactly what context this is in, but my reading of 
"according to the current rounding mode" means we'd usually use the 
dynamic rounding mode.
As Andrew W. noted, we're dealing with ceil and thus rup is the correct rounding mode to use here.


My only worry here is that when we were doing the other patterns we 
decided not to do rint.  I can't remember exactly why, but from reading 
the docs we probably just skipped it due to the inexact handling and Zfa 
having an instruction that just does this.  FP stuff is always a bit of 
a time sink, so at that point it probably just fell off the priority list.
rint is supposed to raise FE_INEXACT, so it's actually a good match for RISC-V fcvt semantics as they appropriately raise FE_INEXACT.
nearby* do not arise FE_INEXACT and thus would rely on the new Zfa 
instructions where we have ones that do not raise FE_INEXACT or they 
need to be conditional on flag_fp_int_builtin_inexact.  One could 
reasonably argue that when flag_fp_int_builtin_inexact is enabled that a 
call to nearby* ought to be converted into a call to rint*.

I'm not really an FP guy, so I usually just poke around what the other 
ports generate and try to figure out what's going on.  arm64 has the Zfa 
instruction and x86 FP is complicated, so I'm not sure exactly who else 
to look at for this sort of stuff.  From just looking at the code, 
though, I think there's two issues -- I'm not really an FP person, 
though, so take this with a grain of salt:
Right. And the condition under which we use the new sequence for ceil/round actually borrows from x86. Essentially we only use the new sequence when we've been told we don't care about FE_INEXACT or fp exceptions in general.
IIUC what we've got here doesn't actually set the inexact flag for the 
bounds check failure case, as we're just loading up an exact constant 
and doing the bounds check.  We're also not clamping to INT_MAX-type 
values, but not sure if we're supposed to.  I think we could fix both of 
those by adjusting the expansion to something like
The state of FE_INEXACT is a don't care here due the condition on the expansion code.

              fabs.d  fa4,fa0
              lui     a5,%hi(.LC0)
              fld     fa3,%lo(.LC0)(a5)
              flt.d   a5,fa4,fa3
              bne     a5,zero,.L3
              mv      fa0, fa3
     .L3:
              fcvt.l.d a5,fa0,rup
              fcvt.d.l        fa4,a5
              fsgnj.d fa0,fa4,fa0
              ret

and then adjusting the constant to be an epsilon larger than INT_MAX so it'll still trigger the clamping but also inexact.
I think Jivan's sequence is more correct. It's not just INT_MAX here that's concerning, there's a whole class of values that cause problems.
There's also a pair of changes to the ISA in 2020 that added the 
conversion inexact handling requirement, it was a grey area before.  I 
don't remember exactly what happened there, but I did remember it 
happening.  I don't think anyone cares all that much about the 
performance of systems that target the older ISAs, so maybe we just 
restrict the non-libcall expansion to ISAs that contain the new wording?
I think all this got sufficiently cleaned up. The spec is explicit about when FE_INEXACT gets raised on the fcvt instructions. I referred to it repeatedly when analyzing Jivan's work.
We can hash through the final items in a few weeks once the trunk 
re-opens for development.
Jeff

Reply via email to