On 3/19/24 10:23 AM, Palmer Dabbelt wrote:
On Mon, 18 Mar 2024 20:50:14 PDT (-0700), jeffreya...@gmail.com wrote:
On 3/18/24 3:09 AM, Jivan Hakobyan wrote:
As RV has round instructions it is reasonable to use them instead of
calling the library functions.
With my patch for the following C code:
double foo(double a) {
return ceil(a);
}
GCC generates the following ASM code (before it was tail call)
foo:
fabs.d fa4,fa0
lui a5,%hi(.LC0)
fld fa3,%lo(.LC0)(a5)
flt.d a5,fa4,fa3
beq a5,zero,.L3
fcvt.l.d a5,fa0,rup
I'm not sure exactly what context this is in, but my reading of
"according to the current rounding mode" means we'd usually use the
dynamic rounding mode.
As Andrew W. noted, we're dealing with ceil and thus rup is the correct
rounding mode to use here.
My only worry here is that when we were doing the other patterns we
decided not to do rint. I can't remember exactly why, but from reading
the docs we probably just skipped it due to the inexact handling and Zfa
having an instruction that just does this. FP stuff is always a bit of
a time sink, so at that point it probably just fell off the priority list.
rint is supposed to raise FE_INEXACT, so it's actually a good match for
RISC-V fcvt semantics as they appropriately raise FE_INEXACT.
nearby* do not arise FE_INEXACT and thus would rely on the new Zfa
instructions where we have ones that do not raise FE_INEXACT or they
need to be conditional on flag_fp_int_builtin_inexact. One could
reasonably argue that when flag_fp_int_builtin_inexact is enabled that a
call to nearby* ought to be converted into a call to rint*.
I'm not really an FP guy, so I usually just poke around what the other
ports generate and try to figure out what's going on. arm64 has the Zfa
instruction and x86 FP is complicated, so I'm not sure exactly who else
to look at for this sort of stuff. From just looking at the code,
though, I think there's two issues -- I'm not really an FP person,
though, so take this with a grain of salt:
Right. And the condition under which we use the new sequence for
ceil/round actually borrows from x86. Essentially we only use the new
sequence when we've been told we don't care about FE_INEXACT or fp
exceptions in general.
IIUC what we've got here doesn't actually set the inexact flag for the
bounds check failure case, as we're just loading up an exact constant
and doing the bounds check. We're also not clamping to INT_MAX-type
values, but not sure if we're supposed to. I think we could fix both of
those by adjusting the expansion to something like
The state of FE_INEXACT is a don't care here due the condition on the
expansion code.
fabs.d fa4,fa0
lui a5,%hi(.LC0)
fld fa3,%lo(.LC0)(a5)
flt.d a5,fa4,fa3
bne a5,zero,.L3
mv fa0, fa3
.L3:
fcvt.l.d a5,fa0,rup
fcvt.d.l fa4,a5
fsgnj.d fa0,fa4,fa0
ret
and then adjusting the constant to be an epsilon larger than INT_MAX so
it'll still trigger the clamping but also inexact.
I think Jivan's sequence is more correct. It's not just INT_MAX here
that's concerning, there's a whole class of values that cause problems.
There's also a pair of changes to the ISA in 2020 that added the
conversion inexact handling requirement, it was a grey area before. I
don't remember exactly what happened there, but I did remember it
happening. I don't think anyone cares all that much about the
performance of systems that target the older ISAs, so maybe we just
restrict the non-libcall expansion to ISAs that contain the new wording?
I think all this got sufficiently cleaned up. The spec is explicit
about when FE_INEXACT gets raised on the fcvt instructions. I referred
to it repeatedly when analyzing Jivan's work.
We can hash through the final items in a few weeks once the trunk
re-opens for development.
Jeff