Emilio G. Cota <c...@braap.org> writes: > On Wed, Mar 28, 2018 at 14:26:30 +0100, Alex Bennée wrote: >> Emilio G. Cota <c...@braap.org> writes: >> OK I've had a bit more of a play and I think we can drop the macro abuse >> and have common wrappers for the host_fpu. We don't want to intermingle >> with the soft float slow path to stop the compiler adding overhead. We >> also need a wrapper for each float size and op count due to differences >> in the classify functions. However the boiler plate is pretty common and >> where there are differences the compiler is smart enough to fix it. >> >> See branch: >> https://github.com/stsquad/qemu/tree/hostfloat/common-fpu-wrapper >> >> I keep the numbers for add/sub and doubled the speed of float32_mul on >> my box, without any macros ;-) > > I really like the idea of letting the compiler unfold everything. > In fact I just did that to re-implement fp-bench (now with support > for -t host/soft, yay).
It's a poor mans templates but it certainly makes reading and debugging the code a lot easier. Of course sometimes it does feel like you are doing more work to guide the compiler to the right answer.... > >> Full patch inline: >> >> diff --git a/fpu/softfloat.c b/fpu/softfloat.c >> index d0f1f65c12..89217b5e67 100644 >> --- a/fpu/softfloat.c >> +++ b/fpu/softfloat.c >> @@ -879,56 +879,72 @@ soft_float64_sub(float64 a, float64 b, float_status >> *status) >> return float64_round_pack_canonical(pr, status); >> } > (snip) >> +static float fpu_mul32(float a, float b, bool *nocheck) { >> + >> + if (float32_is_zero(a) || float32_is_zero(b)) { >> + bool signbit = float32_is_neg(a) ^ float32_is_neg(b); >> + *nocheck = true; >> + return float32_set_sign((0), signbit); >> + } else { >> + float ha = float32_to_float(a); >> + float hb = float32_to_float(b); >> + float hr = ha * hb; >> + return hr; >> } >> +} > > This function is wrong :-( > > Note that a and b are floats, not float32's. So if any of > them is 0.X then they get silently converted to 0, which goes via the > fast(er) path above. This explains the speedup. So we need to push the casting into the helpers. Well at least it didn't get any slower once fixed ;-) > > Note that you could have caught this with: > > $ ./fp-test -t soft ibm/* -w whitelist.txt -e x > > Compiling with -Wconversion would also point these out, but the output > is way too noisy to be useful. > > > That said, I'll take inspiration from your approach for v3--hopefully > without (many) macros this time round. > > Thanks! > > Emilio -- Alex Bennée