64 multiplication

Alex Bennée Thu, 29 Mar 2018 03:01:09 -0700

Emilio G. Cota <c...@braap.org> writes:

> On Wed, Mar 28, 2018 at 14:26:30 +0100, Alex Bennée wrote:
>> Emilio G. Cota <c...@braap.org> writes:
>> OK I've had a bit more of a play and I think we can drop the macro abuse
>> and have common wrappers for the host_fpu. We don't want to intermingle
>> with the soft float slow path to stop the compiler adding overhead. We
>> also need a wrapper for each float size and op count due to differences
>> in the classify functions. However the boiler plate is pretty common and
>> where there are differences the compiler is smart enough to fix it.
>>
>> See branch:
>> https://github.com/stsquad/qemu/tree/hostfloat/common-fpu-wrapper
>>
>> I keep the numbers for add/sub and doubled the speed of float32_mul on
>> my box, without any macros ;-)
>
> I really like the idea of letting the compiler unfold everything.
> In fact I just did that to re-implement fp-bench (now with support
> for -t host/soft, yay).


It's a poor mans templates but it certainly makes reading and debugging
the code a lot easier. Of course sometimes it does feel like you are
doing more work to guide the compiler to the right answer....

>
>> Full patch inline:
>>
>> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
>> index d0f1f65c12..89217b5e67 100644
>> --- a/fpu/softfloat.c
>> +++ b/fpu/softfloat.c
>> @@ -879,56 +879,72 @@ soft_float64_sub(float64 a, float64 b, float_status 
>> *status)
>>      return float64_round_pack_canonical(pr, status);
>>  }
> (snip)
>> +static float fpu_mul32(float a, float b, bool *nocheck) {
>> +
>> +    if (float32_is_zero(a) || float32_is_zero(b)) {
>> +        bool signbit = float32_is_neg(a) ^ float32_is_neg(b);
>> +        *nocheck = true;
>> +        return float32_set_sign((0), signbit);
>> +    } else {
>> +        float ha = float32_to_float(a);
>> +        float hb = float32_to_float(b);
>> +        float hr = ha * hb;
>> +        return hr;
>>      }
>> +}
>
> This function is wrong :-(
>
> Note that a and b are floats, not float32's. So if any of
> them is 0.X then they get silently converted to 0, which goes via the
> fast(er) path above. This explains the speedup.

So we need to push the casting into the helpers. Well at least it didn't
get any slower once fixed ;-)

>
> Note that you could have caught this with:
>
>   $ ./fp-test -t soft ibm/* -w whitelist.txt -e x
>
> Compiling with -Wconversion would also point these out, but the output
> is way too noisy to be useful.
>
>
> That said, I'll take inspiration from your approach for v3--hopefully
> without (many) macros this time round.
>
> Thanks!
>
>               Emilio


--
Alex Bennée

Re: [Qemu-devel] [PATCH v2 09/14] hardfloat: support float32/64 multiplication

Reply via email to