On 11/08/2013 09:30 AM, Richard Henderson wrote:
> On 11/08/2013 09:28 AM, Richard Henderson wrote:
>> On 11/07/2013 06:31 AM, Tom Musta wrote:
>>>          }                                                           \
>>> +                                                                    \
>>> +        if (r2sp) {                                                 \
>>> +            float32 tmp32 = float64_to_float32(xt_out.fld[i],       \
>>> +                                               &env->fp_status);    \
>>> +            xt_out.fld[i] = float32_to_float64(tmp32, &env->fp_status); \
>>> +        }                                                           \
>>> +                                                                    \
>>
>> You can't get correct results for a single-precision fma from a
>> double-precision fma and merely rounding the results.
>>
>> See e.g. glibc's sysdeps/ieee754/dbl-64/s_fmaf.c.
>
> Blah, nevermind.  That would be using separate add+mul in double-precision,
> not using a double-precision fma primitive.
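[Editor's note: for readers without the glibc source to hand, the approach
in s_fmaf.c is roughly the following.  This is an illustrative sketch using
the public <fenv.h> interface rather than glibc's internal helpers, and it
omits the special cases glibc handles (NaN/infinity inputs, overflow, and
the sign of an exact zero result).]

#include <fenv.h>
#include <stdint.h>
#include <string.h>

/* The product of two binary32 values is exact in binary64 (24 + 24 = 48
   significant bits <= 53), so only the addition can round.  Do the
   addition toward zero and then force the low bit of the significand
   whenever the sum was inexact ("round to odd"); the final binary64 ->
   binary32 conversion then rounds correctly in a single step. */
static float fmaf_sketch(float x, float y, float z)
{
    double p = (double)x * (double)y;   /* exact */
    fenv_t env;
    double s;

    feholdexcept(&env);                 /* save environment, clear flags */
    fesetround(FE_TOWARDZERO);
    s = p + (double)z;                  /* truncated sum */
    if (fetestexcept(FE_INEXACT)) {
        uint64_t bits;
        memcpy(&bits, &s, sizeof(bits));
        bits |= 1;                      /* sticky low bit: round to odd */
        memcpy(&s, &bits, sizeof(bits));
    }
    feupdateenv(&env);                  /* restore mode, merge flags */
    return (float)s;                    /* the one real rounding */
}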
Hmph.  I was right the first time.  See

  http://www.exploringbinary.com/double-rounding-errors-in-floating-point-conversions/

for example inputs that suffer from double-rounding.

What's needed in each of those examples is an infinitely precise value
containing 55 significant bits.  This is easy to accomplish with fma.
Two single-precision inputs (24-bit significands) can create a product
with up to 48 significant bits.  One can append 24 more significant
bits by choosing an exponent for the addend that does not overlap the
product.  Thus one can create (almost) every intermediate result with
up to 72 consecutive bits (the exception being products that have no
pair of 24-bit factors).  I'm too lazy to decompose the examples
therein into actual single-precision inputs, but I'm certain it can be
done.

Thus you *do* need the round-to-zero plus inexact solution that glibc
uses.  (Or perform the fma in 128 bits and round once, but I think
that would be way more intrusive wrt the rest of the code, and more
expensive than necessary.)

r~
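[Editor's note: one concrete decomposition of the kind described above.
The factorization and the 2^-70 addend are the editor's example, not
from the original thread, and the demo assumes the host fmaf() is
itself correctly rounded.]

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* 2^24 + 1 = 97 * 257 * 673, so 1 + 2^-24 is exactly the product
       of two single-precision values. */
    float a = 24929.0f * 0x1p-12f;  /* (97 * 257) * 2^-12 */
    float b = 673.0f * 0x1p-12f;    /* a * b == 1 + 2^-24 exactly */
    float c = 0x1p-70f;             /* addend below the product's bits */

    /* The exact a*b + c = 1 + 2^-24 + 2^-70 lies just above the
       halfway point between 1.0f and the next float, 1 + 2^-23, so a
       single correct rounding must round up.  Rounding to binary64
       first drops the 2^-70 bit, leaving exactly the halfway value
       1 + 2^-24, which then rounds to even, i.e. down to 1.0f. */
    printf("single rounding: %a\n", fmaf(a, b, c));   /* 0x1.000002p+0 */
    printf("double rounding: %a\n",
           (float)fma((double)a, (double)b, (double)c));  /* 0x1p+0 */
    return 0;
}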