Re: long double losing mantissa bits

Taylor R Campbell Tue, 19 Jan 2021 14:16:56 -0800

> Date: Tue, 19 Jan 2021 22:10:56 +0100
> From: Rhialto <rhia...@falu.nl>
> 
> The loss of precision probably occurs at the line
> 
>     // The following big literal is 2 to the 56th power:
>     ufrac = (uint64_t) (frac * 72057594037927936.0);
> 
> where frac is in the range [ 0.5 , 1.0 >, so I would expect that
> multiplying with 2**56 is perfectly feasable (just changes the
> exponent). Debugging output shows that differences in the lsbits that
> were detectable before (when printed with %La), were no longer after.


In the following test case:

   input: 72057594037927937 -> scanf: 72057594037927937.000000
   frac: 0.500000 0x8.00000000000008p-4 sexp: 57
   uexp: b9
   ufrac: 0080000000000000         
   56   : 00ffffffffffffff         
   Unexpected result: 5c80 0000 0000 0000
     expected       : 5c80 0000 0000 0001
   056200: sign:  0 uexp:  b9 ufrac: 00 0000 0000 0000

Here frac = 0x8.00000000000008p-4 = (1 + 2^56)/2^57, so
frac*72057594037927936 = frac * 2^56 = 0x8.00000000000008p52 =
0x80000000000000.8p52.

This is not an integer, so conversion to uint64_t rounds it to
nearest, with ties to even, so you get ufrac = 0x80000000000000 =
36028797018963968 as shown in the result.

It looks like you have an off-by-one error in your exponent handling.
If you want to scale the fractional part into an integer, you need to
multiply by 2^57, not by 2^56.

Re: long double losing mantissa bits

Reply via email to