Hi everyone,

I've been working on [0] Haskell bindings for [1] libqd for [2] double-double and quad-double arithmetic, and have been struggling to implement [3] RealFloat, in particular [4] decodeFloat, mostly because of its postcondition but also some issues relating to variable precision:

---8<---
If decodeFloat x yields (m,n), then x is equal in value to m*b^^n, where b is the floating-point radix, and furthermore, either m and n are both zero or else b^(d-1) <= m < b^d, where d is the value of floatDigits x. In particular, decodeFloat 0 = (0,0).
---8<---

(BTW: that should probably really be "... <= abs m < ..."; perhaps this code could be added to the report and/or documentation, if it is correct:

  validDecode :: RealFloat f => f -> Bool
  validDecode f = case decodeFloat f of
    (0,0) -> True
    (m, e) -> b^(d-1) <= abs m && abs m < b^d
    where
      b = floatRadix f
      d = floatDigits f

)

The double-double and quad-double types do not have a fixed precision, though they do have a fixed minimum precision: this means that decodeFloat will (in general) lose some precision. This is because for example in:

  data DoubleDouble = DoubleDouble !CDouble !CDouble
  dd = DoubleDouble a b

the semantics are that "dd = a + b", with |a|>>|b|; coupled with the IEEE implicit leading 1 bit, this means that there may be large gaps between exponents: for example: "1 + 0.5**100 :: DoubleDouble".

So far I've got a "mostly working" implementation thus:

  decodeFloat !(DoubleDouble a b) =
    case (decodeFloat a, decodeFloat b) of
      ((0, 0), (0, 0)) -> (0, 0)
      ((0, 0), (m, e)) -> (m `shiftL` f, e - f)
      ((m, e), (0, 0)) -> (m `shiftL` f, e - f)
      ((m1, e1), (m2, e2)) ->
        let fixup m e =
              if m < mMin
                then fixup (m `shiftL` 1) (e - 1)
                else if m >= mMax
                       then fixup (m `shiftR` 1) (e + 1)
                       else (m, e)
            mMin = 1 `shiftL` (ff - 1)
            mMax = 1 `shiftL` ff
            ff = floatDigits (0 :: DoubleDouble)
            g = e1 - e2 - f
        in  fixup ((m1 `shiftL` f) + (m2 `shiftR` g)) (e1 - f)
    where
      f = floatDigits (0 :: CDouble)

This does meet the postcondition as specified (which leads to breakage in other RealFloat methods), but has a recursion with no termination proof (so far), and is lossy in general:

  > let check f = uncurry encodeFloat (decodeFloat f) == f
  > check (1 + 0.5 ** 120 :: DoubleDouble)
  False

It does however seem to meet a weaker condition:

> let check2 f = (decodeFloat . (`asTypeOf` f) . uncurry encodeFloat . decodeFloat $ f) == decodeFloat f
  > check2 (1 + 0.5 ** 120 :: DoubleDouble)
  True


Questions:

1. Is this weaker condition likely to be good enough in practice?
2. Can my implementation of decodeFloat be simplified?

Thanks for any insights,


Claude

[0] http://hackage.haskell.org/package/qd
[1] http://crd.lbl.gov/~dhbailey/mpdist/
[2] http://crd.lbl.gov/~dhbailey/dhbpapers/arith15.pdf
[3] http://hackage.haskell.org/packages/archive/base/latest/doc/html/Prelude.html#t:RealFloat [4] http://hackage.haskell.org/packages/archive/base/latest/doc/html/Prelude.html#v:decodeFloat

--
http://claudiusmaximus.goto10.org

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to