[Haskell-cafe] implementing RealFloat decodeFloat

Claude Heiland-Allen Wed, 22 Dec 2010 06:11:15 -0800

Hi everyone,

I've been working on [0] Haskell bindings for [1] libqd for [2]double-double and quad-double arithmetic, and have been struggling toimplement [3] RealFloat, in particular [4] decodeFloat, mostly becauseof its postcondition but also some issues relating to variable precision:


---8<---

If decodeFloat x yields (m,n), then x is equal in value to m*b^^n, whereb is the floating-point radix, and furthermore, either m and n are bothzero or else b^(d-1) <= m < b^d, where d is the value of floatDigits x.In particular, decodeFloat 0 = (0,0).

---8<---

(BTW: that should probably really be "... <= abs m < ..."; perhaps thiscode could be added to the report and/or documentation, if it is correct:


  validDecode :: RealFloat f => f -> Bool
  validDecode f = case decodeFloat f of
    (0,0) -> True
    (m, e) -> b^(d-1) <= abs m && abs m < b^d
    where
      b = floatRadix f
      d = floatDigits f

)

The double-double and quad-double types do not have a fixed precision,though they do have a fixed minimum precision: this means thatdecodeFloat will (in general) lose some precision. This is because forexample in:


  data DoubleDouble = DoubleDouble !CDouble !CDouble
  dd = DoubleDouble a b

the semantics are that "dd = a + b", with |a|>>|b|; coupled with theIEEE implicit leading 1 bit, this means that there may be large gapsbetween exponents: for example: "1 + 0.5**100 :: DoubleDouble".


So far I've got a "mostly working" implementation thus:

  decodeFloat !(DoubleDouble a b) =
    case (decodeFloat a, decodeFloat b) of
      ((0, 0), (0, 0)) -> (0, 0)
      ((0, 0), (m, e)) -> (m `shiftL` f, e - f)
      ((m, e), (0, 0)) -> (m `shiftL` f, e - f)
      ((m1, e1), (m2, e2)) ->
        let fixup m e =
              if m < mMin
                then fixup (m `shiftL` 1) (e - 1)
                else if m >= mMax
                       then fixup (m `shiftR` 1) (e + 1)
                       else (m, e)
            mMin = 1 `shiftL` (ff - 1)
            mMax = 1 `shiftL` ff
            ff = floatDigits (0 :: DoubleDouble)
            g = e1 - e2 - f
        in  fixup ((m1 `shiftL` f) + (m2 `shiftR` g)) (e1 - f)
    where
      f = floatDigits (0 :: CDouble)

This does meet the postcondition as specified (which leads to breakagein other RealFloat methods), but has a recursion with no terminationproof (so far), and is lossy in general:


  > let check f = uncurry encodeFloat (decodeFloat f) == f
  > check (1 + 0.5 ** 120 :: DoubleDouble)
  False

It does however seem to meet a weaker condition:

> let check2 f = (decodeFloat . (`asTypeOf` f) . uncurry encodeFloat. decodeFloat $ f) == decodeFloat f

  > check2 (1 + 0.5 ** 120 :: DoubleDouble)
  True


Questions:

1. Is this weaker condition likely to be good enough in practice?
2. Can my implementation of decodeFloat be simplified?

Thanks for any insights,


Claude

[0] http://hackage.haskell.org/package/qd
[1] http://crd.lbl.gov/~dhbailey/mpdist/
[2] http://crd.lbl.gov/~dhbailey/dhbpapers/arith15.pdf

[3]http://hackage.haskell.org/packages/archive/base/latest/doc/html/Prelude.html#t:RealFloat[4]http://hackage.haskell.org/packages/archive/base/latest/doc/html/Prelude.html#v:decodeFloat


--
http://claudiusmaximus.goto10.org

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] implementing RealFloat decodeFloat

Reply via email to