On Sun, Apr 26, 2020 at 10:43:53PM +0200, Iain Buclaw wrote:
> >>>> +    // The layout of the type is:
> >>>> +    //
> >>>> +    //   [1|  7  |      56       ][   8    |       56       ]
> >>>> +    //   [S| Exp | Fraction (hi) ][ Unused | Fraction (low) ]
> >>>> +    //
> >>>> +    // We can get the least significant bits by subtracting the IEEE
> >>>> +    // double precision portion from the real value.
> >>>
> >>> That's not correct.  There is no "Unused" field, and the lower fraction
> >>> is not always an immediate extension of the higher fraction.
> >>>
> >>> (It's not 1,7,56 -- it is 1,11,52).
> >
> > All bits are significant to the value, there are no unused bits, for
> > most values.  The sign and exponent of the second number are very much
> > relevant, in general.
> >
> > I didn't look at your actual implementation, just this comment, but it
> > sounds like your tests miss a lot of cases, if no problems were found?
>
> All these tests pass.  That is, the computed compile-time byte layout
> (the result of this function) is the same as the layout at run-time.

Then either it tests not nearly enough, or it does not implement what
the comment says.

>     // check subnormal storage edge case for Quadruple
>     testNumberConvert!("real.min_normal/2UL^^56");
>     testNumberConvert!("real.min_normal/19");
>     testNumberConvert!("real.min_normal/17");

IBM long double has quite different edge cases than IEEE QP float.

>     /**True random values*/
>     testNumberConvert!("-0x9.0f7ee55df77618fp-13829L");
>     testNumberConvert!("0x7.36e6e2640120d28p+8797L");
>     testNumberConvert!("-0x1.05df6ce4702ccf8p+15835L");
>     testNumberConvert!("0x9.54bb0d88806f714p-7088L");

None of these are valid double-double numbers (they all underflow or
overflow).

>     /**Big overflow or underflow*/
>     testNumberConvert!("cast(double)-0x9.0f7ee55df77618fp-13829L");
>     testNumberConvert!("cast(double)0x7.36e6e2640120d28p+8797L");
>     testNumberConvert!("cast(double)-0x1.05df6ce4702ccf8p+15835L");
>     testNumberConvert!("cast(double)0x9.54bb0d88806f714p-7088L");
>     testNumberConvert!("cast(float)-0x9.0f7ee55df77618fp-13829L");
>     testNumberConvert!("cast(float)0x7.36e6e2640120d28p+8797L");

(Exactly like these).


Segher
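
To illustrate the point about the second number, here is a minimal sketch
in C.  It is not taken from the patch under review.  It assumes that
double is IEEE 754 binary64 and models a double-double value as a plain
(high, low) pair of doubles, so it does not depend on the platform's long
double.  The names ieee_fields, split_double and print_pair are
hypothetical, made up for this example.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type: the three binary64 fields of one double. */
struct ieee_fields {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 11 bits, biased by 1023 */
    uint64_t fraction;  /* 52 bits */
};

/* Split one double into its 1,11,52 fields (assumes IEEE 754 binary64). */
static struct ieee_fields split_double(double d)
{
    uint64_t bits;
    struct ieee_fields f;

    assert(sizeof d == sizeof bits);
    memcpy(&bits, &d, sizeof bits);

    f.sign = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7ff);
    f.fraction = bits & ((UINT64_C(1) << 52) - 1);
    return f;
}

/* Print the fields of both halves of a (high, low) pair. */
static void print_pair(const char *what, double hi, double lo)
{
    struct ieee_fields h = split_double(hi);
    struct ieee_fields l = split_double(lo);

    printf("%s\n", what);
    printf("  high: sign=%u exponent=%u fraction=0x%013" PRIx64 "\n",
           h.sign, h.exponent, h.fraction);
    printf("  low:  sign=%u exponent=%u fraction=0x%013" PRIx64 "\n",
           l.sign, l.exponent, l.fraction);
}
```

Calling print_pair("1 - 2^-60", 1.0, -0x1p-60) shows a low double whose
sign differs from the high double's.  Calling
print_pair("1 + 2^-80", 1.0, 0x1p-80) shows a low double whose exponent
(943) is far below the point where the high fraction ends, so the low
fraction is not an immediate extension of the high one.  In both cases
all 64 bits of the low double carry information.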