On Fri, 30 May 2014, Michael Meissner wrote:

> One issue with the current mode setup is that when you create new floating
> point types, the widening system kicks in and the compiler will generate all
> sorts of widening from one 128-bit floating point format to another (because
> internally the precision for IBM extended double is less than the precision
> of IEEE 128-bit, due to the size of the mantissas). Ideally we need a
> different way to create an alternate floating point mode than
> FRACTIONAL_FLOAT_MODE that does no automatic widening. If there is a way
> under the current system, I am not aware of it.

When you support both types (under different names) in one compiler, you
do of course need to support conversions between them - but the compiler
shouldn't generate such conversions automatically. Furthermore, if the
usual arithmetic conversions are applied to find a common type, you have
the issue that neither type's values are a subset of the other's
(__float128 has wider range, but __ibm128 can represent values with
discontiguous mantissa bits spanning more than 113 bits).

DTS 18661-3 (N1834) says "If both operands have floating types and neither
of the sets of values of their corresponding real types is a subset of (or
equivalent to) the other, the behavior is undefined." I'd suggest making
this (mixed arithmetic or conditional expressions between __float128 and
__ibm128) an error for both C and C++, so people need to use an explicit
cast, or implicit conversion by assignment etc., if they wish to mix the
two types in arithmetic.

(Conversion from __ibm128 to __float128 is a matter of converting the two
halves and adding them, with two exceptions. For signed zero you must
convert just the top half, to avoid getting a zero of the wrong sign. For
NaNs you must also convert just the top half, to avoid a spurious exception
if the top half is a quiet NaN (meaning the whole long double is a quiet
NaN) but the low half is a signaling NaN. Conversion from __float128 to
__ibm128 would presumably be done in the usual way: convert to double and,
if the result is finite, subtract the double from the __float128 value,
convert the remainder, and renormalize in case the low part you get that
way is exactly 0.5ulp of the high part and the high part has its low bit
set.)

--
Joseph S. Myers
jos...@codesourcery.com
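
As a rough illustration of the two conversions described in the parenthetical above, here is a minimal C sketch. Everything named in it is an assumption made for illustration, not GCC or libgcc code: `dd` is a hypothetical pair-of-doubles stand-in for __ibm128, `wide_t` is `long double` standing in for __float128 (it has fewer than 113 mantissa bits on many targets), and the renormalization step is written as a fast two-sum, which is one plausible way to do it.

```c
#include <math.h>

/* Stand-in types (assumptions for this sketch only):
   dd models __ibm128 as a high/low pair of doubles,
   wide_t models __float128. */
typedef struct { double hi, lo; } dd;
typedef long double wide_t;

/* dd -> wide_t: convert both halves and add them, except that a zero or
   NaN high half is converted on its own.  This keeps the sign of zero
   and never touches a possibly signaling low half of a quiet NaN. */
wide_t dd_to_wide(dd x)
{
    if (x.hi == 0.0 || isnan(x.hi))
        return (wide_t)x.hi;
    return (wide_t)x.hi + (wide_t)x.lo;
}

/* wide_t -> dd: round to double, and if that is finite and nonzero,
   convert the remainder and renormalize the pair. */
dd wide_to_dd(wide_t x)
{
    dd r;
    r.hi = (double)x;
    r.lo = 0.0;
    if (isfinite(r.hi) && r.hi != 0.0) {
        r.lo = (double)(x - (wide_t)r.hi);
        /* Fast two-sum: if lo rounded up to exactly half an ulp of an
           odd hi, hi + lo rounds to the even neighbour and lo changes
           sign.  Overflow at the very top of the double range is not
           handled in this sketch. */
        double s = r.hi + r.lo;
        r.lo = r.lo - (s - r.hi);
        r.hi = s;
    }
    return r;
}
```

With these stand-ins, `dd_to_wide((dd){ -0.0, 0.0 })` keeps its negative sign, whereas adding the two halves directly would produce a positive zero.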