On Mon, 2 Sep 2019, Tejas Joshi wrote:

> Hello.
> Should a result like 1.4 be considered as inexact if truncating
> (narrowing?) from double to float? (due to loss of trailing bits)
If the mathematical result of the arithmetic operation is literally the decimal number 1.4, as opposed to the double value represented by the C constant 1.4 (which is actually 0x1.6666666666666p+0), then it is inexact regardless of the (non-decimal) types involved. For example, fdiv (7, 5), ddivl (7, 5), etc. are always inexact.

If the mathematical result of the arithmetic operation is 0x1.6666666666666p+0, the closest approximation to 1.4 in IEEE binary64, then it is inexact for result formats narrower than binary64 and exact for result formats that can represent that value. For example, fadd (1.4, 0.0) is inexact: the addition is exact, but the conversion of its result to float is not. But daddl (1.4, 0.0), where the arguments are double constants rather than long double, is exact, because the mathematical result is exactly representable in double. By contrast, daddl (1.4L, 0.0L) would be inexact if long double is wider than double.

The question is always whether the infinite-precision mathematical result of the arithmetic operation, which takes values representable in its argument types, is exactly representable in the final result type.

-- 
Joseph S. Myers
jos...@codesourcery.com