Hi,

On Thu, May 30 2019, Tejas Joshi wrote:
> Hello.
> I tried to check the values for significand words using _Float128
> using a test program with value larger than 64 bit.
> Test program :
>
> int main ()
> {
>     _Float128 x = 18446744073709551617.5;   (i.e. 2^64 + 1.5 which is
> certainly longer than 64-bit)
>     _Float128 y = __builtin_roundf128 (x);
> }

Interesting, I was also puzzled for a moment.  But notice that:

int main ()
{
    _Float128 x = 18446744073709551617.5f128;
    _Float128 y = __builtin_roundf128 (x);
}

behaves as expected... the difference is of course the suffix attached
to the literal constant (see
https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/Floating-Types.html).

I would also expect GCC to use a larger type if a constant does not fit
into a double, but apparently that does not happen.  I would have to
check, but it is probably the right behavior according to the standard.

>
> The lower words of significand (sig[1] and sig[0] for 64-bit system)
> are still being zero. I haven't included the roundevenf128 yet but
> inspecting this on real_round function.

I figured out what was going on when I realized that in your testcase,
sig[0] was equal to 0x8000000000000000, so some precision had been
lost.  Then it was easy to guess that this was because the constant was
represented in a narrower type.

Hope this helps,

Martin