Luis Zarrabeitia <ky...@uh.cu> wrote:
> On Thursday 21 May 2009 08:50:48 pm R. David Murray wrote:
>
>> In py3k Eric Smith and Mark Dickinson have implemented Gay's floating
>> point algorithm for Python so that the shortest repr that will round
>> trip correctly is what is used as the floating point repr....
>
> Little question: what was the goal of such a change? (is there a pep for me
> to read?) Shouldn't str() do that, and leave repr as is?

It's a good question. I was prepared to write a PEP if necessary, but there was essentially no opposition to this change either in the python-dev thread that Ned already mentioned, in the bugs.python.org feature request (see http://bugs.python.org/issue1580; set aside half-an-hour or so if you want to read this one) or amongst the people we spoke to at PyCon 2009, so in the end Eric and I just went ahead and merged the changes. It didn't hurt that Guido supported the idea.

I think the main goal was to see fewer complaints from newbie users about 0.1 displaying as 0.10000000000000001. There's no real reason to produce 17 digits here. Neither 0.1 nor 0.10000000000000001 displays the true value of the float: both are approximations, so why not pick the approximation that actually displays nicely? The only requirement is that float(repr(x)) recovers x exactly, and since 0.1 produced the float in the first place, it's clear that taking repr(0.1) to be '0.1' satisfies this requirement.

The problem is particularly acute with the use of the round function, where newbies complain that round is buggy because it's not rounding to 2 decimal places:

>>> round(2.45311, 2)
2.4500000000000002

With the new float repr, the result of rounding a float to 2 decimal places will always display with at most 2 places after the point. (Well, possibly except when that float is very large.) Of course, there are still going to be complaints that the following is rounding in the wrong direction:

>>> round(0.075, 2)
0.07

I'll admit to feeling a bit uncomfortable about the fact that the new repr goes a little bit further towards hiding floating-point difficulties from numerically-naive users. The main things that I like about the new representation are that its definition is saner (give me the shortest string that rounds correctly, versus format to 17 places and then somewhat arbitrarily strip all trailing zeros) and that it's more consistent than the old.

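For anyone who wants to poke at the round-trip requirement directly, here is a minimal sketch, assuming Python 3.1 or later with the short float repr enabled. The list name `samples` and the particular values in it are just illustrative choices, not anything taken from the implementation.

```python
# Sketch only: check that the short repr still round-trips exactly.
# 'samples' is an arbitrary, hypothetical selection of test values.
samples = [0.1, 0.03, 0.04, 2.45311, 1e22, 2.0 ** -1074]

for x in samples:
    s = repr(x)
    # The one hard requirement on repr: float(repr(x)) recovers x exactly.
    assert float(s) == x
    print(s)

# Rounding to 2 places now displays with at most 2 places after the point.
print(repr(round(2.45311, 2)))   # '2.45' with the short repr

# The stored double nearest to 0.075 lies slightly below 0.075,
# so a correctly rounded round() goes down, not up.
print(round(0.075, 2))           # 0.07
```

The assertion inside the loop is the whole contract: any shorter-looking string is acceptable as long as converting it back gives the identical float.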
With the current 2.6/3.0 repr (on my machine; your results may vary):

>>> 0.01
0.01
>>> 0.02
0.02
>>> 0.03
0.029999999999999999
>>> 0.04
0.040000000000000001

With Python 3.1:

>>> 0.01
0.01
>>> 0.02
0.02
>>> 0.03
0.03
>>> 0.04
0.04

A cynical response would be to say that the Python 2.6 repr lies only some of the time; with Python 3.1 it lies *all* of the time. But actually all of the above outputs are lies; it's just that the second set of lies is more consistent and better looking.

There are also a number of significant 'hidden' benefits to using David Gay's code instead of the system C library's functions, though those benefits are mostly independent of the choice to use the short float repr:

- the float repr is much more likely to be consistent across platforms (or at least across those platforms using IEEE 754 doubles, which seems to be 99.9% of them)

- the C library double<->string conversion functions are buggy on many platforms (including at least OS X, Windows and some flavours of Linux). While I won't claim that Gay's code (or our adaptation of it) is bug-free, I don't know of any bugs (reports welcome!) and at least when bugs are discovered it's within our power to fix them. Here's one example of an x == eval(repr(x)) failure due to a bug in the OS X implementation of strtod:

  >>> x = (2**52-1)*2.**(-1074)
  >>> x
  2.2250738585072009e-308
  >>> y = eval(repr(x))
  >>> y
  2.2250738585072014e-308
  >>> x == y
  False

- similar to the last point: on many platforms string formatting is not correctly rounded, in the sense that e.g. '%.6f' % x does not necessarily produce the closest decimal with 6 places after the decimal point to x. This is *not* a platform bug, since there's no requirement of correct rounding in the C standards. However, David Gay's code does provide correctly rounded string -> double and double -> string conversions, so Python's string formatting will now always be correctly rounded. A small thing, but it's nice to have.

- since round() and string formatting now both use Gay's code, we can finally guarantee that round and string formatting give equivalent results: e.g., that the digits in round(x, 2) are the same as the digits in '%.2f' % x. That wasn't true before: round could round up while '%.2f' % x rounded down (or vice versa), leading to confusion and at least one semi-bogus bug report.

- a lot of internal cleanup has become possible as a result of not having to worry about all the crazy things that platform string <-> double conversions can do. This makes the CPython code smaller, clearer, easier to maintain, and less likely to contain bugs.

> While I agree that the change gets rid of the weekly newbie question
> about "python's lack precision", I'd find more difficult to explain why
> 0.2 * 3 != 0.6 without showing them what 0.2 /really/ means.

There are still plenty of ways to show what 0.2 really means. My favourite is to use the Decimal.from_float method:

>>> Decimal.from_float(0.2)
Decimal('0.200000000000000011102230246251565404236316680908203125')

This is only available in 2.7 and 3.1, but then the repr change isn't happening until 3.1 (and it almost certainly won't be backported to 2.7, by the way), so that's okay. But there's also float.hex, float.as_integer_ratio, and Fraction.from_float to show the exact value that's stored for a float.

>>> 0.2.hex()
'0x1.999999999999ap-3'
>>> Fraction.from_float(0.2)
Fraction(3602879701896397, 18014398509481984)

Hmm. That was a slightly unfortunate choice of example: the hex form of 0.2 looks uncomfortably similar to 1.9999999.... An interesting cross-base accident.

This is getting rather long. Perhaps I should put the above comments together into a 'post-PEP' document.

Mark
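As a postscript, the exact-value tools mentioned above can be collected into one short script. This is a rough sketch rather than anything official, assuming CPython 3.1 or later built with the short float repr; the names `x`, `exact` and `tricky` are made up for the example.

```python
from decimal import Decimal
from fractions import Fraction

x = 0.2

# Three views of the exact value stored for the float 0.2.
exact = Decimal.from_float(x)
print(exact)                    # the full decimal expansion of the double
print(x.hex())                  # exact hexadecimal significand and exponent
print(Fraction.from_float(x))   # exact ratio of two integers

# Why 0.2 * 3 != 0.6: the product and the literal are different doubles.
print(Decimal.from_float(0.2 * 3))
print(Decimal.from_float(0.6))
print(0.2 * 3 == 0.6)           # False

# round() and '%.2f' agree on the digits (assumed CPython behaviour).
# 'tricky' is a hypothetical set of values near a rounding boundary.
tricky = [0.075, 2.675, 1.005, 2.45311]
for t in tricky:
    assert float('%.2f' % t) == round(t, 2)
```

Showing a newcomer the two Decimal.from_float lines side by side is usually enough to explain the inequality, even though repr(0.2) now prints as plain 0.2.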