Mark Dickinson <dicki...@gmail.com> added the comment:

[Joshua]

> 1. Update the round() docs to make the documentation of this behavior less 
> buried,

Sounds reasonable to me; I'm definitely open to documentation improvements. 
Though it doesn't seem all that buried to me: the round-ties-to-even behaviour 
is described in the third sentence of the first place I'd look for round 
documentation (https://docs.python.org/3/library/functions.html#round). It 
would be misleading to move the information earlier, because the use of 
round-ties-to-even is specific to the builtin types: user-defined types can do 
whatever they like via the __round__ magic method.
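
For illustration (the AlwaysUp class below is just a made-up toy, not anything 
in the stdlib):

    >>> round(0.5), round(1.5), round(2.5)   # builtin float: ties go to even
    (0, 2, 2)
    >>> import math
    >>> class AlwaysUp:
    ...     """Toy type whose __round__ always rounds up."""
    ...     def __init__(self, value):
    ...         self.value = value
    ...     def __round__(self, ndigits=None):
    ...         return math.ceil(self.value)
    ...
    >>> round(AlwaysUp(0.5))   # user-defined type: whatever __round__ says
    1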

> 2. include a (brief) justification (possibly even just a link to 
> http://wiki.c2.com/?BankersRounding or some more-authoritative document), and

Sure, a link to a source on banker's rounding could work.

> 3. link to where else this change in Python 3 was discussed more, if 
> anywhere, or else confirm this change was made based on no additional 
> analysis that we can find written down.

I'm not aware of much discussion beyond the thread that Serhiy already pointed 
to. There's a little bit more (but not much) on rounding the py3k mailing list 
(try a Google search for "site:mail.python.org/pipermail/python-3000 rounding").

> It'd also be interesting to hear if this is something we wish we'd done 
> differently now, but that shouldn't distract from 1, 2, and 3.

I can't speak for anyone else, but it's certainly not something I think should 
have been done differently, with one caveat: the silent and subtle change in 
behaviour from Python 2 to Python 3 was a bit unpleasant, and a possible source 
of late-discovered (or undiscovered) bugs.

> so maybe changing from round_half_up to round_half_even was necessary for the 
> other improvements [...]

No. The change was independent of other fixes and changes. There _is_ quite a 
history of round changes: fixes for the single-argument round function in odd 
corner cases (earlier versions of Python used the simple add-half-and-chop 
algorithm, which gives the wrong answer for 0.49999999999999994 and for 
4503599627370497.0 thanks to FPU-level rounding in the add-half step); making 
two-argument round correctly-rounded in all cases in Python 2.7 and 3.1 via the 
same dtoa.c machinery used for str<->float conversions; changing the return 
type of single-argument round in Python 3; making round generic via the 
__round__ magic method, etc. But none of these required the change in rounding 
mode.
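
To make the add-half-and-chop failure concrete, here's a rough sketch (the 
naive_round helper below just stands in for the old algorithm; 
0.49999999999999994 is the largest double below 0.5, and 4503599627370497.0 is 
2**52 + 1):

    import math

    def naive_round(x):
        # old-style add-half-and-chop, for non-negative x only
        return math.floor(x + 0.5)

    # The x + 0.5 addition is itself rounded at FPU level, which pushes the
    # sum up to 1.0 in the first case and to 2**52 + 2 in the second.
    print(naive_round(0.49999999999999994), round(0.49999999999999994))
    # -> 1 0
    print(naive_round(4503599627370497.0), round(4503599627370497.0))
    # -> 4503599627370498 4503599627370497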

We need to recognise that there are various different contexts where the idea 
of "rounding" comes into play in a general-purpose language. Some examples:

1. FPU-level rounding for basic floating-point operations (addition, 
multiplication, sqrt, etc.).
2. Conversion of source-code decimal numeric literals (e.g., in "bad_pi = 
3.14") to the _nearest_ exactly representable binary float/double; the notion 
of _nearest_ needs some way to break ties.
3. Formatting a float for output as a string (format(my_float, ".2f")).
4. Rounding a float to the nearest integer (Python's single-argument "round").
5. Rounding a binary float to some number of decimal places (two-argument 
round), which is a rather more subtle operation than it might seem at first 
sight.
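
In concrete Python terms, those five contexts look roughly like this:

    x = 0.1 + 0.2          # case 1: the addition itself is rounded by the FPU
    y = 3.14               # case 2: decimal literal -> nearest binary double
    s = format(y, ".1f")   # case 3: float -> decimal string ('3.1')
    n = round(y)           # case 4: float -> nearest integer (3)
    z = round(y, 1)        # case 5: float -> "one decimal place" (3.1)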

For 1., there are decades of numerical evidence that round-ties-to-even is what 
you want to do, and that's why IEEE 754 makes it the default rounding mode, and 
why it's the rounding mode you're likely to be using for numeric work out of 
the box in any mainstream language. [For one demonstration of where the 
unbiasedness of round-ties-to-even can matter, see 
https://stackoverflow.com/a/45245802/270986. Apologies for linking to my own 
answer here, but it was easily accessible. I'm sure there are many better 
demonstrations out there.]
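
As a toy (and deliberately exaggerated, non-FPU) illustration of that 
unbiasedness, round a run of exact ties both ways and compare the sums:

    import math

    halves = [n + 0.5 for n in range(1000)]          # 0.5, 1.5, ... all exact ties
    print(sum(halves))                               # 500000.0  (true sum)
    print(sum(round(x) for x in halves))             # 500000  (ties-to-even: errors cancel)
    print(sum(math.floor(x + 0.5) for x in halves))  # 500500  (half-up: biased upwards)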

Case 2 is really a special case of 1, though one not (usually) handled by the 
FPU: you 
can think of conversion from decimal string to binary floating-point as another 
primitive floating-point operation, and it's one that's covered by IEEE 754; 
round-ties-to-even (or at least, some precision- or algorithm-limited 
_approximation_ to round-ties-to-even) is again a common default across 
languages and operating systems.
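
For example, the value actually stored for the literal 3.14 can be inspected 
exactly:

    from decimal import Decimal

    bad_pi = 3.14
    print(Decimal(bad_pi))   # 3.140000000000000124...  (exact stored value, abbreviated)
    print(bad_pi.hex())      # 0x1.91eb851eb851fp+1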

Case 3 is also covered by IEEE 754, and I believe that "most" languages use 
round-ties-to-even here, too. C's fprintf (for example) specifies that e-style, 
f-style, and g-style formatting should be "correctly rounded" (C99 
7.19.6.1p13), where "correctly rounded" means "[...] nearest in value, subject 
to the current rounding mode [...]" (C99 3.9); in practice, that's usually 
round-ties-to-even. Java's DecimalFormat uses round-ties-to-even by default 
(source:  
https://docs.oracle.com/javase/7/docs/api/java/text/DecimalFormat.html). I 
haven't checked other languages, but I expect that many of them do something 
similar.
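
Python 3 does the same; 2.5, 3.5 and 0.125 are all exactly representable, so 
these really are ties:

    >>> format(2.5, ".0f"), format(3.5, ".0f")
    ('2', '4')
    >>> format(0.125, ".2f")
    '0.12'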

Cases 4 and 5 are mostly what we're arguing about in this issue. It's much less 
clear to me that the numerical benefits are significant at this level (compared 
to FPU-level last-bit-rounding, where those benefits are really unarguable). 
But note that these cases are really just floatified versions of case 3. 
Indeed, Python 3's current two-argument round algorithm is based directly on 
the string conversion code used for string formatting. And the use of 
round-ties-to-even for case 3 is already well established (and was already 
established long before Python 3).
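
So for a genuine tie like 0.125 (which, unlike most apparent "ties", really is 
exactly representable), round and string formatting give matching answers:

    >>> round(0.125, 2), format(0.125, ".2f")
    (0.12, '0.12')
    >>> round(2.5), format(2.5, ".0f")
    (2, '2')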

What happens for these 5 cases in Java? It _looks_ to me as though the first 
three cases use round-ties-to-even, the fourth uses round-ties-to-away by 
default, and the last isn't directly supported by the language. (But it's been 
a long time since I dabbled in Java.)

Like I said, I'm not totally convinced about the numerical benefits of 
round-ties-to-even for user-level round-to-n-decimal-places operations as 
opposed to FPU-level rounding (though I'm open to persuasion). That's partly 
because round-to-two-decimal-places (for example) is actually quite a peculiar 
operation to be doing on a binary float in the first place, and in practice 
ties don't really appear or affect the behaviour that often. (It might *look* 
as though you have a value "2.675" in your dataframe, but on a typical machine 
that value is actually being stored as 
"2.67499999999999982236431605997495353221893310546875", so it doesn't matter 
one whit whether you're using round-ties-to-even or round-ties-to-away: under 
correct rounding, both are going to give you the surprising result of 2.67 when 
you round to two decimal places.)
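
For instance:

    >>> from decimal import Decimal
    >>> Decimal(2.675)   # the value actually stored
    Decimal('2.67499999999999982236431605997495353221893310546875')
    >>> round(2.675, 2), format(2.675, ".2f")
    (2.67, '2.67')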

What I really like about Python's choice is the consistency. Since Python 3, 
all five cases of rounding described above use round-ties-to-even. In Python 2, 
float formatting used round-ties-to-even (most of the time in practice, though 
for Python 2.6 and earlier the exact behaviour depended on the system), while 
"round" used round-ties-to-away for a very closely related operation, and there 
are bug reports and StackOverflow questions from users surprised by the 
discrepancy between float formatting and two-argument round. In Python 3, we 
have the pleasant situation that "round" and string formatting agree.
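
Here's how that discrepancy shows up for the exact tie 0.125 (assuming a 
typical CPython 2.7 build):

    # Python 2.7: formatting and round disagree
    >>> '%.2f' % 0.125
    '0.12'
    >>> round(0.125, 2)
    0.13

    # Python 3: they agree
    >>> '%.2f' % 0.125, round(0.125, 2)
    ('0.12', 0.12)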

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32956>
_______________________________________