New submission from Oscar Benjamin:

The mean/variance functions in the statistics module don't quite round 
correctly.

The reasons for this are that although exact rational arithmetic is used 
internally in the _sum function it is not used throughout the module. In 
particular the _sum function should be changed to return an exact result and 
exact arithmetic should be used right up to the point before returning to the 
user at which point a rounding/coercion should be used to give the user their 
answer in the appropriate type correctly rounded once.

Using exact arithmetic everywhere makes it possible to replace all of the 
variance* functions with single-pass algorithms based on the computational 
formula for variance which should be more efficient as well.

For example the following implements pvariance so that it returns a perfectly 
rounded result for float input and output in a single pass:

def pvariance(data):
    sx = 0
    sx2 = 0
    for n, x in enumerate(map(Fraction, data), 1):
        sx += x
        sx2 += x ** 2
    Ex = sx / n
    Ex2 = sx2 / n
    var = Ex2 - Ex ** 2
    return float(var)

Comparing the above with the statistics module:

>>> pvariance([0, 0, 1])
0.2222222222222222
>>> statistics.pvariance([0, 0, 1])
0.22222222222222224

The true answer is:

>>> from fractions import Fraction as F
>>> float(statistics.pvariance([F(0), F(0), F(1)]))
0.2222222222222222

The logic in the _sum function for computing exact integer ratios and coercing 
back to the output type could be moved into utility functions so that it does 
not need to be duplicated.

Some examples of rounding issues:

>>> from statistics import variance, mean
>>> from decimal import Decimal as D, getcontext
>>> from fractions import Fraction as F

Variance with ints or floats returns a float but the float is not quite the 
nearest possible float:

>>> variance([0, 0, 2])
1.3333333333333335
>>> float(variance([F(0), F(0), F(2)]))  # true result rounded once
1.3333333333333333

Another example with Decimal:

>>> getcontext().prec = 5
>>> getcontext()
Context(prec=5, rounding=ROUND_HALF_EVEN, Emin=-999999999, Emax=999999999, 
capitals=1, clamp=0, flags=[Rounded, Inexact], traps=[DivisionByZero, Overflow, 
InvalidOperation])

>>> variance([D(0), D(0), D(2)] * 2)  # Rounded down instead of up
Decimal('1.0666')
>>> r = (variance([F(0), F(0), F(2)] * 2))
>>> D(r.numerator) / r.denominator  # Correctly rounded
Decimal('1.0667')

The mean function may also not be correctly rounded:

>>> getcontext().prec = 2
>>> r = mean((F('1.2'), F('1.3'), F('1.55')))
>>> r
Fraction(27, 20)
>>> D(r.numerator) / r.denominator # Correctly rounded
Decimal('1.4')
>>> mean([D('1.2'), D('1.3'), D('1.55')])
Decimal('1.3')

----------
components: Library (Lib)
messages: 210121
nosy: oscarbenjamin, stevenjd
priority: normal
severity: normal
status: open
title: Rounding errors with statistics.variance
versions: Python 3.5

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20499>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to