I did the following calculation: I generated a list of a million random numbers between 0 and 1, constructed a new list by subtracting the mean from each number, and then calculated the mean of that new list.

The result should be 0, but of course it will differ from 0 slightly because of rounding errors.

However, I noticed that the simple Python program below gives a result of ~ 10^-14, while an equivalent Mathematica program (also using double precision) gives a result of ~ 10^-17, i.e. about three orders of magnitude closer to zero.

Here's the program (pardon my style, I'm a newbie/occasional user):

from random import random

# one million uniform random numbers in [0, 1)
data = [random() for x in xrange(1000000)]

mean = sum(data)/len(data)
# mean of the mean-subtracted values; mathematically this is exactly 0
print sum(x - mean for x in data)/len(data)

A little research shows that Mathematica uses a "compensated summation" algorithm. Indeed, using the algorithm described at
http://en.wikipedia.org/wiki/Kahan_summation_algorithm
gives a result of ~ 10^-17:

def compSum(arr):
    # Kahan (compensated) summation: c carries the rounding error
    # lost when the previous value was added to the running sum s.
    s = 0.0
    c = 0.0
    for x in arr:
        y = x - c        # subtract the error carried over from the last addition
        t = s + y        # big + small: low-order bits of y may be lost here
        c = (t - s) - y  # algebraically zero; in floating point it captures that loss
        s = t
    return s

mean = compSum(data)/len(data)
print compSum(x - mean for x in data)/len(data)


I thought that it would be very nice if the built-in sum() function used this algorithm by default. Has this been brought up before? Would this have any disadvantages (apart from a slight performance impact, but Python is a high-level language anyway ...)?
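For what it's worth, if your Python is recent enough (2.6 or later, if I'm not mistaken), the standard library already provides math.fsum(), which does accurate floating-point summation. A sketch of the same calculation using it, which should give accuracy comparable to the compensated sum above:

from math import fsum  # accurate summation of floats (Python 2.6+)

mean = fsum(data)/len(data)
print fsum(x - mean for x in data)/len(data)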

Szabolcs Horvát
