I did the following calculation: I generated a list of a million random
numbers between 0 and 1, constructed a new list by subtracting the mean
value from each number, and then calculated the mean of the new list.
The result should be 0, but of course it will differ from 0 slightly
because of rounding errors.
However, I noticed that the simple Python program below gives a result
of ~ 10^-14, while an equivalent Mathematica program (also using double
precision) gives a result of ~ 10^-17, i.e. three orders of magnitude
more precise.
Here's the program (pardon my style, I'm a newbie/occasional user):
from random import random
data = [random() for x in xrange(1000000)]
mean = sum(data)/len(data)
print sum(x - mean for x in data)/len(data)
A little research shows that Mathematica uses a "compensated summation"
algorithm. Indeed, using the algorithm described at
http://en.wikipedia.org/wiki/Kahan_summation_algorithm
gives a result of ~ 10^-17:
def compSum(arr):
    # Kahan (compensated) summation: c accumulates the rounding
    # error of each addition so it can be folded back into the sum.
    s = 0.0
    c = 0.0
    for x in arr:
        y = x - c
        t = s + y
        c = (t - s) - y
        s = t
    return s
mean = compSum(data)/len(data)
print compSum(x - mean for x in data)/len(data)
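For comparison, assuming a Python new enough to have math.fsum (it was
added in 2.6, so this is an untested sketch on older versions), the same
check could presumably be written without a hand-rolled Kahan loop:

from math import fsum
# fsum keeps exact intermediate partial sums (Shewchuk's algorithm),
# so it should give a similarly small result to compSum above.
mean = fsum(data)/len(data)
print fsum(x - mean for x in data)/len(data)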
I thought that it would be very nice if the built-in sum() function used
this algorithm by default. Has this been brought up before? Would this
have any disadvantages (apart from a slight performance impact, but
Python is a high-level language anyway ...)?
Szabolcs Horvát