I'm a Python newbie and certainly no expert on statistics, but my wife was taking a statistics course this summer. To illustrate that drawing random samples from a distribution and averaging them gives you a random number as the result (a bigger sample gives a smaller variance in the calculated average, which converges on the mean of the original distribution), I threw together this program:
#! /usr/bin/python
import random
i=1
samplen=100
mean=130
lo=mean
hi=mean
sd=10
sum=0
while(i<=samplen):
    x=random.normalvariate(mean,sd)
    #print x
    if x<lo: lo=x
    if x>hi: high=x
    sum+=x
    i+=1
print 'sample mean=', sum/samplen, '\n'
print 'low value =', lo
print 'high value=', high
---------------------------------------------------------

But the more I run the darn thing, the stranger the results look to me.

random.normalvariate is defined on page 89 of http://www-acc.kek.jp/WWW-ACC-exp/KEKB/Control/Python%20Documents/lib.pdf as generating points from a normal distribution with mean and standard deviation given by the arguments. But my test program consistently comes up with sample means that are less than the mean of the distribution. The lo value is consistently much further below the mean than the high value is above it. That is, it looks to me like the normalvariate function is biased.

Part of my being a Python newbie is that I'm not really sure where to go to discuss this problem. If this group isn't the right place, do feel free to point me to where I ought to go.

I'm running Ubuntu Dapper and "python -V" says I've got Python 2.4.3. I tried looking in random.py down under /usr/lib but found no clues there as to the version of the random module on my machine. Am I missing something?

/usr/lib/python2.4$ ls -l random.py
-rw-r--r-- 1 root root 30508 2006-10-06 04:34 random.py

I added the lo and high stuff to my test program out of fear that I was running into something funky in adding up 100 floating point numbers. That would be more of a worry if the sample size were much bigger, but lo and high showed apparent bias quite aside from the calculation of the mean.

Am I committing some other obvious statistical or Python blunder? For example, am I misunderstanding what random.normalvariate is supposed to do?
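
One thing worth checking in the loop above: the comparison reads `hi` but the assignment writes `high`, so `hi` stays at the mean for the whole run and `high` ends up holding whichever above-mean sample happened to come last, not the largest one. The sketch below is only an illustration of that difference, not a verified diagnosis of everything described in the post. The function name `run_trial`, its `seed` parameter, and the `last_above` field are hypothetical names introduced here, and the code is written with the aim of running under both Python 2 and Python 3, which is an assumption rather than something tested on 2.4.3.

```python
import random


def run_trial(samplen=100, mean=130, sd=10, seed=None):
    """Draw samplen values from a normal distribution and summarize them.

    run_trial and seed are hypothetical names used only for this
    illustration; seed just makes a run repeatable.
    """
    rng = random.Random(seed)
    lo = hi = mean       # same starting point as the original program
    last_above = None    # what the original's 'high' variable ends up holding
    total = 0.0
    for _ in range(samplen):
        x = rng.normalvariate(mean, sd)
        if x < lo:
            lo = x
        if x > hi:
            hi = x       # update the same variable that is compared
        if x > mean:
            last_above = x   # original behaviour: 'hi' never moves off the mean
        total += x
    return {"mean": total / samplen, "lo": lo, "hi": hi,
            "last_above": last_above}


if __name__ == "__main__":
    result = run_trial()
    print("sample mean = %s" % result["mean"])
    print("low value   = %s" % result["lo"])
    print("high value  = %s" % result["hi"])
    print("last above  = %s" % result["last_above"])
```

With the maximum tracked in a single variable, `lo` and `hi` would be expected to land at broadly similar distances from the mean on a typical run, while `last_above` tends to sit closer to the mean because it is not a maximum at all. The `hi`/`high` mix-up does not touch the running sum, so it would not by itself explain sample means that look low.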