While I am at it, let's add the bootstrap estimate of the standard error as well.
from numpy import mean, std, sum, sort, tanh, arctanh
from numpy.random import randint

def bootstrap_correlation(x, y):
    n = len(x)
    # 1000 bootstrap resamples, drawn with replacement
    idx = randint(n, size=(1000, n))
    bx = x[idx]
    by = y[idx]
    mx = mean(bx, 1)
    my = mean(by, 1)
    sx = std(bx, 1, ddof=1)  # sample std, to match the (n-1) divisor below
    sy = std(by, 1, ddof=1)
    # Pearson's r for each resample, using broadcasting
    # instead of repeat/reshape
    r = sort(sum((bx - mx[:, None]) * (by - my[:, None]), 1)
             / ((n - 1) * sx * sy))
    # bootstrap confidence interval (NB! biased)
    conf_interval = (r[25], r[975])
    # bootstrap standard error using Fisher's z-transform (NB! biased)
    std_err = tanh(std(arctanh(r), ddof=1))
    return (std_err, conf_interval)

Since we are not using the bias-corrected and accelerated (BCa) bootstrap, the standard error is the more meaningful of the two estimates, although both are biased.

I said that the bootstrap is "distribution free". That is not quite the case either. The assumed distribution is the provided sample distribution, i.e. a sum of Dirac delta functions. Often this is not a very sound assumption, and the bootstrap must then be improved, e.g. using the BCa procedure. That is particularly important for confidence intervals, but less so for standard errors. For a quick-and-dirty standard error estimate, we can just ignore the small bias.

Now, if you wonder why I used the standard deviation to obtain the standard error (without dividing by sqrt(n)), the answer is that the standard error is the standard deviation of an estimator. As we have 1000 samples from the correlation's sampling distribution in r, we get the "standard error of the correlation" by taking the standard deviation of r. The z-transform is needed before we compute means or standard deviations of correlations, because r is confined to [-1, 1]; Fisher's arctanh maps it to the whole real line, where its distribution is approximately normal.
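To see why there is no division by sqrt(n), here is a small sanity check (with made-up normal data; the sample size, seed, and scale are arbitrary) using the mean instead of the correlation, since the mean's standard error has the familiar closed form s/sqrt(n):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=2.0, size=200)

# Bootstrap the mean: 1000 resamples with replacement, one mean per resample.
idx = rng.integers(len(x), size=(1000, len(x)))
boot_means = x[idx].mean(axis=1)

# The standard error of the mean is the standard deviation of the
# estimator's sampling distribution -- here, the std of the 1000
# bootstrap means, with no further division by sqrt(n).
se_boot = boot_means.std(ddof=1)

# Classical estimate for comparison: s / sqrt(n)
se_classical = x.std(ddof=1) / np.sqrt(len(x))

print(se_boot, se_classical)  # the two should be close
```

The sqrt(n) is already "inside" the bootstrap distribution: each resampled mean is an average of n values, so the spread of boot_means shrinks with n on its own.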