sturlamolden wrote:
> First, are you talking about rounding error (due to floating point
> arithmetics) or statistical sampling error?

I'm talking about measured data: rounding errors and sampling errors with special distributions are negligible here, so by default I assume Gaussian noise in both x and y.
(This may explain that factor of ~0.7 in the rectangle M.C. test.)
The (x,y) points may not distribute "nicely" along the assumed regression
diagonal.

> If you are talking about the latter, I suggest you look it up in a
> statistics text book. E.g. if x and y are normally distributed, then
> 
> t = r * sqrt( (n-2)/(1-r**2) )
> 
> has a Student t-distribution with n-2 degrees of freedom. And if you
> don't know how to get the p-value from that, you should not be messing
> with statistics anyway.

I'm too lazy/practical to dig these things out of a textbook. You obviously
have it at hand - starting from that, what would be the final estimate for the
error range of r (for large n)?
Is it the same "const. * (1-r**2)/sqrt(n)" that I found in that other document?

The constant (~1) is less of a problem.
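Just to make the two formulas concrete, here is a minimal pure-Python sketch
(function names are mine, not from any library) that computes r, the
t statistic with n-2 degrees of freedom from your post, and the rough
large-n standard error (1-r**2)/sqrt(n) with const. taken as 1:

```python
import math

def pearson_r(xs, ys):
    """Plain (unweighted) Pearson correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_stats(xs, ys):
    """Return r, its t statistic (n-2 dof), and the rough
    large-n standard error (1 - r**2) / sqrt(n), const. ~ 1."""
    n = len(xs)
    r = pearson_r(xs, ys)
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    se = (1 - r ** 2) / math.sqrt(n)
    return r, t, se
```

To turn t into a p-value you'd still look it up in a Student-t table (or
use scipy.stats.t.sf if you have SciPy), which is the part I was hoping
to avoid.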

My main concern is how to account for the fact that the (x,y) points may not
distribute well along the regression line. E.g. due to the nature of the
experiment most points cluster around (0,0) and only a few spread along the
interesting part of the diagonal, so those few points have a large effect on m
and r. The formulas above will probably not account for that.
I could try a weighting technique, but maybe there is a (commonly used) fast
formula for r/r_err that handles this directly?
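For the weighting idea, the standard generalization I'd try is a weighted
Pearson r - weighted means and weighted (co)variances. How to choose the
weights (e.g. down-weighting the cluster near (0,0)) is the open question;
the weights below are just an illustration, not a recipe:

```python
import math

def weighted_pearson_r(xs, ys, ws):
    """Pearson r with per-point weights ws (choice of weights is
    up to the experiment, e.g. to down-weight a cluster at (0,0))."""
    wtot = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / wtot
    my = sum(w * y for w, y in zip(ws, ys)) / wtot
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    syy = sum(w * (y - my) ** 2 for w, y in zip(ws, ys))
    return sxy / math.sqrt(sxx * syy)
```

With all weights equal this reduces to the plain r; what the matching
error estimate for the weighted r should be is exactly what I don't know.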


Robert
-- 
http://mail.python.org/mailman/listinfo/python-list
