On 16/08/14 01:29, Joshua Wiley wrote:
> On Wed, Aug 13, 2014 at 7:41 AM, Rolf Turner <r.tur...@auckland.ac.nz> wrote:
>
>> On 13/08/14 07:57, Ron Michael wrote:
>>
>>> Hi,
>>>
>>> I need clarification on a fairly fundamental statistical property; I hope the expeRts here will not mind my posting it here.
>>>
>>> I learnt that the variance-covariance matrix of the standardized data is equal to the correlation matrix of the unstandardized data. So I used the following data.
>>>
>>> <SNIP>
>>>
>>> (t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1]
>>>
>>> The point is that I am not getting exactly the CORR matrix. Can somebody point out what I am missing here?
>>
>> You are using a denominator of "n" in calculating your "covariance" matrix for your normalized data. But these data were normalized using the sd() function, which (correctly) uses a denominator of n-1 so as to obtain an unbiased estimator of the population standard deviation.
>
> As a small point, dividing by n - 1 does not _quite_ give an unbiased estimator of the population SD; see Cureton (1968), "Unbiased Estimation of the Standard Deviation", The American Statistician, 22(1).
>
> To see this in action:
>
> library(parallel)
> cl <- makeCluster(detectCores())  # assumed setup: 'cl' is not defined in the original post
> res <- unlist(parLapply(cl, 1:1e7, function(i) sd(rnorm(10, mean = 0, sd = 1))))
> stopCluster(cl)
> correction <- function(n) gamma((n-1)/2) * sqrt((n-1)/2) / gamma(n/2)
> mean(res)                   # 0.972583
> mean(res * correction(10))  # 0.9999216
>
> The usual sample variance (with denominator n - 1) is an unbiased estimator of the population variance, but the square root is a nonlinear function, and the square root of an unbiased estimator is not itself necessarily unbiased.
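To make the denominator point concrete, here is a minimal R sketch. Ron's data were snipped, so the matrix X below is simulated and purely hypothetical, and scale() is assumed to stand in for however Data_Normalized was produced (it divides each column by sd(), i.e. uses n - 1, which is what the thread assumes).

```r
# Hypothetical stand-in for the snipped data: 50 rows, 3 columns.
set.seed(1)
X <- matrix(rnorm(50 * 3), nrow = 50, ncol = 3)
n <- nrow(X)

# scale() centres each column and divides it by sd(), which uses n - 1.
Z <- scale(X)

# Dividing the cross-product by n - 1 reproduces cor(X) ...
cov_nminus1 <- crossprod(Z) / (n - 1)
# ... whereas dividing by n (as in the original post) shrinks every
# entry, including the unit diagonal, by a factor of (n - 1) / n.
cov_n <- crossprod(Z) / n

all.equal(cov_nminus1, cor(X), check.attributes = FALSE)
all.equal(diag(cov_n), rep((n - 1) / n, ncol(X)))
```

Equivalently, cov(scale(X)) returns cor(X) directly, since cov() also divides by n - 1.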
Aaaaarrrggghhh. Yes, of course. I *know* that you don't get an unbiased estimate of the sd by using n-1 in the denominator; you get an unbiased estimate of the variance and, as you say, sqrt() is a non-linear function .....
I just didn't think carefully enuff before I wrote. Thanks for pulling me up on this error.
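A small simulation sketch of this point, using base R only (no cluster needed, unlike the quoted parLapply() code): var() averages out to the true variance, while sd() averages below the true SD. The seed, sample size and number of replicates are arbitrary choices, and c4() is a hypothetical helper name for the standard normal-theory bias constant, which is the reciprocal of correction() in Joshua's code.

```r
# Draw many samples of size n from N(0, 1), so the true variance and SD are 1.
set.seed(42)
n <- 10
reps <- 1e5
samples <- matrix(rnorm(n * reps), nrow = reps, ncol = n)

v <- apply(samples, 1, var)   # sample variances, denominator n - 1
s <- sqrt(v)                  # same values as apply(samples, 1, sd)

mean(v)   # close to 1: var() is unbiased for sigma^2
mean(s)   # noticeably below 1: sd() is biased low

# Exact E[s] / sigma for normal data (the c4 constant); about 0.9727 for n = 10.
c4 <- function(n) sqrt(2 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)
c4(n)
mean(s / c4(n))   # close to 1 after the correction
```

By Jensen's inequality, E[sqrt(V)] <= sqrt(E[V]) for any non-negative V, so an unbiased variance estimator always yields an SD estimator that is biased low (strictly so unless V is constant).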
cheers,

Rolf

-- 
Rolf Turner
Technical Editor ANZJS