Sampling error? Do you realize how large a sample size you would need to precisely estimate an 8000 x 8000 covariance matrix? Probably exceeds the number of stars in our galaxy...
Numerical issues may also play a role, but I know too little about that aspect to offer advice.

Finally, this is really not an R question, so you would probably do better to post on a statistics site such as stats.stackexchange.com rather than here.

-- Bert

On Sat, Aug 11, 2012 at 7:17 AM, Boel Brynedal <bryne...@gmail.com> wrote:
> Hi,
>
> I want to simulate a data set with a covariance structure similar to
> that of my observed data, and have calculated a covariance matrix
> (dimensions 8368*8368). So far I've tried two approaches to simulating
> data: rmvnorm from the mvtnorm package, and the Cholesky decomposition
> (http://www.cerebralmastication.com/2010/09/cholesk-post-on-correlated-random-normal-generation/).
> The problem is that the resulting covariance structure in my simulated
> data is very different from the covariance matrix I originally supplied.
> Let's just look at some of the values:
>
> > cov8[1:4,1:4] # covariance of simulated data
>             X1          X2         X3         X4
> X1 34515296.00    99956.69   369538.1  1749086.6
> X2    99956.69 34515296.00  2145289.9  -624961.1
> X3   369538.08  2145289.93 34515296.0  -163716.5
> X4  1749086.62  -624961.09  -163716.5 34515296.0
> > CEUcovar[1:4,1:4]
>              [,1]         [,2]          [,3]         [,4]
> [1,] 0.1873402987  0.001837229  0.0009009272  0.010324521
> [2,] 0.0018372286  0.188665853  0.0124216535 -0.001755035
> [3,] 0.0009009272  0.012421654  0.1867835412 -0.000142395
> [4,] 0.0103245214 -0.001755035 -0.0001423950  0.192883488
>
> So the distribution of the observed covariance is very narrow compared
> to the simulated data.
>
> None of the eigenvalues of the observed covariance matrix are
> negative, and it appears to be a positive definite matrix.
> Here is what I did to create the simulated data:
>
> Chol <- chol(CEUcovar)
> Z <- matrix(rnorm(20351 * 8368), 8368)
> X <- t(Chol) %*% Z
> sample8 <- data.frame(as.matrix(t(X)))
> > dim(sample8)
> [1] 20351 8368
> cov8=cov(sample8,method='spearman')
>
> [Earlier I also tried sample8 <- rmvnorm(1000,
> mean=rep(0,ncol(CEUcovar)), sigma=CEUcovar, method="eigen") with equally
> 'bad' results: much larger covariance values in the simulated data.]
>
> Any ideas WHY the simulated data have such a different covariance?
> Any experience with similar issues? I would be happy to supply the
> covariance matrix if anyone wants to give it a try.
> Any suggestions? Anything apparent that I left out or neglected?
>
> Any advice would be highly appreciated.
> Best,
> Bo
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
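
For anyone who wants to try the Cholesky recipe from the quoted post at a manageable size, here is a minimal sketch. The 4 x 4 matrix Sigma, the dimensions p and n, and the seed are made-up stand-ins (only loosely modeled on the CEUcovar excerpt), not the poster's data. It also compares the default Pearson covariance with method = "spearman", which in R is the covariance of the column ranks, so without ties every diagonal entry equals n * (n + 1) / 12 regardless of the data. That may be worth checking against the posted output: for n = 20351 this gives 34515296, the value on the diagonal of cov8.

```r
set.seed(1)
p <- 4      # hypothetical stand-in for the 8368 variables in the post
n <- 2000   # hypothetical stand-in for the 20351 simulated rows

# Made-up positive definite target covariance (an assumption, not CEUcovar)
Sigma <- matrix(0.005, p, p)
diag(Sigma) <- 0.19

U <- chol(Sigma)                      # upper triangular, t(U) %*% U equals Sigma
Z <- matrix(rnorm(n * p), nrow = p)   # p x n matrix of standard normals
X <- t(t(U) %*% Z)                    # n x p; each row is one draw with covariance Sigma

cov_pearson  <- cov(X)                        # default method = "pearson"
cov_spearman <- cov(X, method = "spearman")   # covariance of the ranks of each column

max(abs(cov_pearson - Sigma))   # sampling error only; shrinks as n grows
diag(cov_spearman)              # all equal to n * (n + 1) / 12 when there are no ties
n * (n + 1) / 12
```

If the aim is to compare the simulated data with the supplied matrix on its original scale, cov(X) with the default method is the like-for-like quantity; the rank-based version lives on a completely different scale.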