Hi, thanks for the reply. I am not assuming that the supplied covariance matrix in any way captures the 'true' covariance matrix of the population, but that's not what I am after either. I just want to simulate data whose covariance is similar to that covariance matrix. And the numbers are so hugely different! Could sampling error cause that? Could the covariance structure be too complicated to simulate?
And yes, I should probably post this on a stat-list, true.

Thanks,
Bo

2012/8/11 Bert Gunter <gunter.ber...@gene.com>:
> Sampling error? Do you realize how large a sample size you would
> need to precisely estimate an 8000 x 8000 covariance matrix? Probably
> exceeds the number of stars in our galaxy...
>
> Numerical issues may also play a role, but I am too ignorant on this
> aspect to offer advice.
>
> Finally, this is really not an R question, so you would probably do
> better to post on a stats site like stats.stackexchange.com rather
> than here.
>
> -- Bert
>
> On Sat, Aug 11, 2012 at 7:17 AM, Boel Brynedal <bryne...@gmail.com> wrote:
>> Hi,
>>
>> I want to simulate a data set with similar covariance structure as my
>> observed data, and have calculated a covariance matrix (dimensions
>> 8368*8368). So far I've tried two approaches to simulating data:
>> rmvnorm from the mvtnorm package, and by using the Cholesky
>> decomposition
>> (http://www.cerebralmastication.com/2010/09/cholesk-post-on-correlated-random-normal-generation/).
>> The problem is that the resulting covariance structure in my simulated
>> data is very different from the original supplied covariance matrix.
>> Let's just look at some of the values:
>>
>> > cov8[1:4,1:4] # covariance of simulated data
>>             X1          X2         X3         X4
>> X1 34515296.00    99956.69   369538.1  1749086.6
>> X2    99956.69 34515296.00  2145289.9  -624961.1
>> X3   369538.08  2145289.93 34515296.0  -163716.5
>> X4  1749086.62  -624961.09  -163716.5 34515296.0
>> > CEUcovar[1:4,1:4]
>>              [,1]         [,2]          [,3]         [,4]
>> [1,] 0.1873402987  0.001837229  0.0009009272  0.010324521
>> [2,] 0.0018372286  0.188665853  0.0124216535 -0.001755035
>> [3,] 0.0009009272  0.012421654  0.1867835412 -0.000142395
>> [4,] 0.0103245214 -0.001755035 -0.0001423950  0.192883488
>>
>> So the distribution of the observed covariance is very narrow compared
>> to the simulated data.
>>
>> None of the eigenvalues of the observed covariance matrix are
>> negative, and it appears to be a positive definite matrix. Here is
>> what I did to create the simulated data:
>>
>> Chol <- chol(CEUcovar)
>> Z <- matrix(rnorm(20351 * 8368), 8368)
>> X <- t(Chol) %*% Z
>> sample8 <- data.frame(as.matrix(t(X)))
>> > dim(sample8)
>> [1] 20351 8368
>> cov8=cov(sample8,method='spearman')
>>
>> [earlier I've also tried sample8 <- rmvnorm(1000,
>> mean=rep(0,ncol(CEUcovar)), sigma=CEUcovar, method="eigen") with as
>> 'bad' results, much larger covariance values in the simulated data ]
>>
>> Any ideas of WHY the simulated data have such a different covariance?
>> Any experience with similar issues? Would be happy to supply the
>> covariance matrix if anyone wants to give it a try.
>> Any suggestions? Anything apparent that I left out or neglected?
>>
>> Any advice would be highly appreciated.
>> Best,
>> Bo
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
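
For anyone who wants to try the Cholesky approach from the original post at a small scale, here is a minimal sketch. It is not the poster's actual analysis: `Sigma` is a hypothetical 4 x 4 stand-in for `CEUcovar`, and the names `R`, `cov_pearson`, and `cov_spearman` are made up for illustration. It compares the default (Pearson) covariance of the simulated draws with the `method = "spearman"` covariance used in the post. With `method = "spearman"`, `cov()` works on the ranks of each column, so the diagonal becomes n(n + 1)/12 regardless of `Sigma`. For n = 20351 that is 34515296, the same value as the diagonal of `cov8` in the post, which suggests (though nothing in the thread confirms it) that the choice of method accounts for much of the difference.

```r
set.seed(1)

# Hypothetical small target covariance (a stand-in for CEUcovar)
p <- 4
Sigma <- matrix(0.01, p, p)
diag(Sigma) <- 0.19

n <- 20351                            # same number of draws as in the post
R <- chol(Sigma)                      # upper triangular, t(R) %*% R equals Sigma
Z <- matrix(rnorm(n * p), nrow = p)   # p x n matrix of standard normals
X <- t(t(R) %*% Z)                    # n x p draws with covariance Sigma

cov_pearson  <- cov(X)                        # default method = "pearson"
cov_spearman <- cov(X, method = "spearman")   # covariance of the column ranks

max(abs(cov_pearson - Sigma))   # small: only sampling error remains
diag(cov_spearman)              # every entry equals n * (n + 1) / 12
n * (n + 1) / 12                # 34515296
```

If the goal is to compare the simulated data with the supplied covariance matrix on the same scale, calling `cov()` without the `method` argument would be the natural check under these assumptions.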