Hi, i want to pick up your last method and add that "about 10% of r_ij are greater than 0.9" could mean for example that there is a cluster of about sqrt(1000) variables that are highly correlated.
But first let us note, that there is no canonical meaning for the randomness of a random correlation matrix, so whether this is adequate depends on the underlying problem/simulation study/test cases. Instead of drawing random vectors via rnorm, one could draw for example 42 vectors from a multivariate normal distribution with high correlation in the first 42 components: require("Matrix") require("mvtnorm") n <- 100 n1 <- 42 n2 <- n - n1 R <- bdiag(matrix(0.99,nrow=n1,ncol=n1)+diag(n1)*0.01,diag(n2)*0.01) R <- as.matrix(R) random.correlation.matrix <- function() { m1 <- rmvnorm(n1,rep(0,n),R) m2 <- rmvnorm(n2,rep(0,n),diag(n)) m <- rbind(m1,m2) for (i in 1:n) { # normalization - we want a correlation matrix m[i,] <- m[i,]/sqrt(t(m[i,])%*%m[i,]) } m%*%t(m) } Let us count how many entries have an absolute value greater than 0.9: count.gt.90 <- function() { m <- random.correlation.matrix() sum(abs(m)>0.90) } > mean(replicate(100, count.gt.90())) [1] 999.32 hist(replicate(100, count.gt.90())) Looks well... But remember that you now have every time 42 highly correlated variables (which are also always the first 42 ones). Perhaps you want to make this number also random. And perhaps you want two or three or a random number of different clusters of highly correlated variables? Depends all on your objective - have fun! :-) Cheers, Kornelius. > Yes, Kingsford. This problem does not appear to be trival. I am not sure if > there is any literature on this. I have seen a paper by Davies and Higham > (BIT 2000) on "Stable generation of correlation matrices. There is also a > paper by Harry Joe on generating correlation matrices with given partial > correlation structure. I am not sure if these papers would be helpful for > this problem. Of course, one can readily cook up an ad-hoc, trial-error > procedure to generate the type of matrices that Ning wants. > > By the way, the easiest way to generate a PD matrix is: > > N <- 10 > amat <- matrix(rnorm(N*N), N, N) > A <- amat %*% t(amat) # A is PD > > Best, > Ravi > > ____________________________________________________________________ > > Ravi Varadhan, Ph.D. > Assistant Professor, > Division of Geriatric Medicine and Gerontology > School of Medicine > Johns Hopkins University > > Ph. (410) 502-2619 > email: rvarad...@jhmi.edu > > > ----- Original Message ----- > From: Kingsford Jones <kingsfordjo...@gmail.com> > Date: Saturday, August 29, 2009 7:43 pm > Subject: Re: [R] how to generate a random correlation matrix with restrictions > To: Ravi Varadhan <rvarad...@jhmi.edu> > Cc: Ning Ma <pnin...@gmail.com>, r-help@r-project.org > > > > Thanks Ravi -- with my limited linear algebra skills I was assuming > > invertible symmetric was sufficient (rather than just necessary) for > > positive definiteness. So, the open challenge is to generate a pd > > matrix of dimension 100 with r_ij = 1 if i=j else -1< r_ij< 1, with > > about 10% of the elements >.9. > > > > I tried for a bit without luck, but I can offer another way to > > generate a pd matrix: > > > > ev <- runif(100) # choose some positive eigenvalues > > U <- svd(matrix(runif(10000), nc=100))$u # an orthogonal matrix > > pdM <- U %*% diag(ev) %*% t(U) > > > > > > Kingsford > > > > > > > > On Fri, Aug 28, 2009 at 9:41 PM, Ravi Varadhan<rvarad...@jhmi.edu> wrote: > > > Hi Kingsford, > > > > > > There is more structure to a correlation matrix than that meets the > > eye! Your method will produce a matrix R that looks "like" a > > correlation matrix, but beware - it is an impostor! > > > > > > You can obtain a valid correlation matrix, Q, from the impostor R > > by using the `nearPD' function in the "Matrix" package, which finds > > the positive definite matrix Q that is "nearest" to R. However, note > > that when R is far from a positive-definite matrix, this step may give > > a Q that does not have the desired property. > > > > > > require(Matrix) > > > > > > R <- matrix(runif(16), ncol=4) > > > > > > R <- (R * lower.tri(R)) + t(R * lower.tri(R)) > > > > > > diag(R) <- 1 > > > > > > eigen(R)$val > > > > > > Q <- nearPD(R, posd.tol=1.e-04)$mat > > > > > > eigen(Q)$val > > > > > > max(abs(Q - R)) # maximum discrepancy between R and Q > > > > > > Another easy way to produce a valid correlation matrix is: > > > > > > R <- matrix(runif(36), ncol=6) > > > > > > RtR <- R %*% t(R) > > > > > > Q <- cov2cor(RtR) > > > > > > But this does not have the property that the correlations are > > uniformly distributed. > > > > > > Hope this helps, > > > Ravi. > > > > > > ____________________________________________________________________ > > > > > > Ravi Varadhan, Ph.D. > > > Assistant Professor, > > > Division of Geriatric Medicine and Gerontology > > > School of Medicine > > > Johns Hopkins University > > > > > > Ph. (410) 502-2619 > > > email: rvarad...@jhmi.edu > > > > > > > > > ----- Original Message ----- > > > From: Kingsford Jones <kingsfordjo...@gmail.com> > > > Date: Friday, August 28, 2009 10:12 pm > > > Subject: Re: [R] how to generate a random correlation matrix with > > restrictions > > > To: Ning Ma <pnin...@gmail.com> > > > Cc: r-help@r-project.org > > > > > > > > >> Ahh -- Mark Leeds just pointed out off-list that it was supposed > > to be > > >> a correlation matrix. > > >> > > >> Perhaps > > >> > > >> R <- matrix(runif(10000), ncol=100) > > >> R <- (R * lower.tri(R)) + t(R * lower.tri(R)) > > >> diag(R) <- 1 > > >> > > >> These are of course uniformly distributed positive correlations. > > >> > > >> Kingsford > > >> > > >> > > >> On Fri, Aug 28, 2009 at 7:36 PM, Kingsford > > >> Jones<kingsfordjo...@gmail.com> wrote: > > >> > R <- matrix(runif(10000), ncol=100) > > >> > > > >> > hth, > > >> > > > >> > Kingsford Jones > > >> > > > >> > On Fri, Aug 28, 2009 at 7:26 PM, Ning Ma<pnin...@gmail.com> wrote: > > >> >> Hi, > > >> >> > > >> >> How can I generate a random 100x100 correlation matrix, R={r_ij}, > > >> >> where about 10% of r_ij are greater than 0.9 > > >> >> > > >> >> Thanks in advance. > > >> >> > > >> >> ______________________________________________ > > >> >> R-help@r-project.org mailing list > > >> >> > > >> >> PLEASE do read the posting guide > > >> >> and provide commented, minimal, self-contained, reproducible code. > > >> >> > > >> > > > >> > > >> ______________________________________________ > > >> R-help@r-project.org mailing list > > >> > > >> PLEASE do read the posting guide > > >> and provide commented, minimal, self-contained, reproducible code. > > > > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.