On Nov 11, 2010, at 4:44 AM, Stefan Evert wrote:

Pasted and realigned from original posting:
      term1 term2 term3 term4 term5
term1     0     2     0     1     3
term2     2     0     0     1     2
term3     0     0     0     0     0
term4     1     1     0     0     1
term5     3     2     0     1     1
Any ideas on how to do that?

If I understood you correctly, you have this matrix of indicator variables for occurrences of terms in documents:

 A <- matrix(c(1,1,0,0,1,
               1,1,0,1,1,
               1,0,0,0,1),
             nrow=3, byrow=TRUE,
             dimnames=list(paste("doc",1:3), paste("term",1:5)))
 A

and want to determine co-occurrence counts for pairs of terms, right? (The formatting of your matrices was messed up, and some of your co-occurrence counts don't make sense to me.)

The fastest and easiest solution is

 t(A) %*% A

That is really elegant. (I wish I could remember my linear algebra lessons from forty years ago as well.) I checked it against the specified output and found that, with one exception (the term5/term5 entry, given as 1), the OP had planned for the diagonal to be filled with zeroes. That can be achieved with a simple modification:

temp <- t(A) %*% A
diag(temp) <- 0
temp
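
For anyone wondering why the matrix product yields co-occurrence counts, here is a minimal sketch using the same example matrix. The name `cooc` is just an illustrative choice, and `crossprod()` is shown only as an equivalent base-R way of writing the same product, not as something proposed in the thread.

```r
# Document-by-term indicator matrix from the post above.
# Note that paste("term", 1:5) yields names with a space, e.g. "term 1".
A <- matrix(c(1,1,0,0,1,
              1,1,0,1,1,
              1,0,0,0,1),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste("doc", 1:3), paste("term", 1:5)))

# Entry [i, j] of t(A) %*% A sums A[d, i] * A[d, j] over documents d,
# so it counts the documents that contain both term i and term j.
cooc <- t(A) %*% A
stopifnot(cooc["term 1", "term 2"] == sum(A[, "term 1"] * A[, "term 2"]))

# crossprod(A) computes the same product without forming t(A) explicitly.
stopifnot(isTRUE(all.equal(cooc, crossprod(A))))

# Before zeroing, the diagonal holds each term's document frequency.
stopifnot(all(diag(cooc) == colSums(A)))

# Zero the diagonal if only pairs of distinct terms are of interest.
diag(cooc) <- 0
cooc
```

The diagonal check makes the point that the unmodified diagonal is not noise: it is the number of documents each term appears in, which may be worth saving before it is overwritten.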

--
David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
