[R] Comparing membership of clusters

Paul Lemmens Fri, 27 Nov 2009 01:33:10 -0800

Hello,

I'm taking several physiological measurements on participants (e.g.,
skin conductivity, heart rate, etc.). I know that those participants
belong to one of three groups (from another measurement), and I'm
looking to find the physiological measurement that best describes
group membership. The measurements are taken over several days, so I
fitted an lm() for each participant and each measurement and used
the regression coefficients as input for hclust(). After cutree(x,
k=3), I have a matrix whose columns hold the group indices for each
measurement. Now I need to assess which column is most similar to my
gold standard.
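
As a rough sketch of that pipeline for a single measurement: the data
below are simulated, and the names dat, scl, and slopes are made up
purely for illustration; my real data and model may differ in detail.

```r
# Hypothetical data: 9 participants, 5 days, one measurement ("scl").
set.seed(1)
dat <- expand.grid(day = 1:5, participant = factor(1:9))
dat$scl <- rnorm(nrow(dat), mean = 0.2 * dat$day)

# One lm() per participant; keep the slope over days.
slopes <- sapply(split(dat, dat$participant),
                 function(d) coef(lm(scl ~ day, data = d))[["day"]])

# Cluster the participants on their slopes and cut the tree into 3 groups.
hc <- hclust(dist(slopes))
groups <- cutree(hc, k = 3)
```

Repeating this for every measurement and binding the resulting vectors
with cbind() would give a matrix with one column of group indices per
measurement.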


Q1: What is the easiest and best way to abstract away from the group
labeling (because it is arbitrary, see below)?
Q2: Is there a statistic that quantifies the level of similarity (other
than tallying)?

So I have (after the cutree)

res <- 
matrix(c(1,1,1,2,1,3,2,1,1,1,2,1,1,3,1,3,3,3,1,2,1,1,1,2,1,3,1,1,2,2,2,1,3,1,2,2,1,
1,1,2,2,3,1,1,1,1,2,1,2,1,2,3,2,1,1,2,3,2,2,1,2,2,1,1,1,1,2,1,3,1,1,2,1,2,
2,1,2,1,2,1,3,1,2,2,3,1,2,1,2,2,1,1,1,1,1,2,3,3,1,1,1,1,1,1,1,2,3,3,1,1,2,
1,1,3,2,2,2), nrow=9)
colnames(res) <- LETTERS[1:13]

which has the cluster assignments for each measurement in the columns.
My gold standard is

gold <- c(1, 1, 2, 2, 3, 3, 1, 2, 3)

Now for each column in res, I want to see how similar it is to gold.
Note that exact matching on the label values is not correct, because
the gold standard could equally be expressed as c(a, a, b, b, c, c, a, b,
c), or even c(3, 3, 1, 1, 2, 2, 3, 1, 2). So the key fact is that
participants (indices) 1, 2, and 7 belong together.
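
To make the labeling issue concrete, here is a small sketch showing that
the two codings describe the same partition once you look only at which
participants share a group. The helper comembership() and the vector
relabeled are names introduced here for illustration; this only
demonstrates the label invariance, not a similarity statistic.

```r
gold      <- c(1, 1, 2, 2, 3, 3, 1, 2, 3)
relabeled <- c(3, 3, 1, 1, 2, 2, 3, 1, 2)

# TRUE where participants i and j share a group, whatever the label is called.
comembership <- function(cl) outer(cl, cl, "==")

identical(comembership(gold), comembership(relabeled))  # TRUE
which(comembership(gold)[1, ])                          # 1 2 7
```

The co-membership matrix does not depend on the label values at all,
which is the property any comparison against the gold standard needs.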

I am most puzzled about how to do the matching, i.e., how to quantify
the similarity between each column and the gold standard.


Thank you for your time!
best regards,
Paul Lemmens

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
