Hello,

I'm taking several physiological measurements on participants (e.g., skin conductivity, heart rate). I know from another measurement that each participant belongs to one of three groups, and I want to find the physiological measurement that best describes group membership.

The measurements are taken over several days. For each participant and each measurement I fitted an lm() and used the regression coefficient as input for hclust(). After cutree(x, k=3), I have a matrix whose columns hold the group indices for each measurement. Now I need to assess which column is most similar to my gold standard.
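For concreteness, the pipeline looks roughly like the sketch below. It runs on simulated data, not my real measurements: the data frame `dat`, the column names `skin_cond` and `heart_rate`, and the helpers `slope_for()` and `cluster_slopes()` are placeholders made up for this example. I am also assuming here that the coefficient of interest is the slope over days.

```r
set.seed(1)  # simulated data only, so the sketch is reproducible

# Hypothetical long-format data: 9 participants, 5 days, 2 measurements
dat <- expand.grid(participant = 1:9, day = 1:5)
dat$skin_cond  <- rnorm(nrow(dat)) + 0.3 * dat$day * (dat$participant %% 3)
dat$heart_rate <- rnorm(nrow(dat))

measurements <- c("skin_cond", "heart_rate")

# Slope of measurement ~ day for the rows of one participant
slope_for <- function(d, measurement) {
  fit <- lm(reformulate("day", response = measurement), data = d)
  unname(coef(fit)["day"])
}

# Cluster the per-participant slopes of one measurement into k groups
cluster_slopes <- function(measurement, k = 3) {
  slopes <- sapply(split(dat, dat$participant), slope_for,
                   measurement = measurement)
  cutree(hclust(dist(slopes)), k = k)
}

# One column of group indices per measurement, one row per participant
res_sim <- sapply(measurements, cluster_slopes)
```

The result `res_sim` has the same layout as my real matrix: participants in rows, one column of cluster indices per measurement.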
Q1: What is the easiest and best way to abstract away from the group labels (they are arbitrary, see below)?
Q2: Is there a statistic for the level of similarity (other than tallying)?

After the cutree() step I have

res <- matrix(c(1,1,1,2,1,3,2,1,1,1,2,1,1,3,1,3,3,3,1,2,1,1,1,2,1,3,1,1,2,2,2,1,3,1,2,2,1,
1,1,2,2,3,1,1,1,1,2,1,2,1,2,3,2,1,1,2,3,2,2,1,2,2,1,1,1,1,2,1,3,1,1,2,1,2,
2,1,2,1,2,1,3,1,2,2,3,1,2,1,2,2,1,1,1,1,1,2,3,3,1,1,1,1,1,1,1,2,3,3,1,1,2,
1,1,3,2,2,2), nrow=9)
colnames(res) <- LETTERS[1:13]

which has the cluster assignments for each measurement in the columns. My gold standard is

gold <- c(1, 1, 2, 2, 3, 3, 1, 2, 3)

Now, for each column in res, I want to see how similar it is to gold. Note that matching on the exact numbers is not correct, because the gold standard could equally be expressed as c(a, a, b, b, c, c, a, b, c), or even c(3, 3, 1, 1, 2, 2, 3, 1, 2). What matters is that participants (indices) 1, 2, and 7 belong together. I am most puzzled about how to do the matching, i.e. how to find the similarity between each column and the gold standard.

Thank you for your time!

Best regards,
Paul Lemmens
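P.S. To make the labelling point concrete, here is a rough sketch of what I mean by "who belongs with whom". The helpers `comembership()` and `pair_agreement()` are names I made up for this example. The pair tally at the end is only a naive count, which is exactly the kind of thing I would like to replace with a proper statistic.

```r
res <- matrix(c(1,1,1,2,1,3,2,1,1,1,2,1,1,3,1,3,3,3,1,2,1,1,1,2,1,3,1,1,2,2,2,1,3,1,2,2,1,
                1,1,2,2,3,1,1,1,1,2,1,2,1,2,3,2,1,1,2,3,2,2,1,2,2,1,1,1,1,2,1,3,1,1,2,1,2,
                2,1,2,1,2,1,3,1,2,2,3,1,2,1,2,2,1,1,1,1,1,2,3,3,1,1,1,1,1,1,1,2,3,3,1,1,2,
                1,1,3,2,2,2), nrow = 9)
colnames(res) <- LETTERS[1:13]

gold       <- c(1, 1, 2, 2, 3, 3, 1, 2, 3)
relabelled <- c(3, 3, 1, 1, 2, 2, 3, 1, 2)  # same grouping, different numbers

# Logical matrix: TRUE where participants i and j share a group
comembership <- function(labels) outer(labels, labels, "==")

# Relabelling changes the numbers but not who is grouped with whom
same_grouping <- identical(comembership(gold), comembership(relabelled))

# Naive tally: fraction of participant pairs on which two groupings agree
# (both put the pair together, or both keep it apart)
pair_agreement <- function(a, b) {
  same_a <- comembership(a)
  same_b <- comembership(b)
  pairs  <- upper.tri(same_a)  # count each unordered pair once
  mean(same_a[pairs] == same_b[pairs])
}

# One agreement value per measurement column, compared against gold
agreement <- apply(res, 2, pair_agreement, b = gold)
```

Here `same_grouping` is TRUE, and `agreement` is a named vector with one value between 0 and 1 for each of the columns A to M.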