Several statistics are used to compare nominal classifications, or _partitions_, of a data set. A partition isn't quite the same thing as a classification in this context, because partitioned data are not restricted to a fixed number of classes. However, the statistics used to compare partitions should also work for these 'restricted' partitions. See the Rand index, the Fowlkes and Mallows index, the Wallace indices, and the Jaccard index. The profdpm package implements a function (?profdpm::pci) that computes these indices for two factors representing partitions of the same data.
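To make the indices concrete, here is a minimal sketch of how they follow from counting pairs of observations. It is written in plain Python purely for illustration; it is not the profdpm implementation, and the function names (pair_counts, partition_indices) are made up for this example. The formulas are the standard pair-counting definitions: n11 is the number of pairs placed together in both partitions, n10 and n01 are pairs together in only one of them, and n00 is pairs separated in both.

```python
from collections import Counter
from math import comb, sqrt


def pair_counts(a, b):
    """Classify all pairs of observations by co-membership in partitions a and b."""
    if len(a) != len(b):
        raise ValueError("both label vectors must describe the same observations")
    total = comb(len(a), 2)
    # Pairs that share a class in a, in b, and in both (via the contingency table).
    same_a = sum(comb(c, 2) for c in Counter(a).values())
    same_b = sum(comb(c, 2) for c in Counter(b).values())
    same_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    n11 = same_both
    n10 = same_a - same_both
    n01 = same_b - same_both
    n00 = total - same_a - same_b + same_both
    return n11, n10, n01, n00


def _ratio(num, den):
    # An index is undefined (NaN) when its denominator is zero,
    # e.g. when a partition consists only of singletons.
    return num / den if den else float("nan")


def partition_indices(a, b):
    """Rand, Fowlkes-Mallows, Wallace (both directions) and Jaccard indices."""
    n11, n10, n01, n00 = pair_counts(a, b)
    return {
        "rand": _ratio(n11 + n00, n11 + n10 + n01 + n00),
        "fowlkes_mallows": _ratio(n11, sqrt((n11 + n10) * (n11 + n01))),
        "wallace_ab": _ratio(n11, n11 + n10),
        "wallace_ba": _ratio(n11, n11 + n01),
        "jaccard": _ratio(n11, n11 + n10 + n01),
    }


if __name__ == "__main__":
    benchmark = [1, 1, 1, 2, 2, 2]
    clustered = ["a", "a", "b", "b", "c", "c"]
    for name, value in partition_indices(benchmark, clustered).items():
        print(f"{name}: {value:.4f}")
```

Note that the class labels themselves never have to match between the two vectors, and the two partitions may have different numbers of classes; only co-membership of pairs matters.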
The difficult part is drawing statistical inference about these indices. It's difficult to formulate a null hypothesis, and even more difficult to determine a null distribution for a partition comparison index. A bootstrap test might work, but you will probably have to implement this yourself.

-Matt

On Wed, 2010-11-17 at 08:33 -0500, Martin Tomko wrote:
> Dear all,
> I am having a hard time figuring out a suitable test for the match
> between two nominal classifications of the same set of data.
> I have used hierarchical clustering with multiple methods (ward,
> k-means,...) to classify my data into a set number of classes, and I
> would like to compare the resulting automated classification with the
> actual - objective benchmark one.
> So in principle I have a data frame with n columns of nominal
> classifications, and I want to do a mutual comparison and test for
> significance in difference in classification between pairs of columns.
>
> I just need to identify a suitable test, but I fail. I am currently
> exploring the possibility of using Cohen's Kappa, but I am open to other
> suggestions. Especially the fact that kappa seems to be mostly used on
> fallible, human annotators seems to bring in limitations that do not
> apply to my automatic classification.
> Any help will be appreciated, especially if also followed by a pointer
> to an R package that implements it.
>
> Thanks
> Martin
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Matthew S. Shotwell
Graduate Student
Division of Biostatistics and Epidemiology
Medical University of South Carolina
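
P.S. For what it's worth, here is a rough sketch of one resampling approach to the inference problem. Note that it is a permutation test rather than the bootstrap mentioned above, and it assumes a particular null hypothesis: that the two labelings are unrelated, with the class sizes of each held fixed. That null may well not be the one you care about, which is exactly the difficulty. The code is plain Python for illustration only, and the names (rand_index, permutation_p_value) and defaults (n_perm, seed) are made up for this example.

```python
import random
from collections import Counter
from math import comb


def rand_index(a, b):
    """Proportion of observation pairs on which partitions a and b agree."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two label vectors of equal length >= 2")
    total = comb(len(a), 2)
    same_a = sum(comb(c, 2) for c in Counter(a).values())
    same_b = sum(comb(c, 2) for c in Counter(b).values())
    same_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    # Agreements = pairs together in both + pairs apart in both.
    return (total - same_a - same_b + 2 * same_both) / total


def permutation_p_value(a, b, n_perm=999, seed=0):
    """One-sided Monte Carlo p-value for the observed Rand index.

    Assumed null: labels in b are assigned to observations at random,
    keeping the class sizes of b fixed.
    """
    rng = random.Random(seed)
    observed = rand_index(a, b)
    shuffled = list(b)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any association with a
        if rand_index(a, shuffled) >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least_as_large + 1) / (n_perm + 1)


if __name__ == "__main__":
    benchmark = [0] * 10 + [1] * 10
    clustered = [0] * 9 + [1] * 11  # one observation misplaced
    obs, p = permutation_p_value(benchmark, clustered)
    print(f"Rand index = {obs:.3f}, permutation p-value = {p:.3f}")
```

A small p-value here only says the agreement is better than chance relabeling would give. It does not test whether two clustering methods differ from each other in how well they match the benchmark.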