Thanks Matt,
I have in the meantime identified the Rand index, but not the others. I
will also have a look at profdpm, which did not pop up in my searches.
Indeed, the interpretation is going to be critical... Could you please
elaborate on what you mean by the bootstrap process?
Thanks a lot for your help,
Martin
On 11/17/2010 3:50 PM, Matt Shotwell wrote:
There are several statistics used to compare nominal classifications, or
_partitions_ of a data set. A partition isn't quite the same as a
classification in this context, because a partition is not restricted to
a fixed number of classes. However, the statistics used to compare
partitions should also work for these 'restricted' partitions (your
classifications). See the Rand index, Fowlkes and
Mallows index, Wallace indices, and the Jaccard index. The profdpm
package implements a function (?profdpm::pci) that computes these
indices for two factors representing partitions of the same data.
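To make the indices concrete, here is a minimal sketch of how they follow from counting pairs of items. It is written in plain Python rather than R, and it is not the code behind profdpm::pci; the function names pair_counts and partition_indices are made up for this illustration. Each pair of items is classed by whether the two labellings put the pair in the same class, and every index is a ratio of those four counts.

```python
from itertools import combinations
from math import sqrt


def pair_counts(a, b):
    """Classify every pair of items by whether a and b group it together.

    Returns (n11, n10, n01, n00):
      n11: same class in a and in b
      n10: same class in a only
      n01: same class in b only
      n00: different classes in both
    """
    if len(a) != len(b):
        raise ValueError("both labellings must cover the same items")
    n11 = n10 = n01 = n00 = 0
    for i, j in combinations(range(len(a)), 2):
        same_a = a[i] == a[j]
        same_b = b[i] == b[j]
        if same_a and same_b:
            n11 += 1
        elif same_a:
            n10 += 1
        elif same_b:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def partition_indices(a, b):
    """Pair-counting agreement indices for two labellings of the same items."""
    n11, n10, n01, n00 = pair_counts(a, b)
    total = n11 + n10 + n01 + n00

    def ratio(num, den):
        # Undefined ratios (e.g. no co-clustered pairs at all) become NaN.
        return num / den if den else float("nan")

    return {
        "rand": ratio(n11 + n00, total),
        "jaccard": ratio(n11, n11 + n10 + n01),
        "wallace_ab": ratio(n11, n11 + n10),  # pairs joined in a, also joined in b
        "wallace_ba": ratio(n11, n11 + n01),  # pairs joined in b, also joined in a
        "fowlkes_mallows": ratio(n11, sqrt((n11 + n10) * (n11 + n01))),
    }


if __name__ == "__main__":
    clustering = [1, 1, 2, 2, 3, 3]
    benchmark = ["x", "x", "x", "y", "y", "y"]
    for name, value in partition_indices(clustering, benchmark).items():
        print(f"{name}: {value:.3f}")
```

Only co-membership matters here, so the class labels themselves are arbitrary and the two labellings need not use the same number of classes. The loop over all pairs is quadratic in the number of items, which is fine for a sketch but slow for large data sets.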
The difficult part is drawing statistical inference about these indices.
It's difficult to formulate a null hypothesis, and even more difficult
to determine a null distribution for a partition comparison index. A
bootstrap test might work, but you will probably have to implement this
yourself.
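As a rough illustration of what such a resampling procedure could look like, here is a sketch in plain Python (not R, and not part of profdpm). It shows two separate things, and all names in it are hypothetical. bootstrap_interval resamples items with replacement and reports a percentile interval for the Rand index. permutation_p_value is a permutation test rather than a bootstrap: it shuffles one labelling to mimic the null hypothesis that the two labellings are unrelated. Both work on the labels only; a fuller procedure would probably re-run the clustering on each resample.

```python
import random
from itertools import combinations


def rand_index(a, b):
    """Share of item pairs on which the two labellings agree."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two labellings of the same two or more items")
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def bootstrap_interval(a, b, stat=rand_index, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for stat(a, b), resampling items."""
    rng = random.Random(seed)
    n = len(a)
    values = []
    for _ in range(n_boot):
        # Draw n items with replacement and keep their label pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(stat([a[i] for i in idx], [b[i] for i in idx]))
    values.sort()
    lower = values[int((1 - level) / 2 * (n_boot - 1))]
    upper = values[int((1 + level) / 2 * (n_boot - 1))]
    return lower, upper


def permutation_p_value(a, b, stat=rand_index, n_perm=999, seed=0):
    """One-sided permutation p-value for 'agreement is larger than chance'."""
    rng = random.Random(seed)
    observed = stat(a, b)
    shuffled = list(b)
    hits = 0
    for _ in range(n_perm):
        # Shuffling b breaks any link to a but keeps the class sizes of b.
        rng.shuffle(shuffled)
        if stat(a, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    clustering = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
    benchmark = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 0]
    print("Rand index:", rand_index(clustering, benchmark))
    print("bootstrap interval:", bootstrap_interval(clustering, benchmark))
    print("permutation p-value:", permutation_p_value(clustering, benchmark))
```

One caveat with the bootstrap part: an item drawn twice always agrees with its own copy in both labellings, so pair-counting indices computed on resamples tend to be pushed upward somewhat. The interval is therefore better read as a rough indication of variability than as an exact confidence statement.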
-Matt
On Wed, 2010-11-17 at 08:33 -0500, Martin Tomko wrote:
Dear all,
I am having a hard time figuring out a suitable test for the match
between two nominal classifications of the same set of data.
I have used hierarchical clustering with multiple methods (Ward,
k-means, ...) to classify my data into a set number of classes, and I
would like to compare the resulting automated classification with the
actual, objective benchmark classification.
So in principle I have a data frame with n columns of nominal
classifications, and I want to do a mutual comparison and test for
significant differences in classification between pairs of columns.
I just need to identify a suitable test, but so far I have failed. I am
currently exploring the possibility of using Cohen's Kappa, but I am open
to other suggestions. In particular, the fact that kappa seems to be
mostly used on fallible human annotators seems to bring in limitations
that do not apply to my automatic classification.
Any help will be appreciated, especially if also followed by a pointer
to an R package that implements it.
Thanks
Martin
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Martin Tomko
Postdoctoral Research Assistant
Geographic Information Systems Division
Department of Geography
University of Zurich - Irchel
Winterthurerstr. 190
CH-8057 Zurich, Switzerland
email: martin.to...@geo.uzh.ch
site: http://www.geo.uzh.ch/~mtomko
mob: +41-788 629 558
tel: +41-44-6355256
fax: +41-44-6356848