Re: [R] Can I compare two clusters without using their distance-matrix (dist()) ?

Christian Hennig Wed, 21 Apr 2010 10:18:22 -0700

Dear Tal,

I took the definitions of the Hubert gamma and Dunn indexes from the Gordon book. They are actually not about comparing two clusters, at least not in that reference, and they require dissimilarities.

The adjusted Rand index and Meila's VI, as implemented in cluster.stats, compare two clusterings. If you set compareonly=TRUE in cluster.stats, it only computes these two indexes, so it doesn't need the dissimilarity matrix in principle. In the next update I will probably change it so that in this case you don't need to provide a dissimilarity matrix.

Until then, you can supply a noninformative matrix.
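Example:
library(fpc)  # provides cluster.stats()
# Two arbitrary clusterings of 100 objects
c1 <- sample(4,100,replace=TRUE)
c2 <- sample(5,100,replace=TRUE)
# Placeholder all-zero matrix; its values do not affect the two indexes below
cs <- cluster.stats(d=matrix(0,ncol=100,nrow=100),c1,c2,compareonly=TRUE)

cs$corrected.rand  # adjusted Rand index
cs$vi              # Meila's variation of information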
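
Since both indexes depend only on the cross-tabulation of the two label vectors, they can also be computed directly in base R without any dissimilarity matrix. The following is a minimal sketch, not the fpc implementation: the function names adjusted_rand and variation_of_information are made up for illustration, the adjusted Rand index uses the usual Hubert and Arabie pair-counting form, VI uses natural logarithms, and returning 1 in the degenerate 0/0 case is an assumed convention.

```r
# Adjusted Rand index from two label vectors (pair-counting form).
# Hypothetical helper, not part of fpc.
adjusted_rand <- function(c1, c2) {
  stopifnot(length(c1) == length(c2))
  # Number of unordered pairs within each count, summed; as.numeric avoids
  # integer overflow for large n.
  choose2 <- function(x) { x <- as.numeric(x); sum(x * (x - 1) / 2) }
  tab       <- table(c1, c2)            # contingency table of the two labelings
  all_pairs <- choose2(length(c1))      # all pairs of objects
  both      <- choose2(tab)             # pairs together in both clusterings
  rows      <- choose2(rowSums(tab))    # pairs together in c1
  cols      <- choose2(colSums(tab))    # pairs together in c2
  expected  <- rows * cols / all_pairs  # expected value of 'both' under chance
  max_index <- (rows + cols) / 2
  # Assumed convention: both clusterings trivial (0/0), treat as identical.
  if (max_index == expected) return(1)
  (both - expected) / (max_index - expected)
}

# Variation of information: VI = 2*H(C1,C2) - H(C1) - H(C2), natural log.
# Hypothetical helper, not part of fpc.
variation_of_information <- function(c1, c2) {
  stopifnot(length(c1) == length(c2))
  entropy <- function(q) { q <- q[q > 0]; -sum(q * log(q)) }
  p <- table(c1, c2) / length(c1)       # joint distribution of the labels
  2 * entropy(p) - entropy(rowSums(p)) - entropy(colSums(p))
}

# Usage with label vectors like those in the example above
set.seed(1)
c1 <- sample(4, 100, replace = TRUE)
c2 <- sample(5, 100, replace = TRUE)
adjusted_rand(c1, c2)
variation_of_information(c1, c2)
```

For two independent random labelings like these, the adjusted Rand index should come out close to 0, and identical labelings give 1 for the index and 0 for VI.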

Hope this helps,
Christian



On Wed, 21 Apr 2010, Tal Galili wrote:

Thanks for the fast reply Uwe.

My hope in posting this was to find if anyone had already done work (in R)
in this direction.  So far I wasn't able to find any such relevant code, so
I turned to the mailing list.

Regarding new implementations - thanks for offering! - I have already come
across one such algorithm. I implemented it, and will probably publish it
on my blog <http://www.r-statistics.com/> in the near future.

If anyone else has a reference to an R implementation, it would be most
helpful,
Tal


----------------Contact Details:-------------------------------------------------------
Contact me: [email protected] |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
----------------------------------------------------------------------------------------------




2010/4/21 Uwe Ligges <[email protected]>

On 21.04.2010 18:15, Tal Galili wrote:

Hello all,

I would like to compare the similarity of two cluster solutions using a
validation criterion (such as Hubert's gamma coefficient, the Dunn index,
the corrected Rand index and so on).

I see (from here: http://www.statmethods.net/advstats/cluster.html) that
the function cluster.stats() in the fpc package provides a mechanism
for comparing 2 cluster solutions - *BUT* - it requires me to give
the distance matrix among objects.

*My question* is: What ways can you suggest for comparing two cluster
solutions while using the cluster indicators only (i.e., a vector saying to
which cluster each object belongs), and WITHOUT having to supply the
distance matrix between the objects?


Don't know. If you have a theoretical solution and can provide the
description of a method, there will be many people around happy to make an
algorithm and implement it.

Uwe Ligges



 Thanks,

Tal





______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
[email protected], www.homepages.ucl.ac.uk/~ucakche
