I have been attempting to do some work using hclust, and have run
into a (possibly subtle) problem.

 The background is that I constructed a dissimilarity matrix ``d1''
(it involved something called the ``Jaccard similarity coefficient''; I won't go
into the details unless requested).  I then did

        d2 <- as.dist(d1)
        try <- hclust(d2,method="ward")
        plot(try,labels=FALSE)

After looking at the plot, I tried

        mmm <- cutree(try,h=7)

and got the error message

Error in cutree(try, h = 7) :
  the 'height' component of 'tree' is not sorted
(increasingly); consider applying as.hclust() first

I was much puzzled by this initially, since try is already an ``hclust''
object (I checked class(try)).  After a substantial amount of hair-tearing
I discovered that the entries of the height component of try are constant
over long stretches; e.g. the first 54 entries are 0 (to the 7 printed
decimal places).  This doesn't *seem* to be cause for alarm: the help says
explicitly that height is a *non-decreasing* sequence (but not necessarily
a strictly increasing one).

I checked

        with(try,all.equal(height,sort(height)))

and got

[1] TRUE

but order(try$height) is NOT equal to 1:745 (note that 746 is the number of subjects
in the data set).
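
A small sketch of how these two checks can disagree, using a made-up
height vector (not the real data). The assumption here is that the
offending decrease is tiny, far below the default tolerance of all.equal()
(about 1.5e-8), so all.equal() reports TRUE while an exact test such as
is.unsorted() does not:

```r
# Made-up heights (an assumption for illustration, not the real data):
# the second entry is smaller than the first by only 1e-12.
h <- c(1, 1 - 1e-12, 2, 4)

# Tolerance-based comparison: the tiny decrease is ignored.
isTRUE(all.equal(h, sort(h)))    # TRUE

# Exact comparisons: the decrease is detected.
identical(h, sort(h))            # FALSE
is.unsorted(h)                   # TRUE

# order() is exact as well, so it is not simply 1, 2, 3, 4 here.
order(h)

# The most negative step shows how large the violation actually is.
min(diff(h))
```

On the real object, min(diff(try$height)) would presumably show whether
the violation is of this floating-point-noise size or something larger.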

I have done an RSiteSearch() on "cutree" and turned up nothing that seemed relevant.

Finally, I found that if I do

        try$height <- round(try$height,6)

then

        mmm <- cutree(try,h=7)

``works'' (without error).
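
For anyone wanting to experiment without the real distance matrix, here
is a minimal sketch of the situation on toy data. Everything in it is an
assumption made for illustration: the five points, the single-linkage
method, and the 1e-12 nudge that stands in for floating-point noise. The
cummax() variant is likewise just one possible alternative to rounding,
not something taken from the help pages:

```r
# Toy data chosen so that single linkage gives tied merge heights 1, 1, 2, 4.
x  <- c(0, 1, 2, 4, 8)
hc <- hclust(dist(x), method = "single")

# Stand-in for floating-point noise: nudge one of the tied heights down.
hc$height[2] <- hc$height[2] - 1e-12

# cutree() with 'h' now stops with a "not sorted" error; keep the message.
msg <- tryCatch(cutree(hc, h = 1.5), error = function(e) conditionMessage(e))

# Workaround described above: round the heights, then cut.
hc_round <- hc
hc_round$height <- round(hc$height, 6)
cl_round <- cutree(hc_round, h = 1.5)

# Possible alternative: a running maximum makes the heights non-decreasing
# and alters only the entries that dipped below an earlier height.
hc_cummax <- hc
hc_cummax$height <- cummax(hc$height)
cl_cummax <- cutree(hc_cummax, h = 1.5)

# Both repairs give the same clusters on this toy example.
identical(cl_round, cl_cummax)
```

One trap with rounding, at least in principle: two heights that straddle
a rounding boundary can still come out in decreasing order afterwards, and
rounding alters every height rather than only the offending ones.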

Are there traps for young players in employing such a strategy? What should I
really worry about?

If anyone wants to try it for themselves with the real distance matrix, I can bundle
it up and email it to them privately.

Thanks for any insights.

        cheers,

                Rolf Turner



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
