Hi Hadley,

I wrapped the code into a function. I made it so that all the lines always start from their cluster mean. I also tried to give the colors more meaning by coloring each observation's line according to where that observation falls in the ordering of the first principal component.
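The color mapping described above can be sketched on its own. This is a minimal sketch, not code from the thread: `pc1_colors` is a hypothetical helper, and base R's `heat.colors()` is swapped in for `colorspace::heat_hcl()` so that it runs without extra packages. It uses `rank()` rather than `order()`: `order(pc1)[i]` is the index of the i-th smallest observation, whereas `rank(pc1)[i]` is observation i's position in the sorted scores, which is what a per-observation color needs.

```r
# Hypothetical helper (not part of clustergram()): one color per observation,
# chosen by the rank of its first-principal-component score.
# Assumption: heat.colors() stands in for colorspace::heat_hcl().
pc1_colors <- function(Data) {
  Data <- as.matrix(Data)
  # projection of each (uncentred) row onto the first loading vector
  pc1 <- drop(Data %*% princomp(Data)$loadings[, 1])
  # lowest score gets the first palette color, highest score the last
  heat.colors(nrow(Data))[rank(pc1, ties.method = "first")]
}

set.seed(1)
D <- matrix(rnorm(40), ncol = 2)
cols <- pc1_colors(D)
```

Because `rank(..., ties.method = "first")` is a permutation of `1:n`, every palette color is used exactly once. Note that the sign of a principal component is arbitrary, so the palette may run in either direction along the data.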
What do you think?

Tal

# -------------------------------
# colorspace provides heat_hcl(); it must be loaded before clustergram() uses it.
library(colorspace)

clustergram <- function(Data, k.range = 2:10,
                        clustering.function = kmeans,
                        line.width = .004,
                        add.center.points = TRUE)
{
  Data <- as.matrix(Data)  # %*% below needs a numeric matrix
  n <- nrow(Data)

  # loadings of the first principal component, computed once
  PC1.loadings <- princomp(Data)$loadings[, 1]
  PCA.1 <- drop(Data %*% PC1.loadings)  # first principal component of our data

  # line colors: observation i gets the palette color matching the rank of its
  # PC1 score (rank(), not order(), gives each observation its own position)
  COL <- heat_hcl(n)[rank(PCA.1, ties.method = "first")]
  line.width <- rep(line.width, n)

  Y <- NULL  # Y matrix: one row per observation, one column per k
  X <- NULL  # X matrix
  centers.points <- list()

  for (k in k.range) {
    cl <- clustering.function(Data, k)
    clusters.vec <- cl$cluster
    # cluster centres projected onto the first principal component
    the.centers <- drop(cl$centers %*% PC1.loadings)
    # stack each cluster's lines on top of its centre: within a cluster,
    # member i is offset by the cumulative sum of the line widths so far
    noise <- ave(line.width, clusters.vec, FUN = cumsum)
    y <- the.centers[clusters.vec] + noise
    Y <- cbind(Y, y)
    X <- cbind(X, rep(k, length(y)))
    # append (rather than index by k) so the list has no NULL gaps
    centers.points[[length(centers.points) + 1]] <-
      data.frame(y = the.centers, x = rep(k, k))
  }

  # plot after the loop so ylim also covers the stacked offsets
  plot(0, 0, col = "white", xlim = range(k.range), ylim = range(PCA.1, Y),
       xlab = "Number of clusters (k)",
       ylab = "Mean of the first principal component by clusters",
       main = "Clustergram of first principal component mean by k-means clusters")
  axis(side = 1, at = k.range)
  abline(v = k.range, col = "grey")
  # one line per observation, drawn across all values of k
  matlines(t(X), t(Y), col = COL, lty = 1, lwd = 1.5)

  if (add.center.points) {
    # red points mark the projected cluster centres at each k
    for (cp in centers.points)
      points(cp$x, cp$y, pch = 19, col = "red", cex = 1.3)
  }
}

set.seed(250)
Data <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
              matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
clustergram(Data, k.range = 2:8, line.width = .004, add.center.points = TRUE)

----------------Contact Details:-------------------------------------------------------
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
----------------------------------------------------------------------------------------------

On Tue, Jun 15, 2010 at 4:46 PM, Hadley Wickham <had...@rice.edu> wrote:
> > The glitches are the cases where you would have a bundle of lines belonging
> > to a specific cluster, but had spaces between them (because the place of one
> > of the lines was saved for another line that in the meantime moved to
> > another cluster).
>
> I think that display looked just fine!
>
> > I just came up with a solution for how to resolve this (After showering, it
> > tends to help my thinking...) - it is attached at the bottom of this e-mail.
> > I will later cleanup the code a bit and publish it.
>
> I'd also suggest reordering the lines within each cluster mean so that
> (e.g.) all the lines going from 1a to 2a are all in the same position
> (i.e. at the top of the bundle of lines, not interspersed throughout).
>
> And again, think about using the colour for something useful, maybe
> the value of the variable that you're averaging over to get the y
> position.
>
> Hadley
>
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/
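Hadley's reordering suggestion could be approached by changing how the stacked offsets are assigned: instead of stacking each cluster's members in their original row order, sort them by where their line was at the previous k. The sketch below is a hypothetical helper, `stack_offsets`, not code from this thread. It assumes `prev.y` holds each observation's y position at the previous step; lines arriving from the same earlier bundle then end up adjacent, as long as the earlier bundles do not overlap in y.

```r
# Hypothetical helper: stacked offsets within each current cluster, with the
# members sorted by their previous y position so that lines coming from the
# same earlier bundle sit next to each other (ties keep row order).
stack_offsets <- function(current, prev.y, width = 0.004) {
  offsets <- numeric(length(current))
  for (g in unique(current)) {
    members <- which(current == g)
    # order this cluster's members by where their lines came from
    members <- members[order(prev.y[members])]
    # first member sits one line width above the centre, the next two, ...
    offsets[members] <- width * seq_along(members)
  }
  offsets
}

# small example: two clusters, previous positions given directly
offs <- stack_offsets(c(1, 1, 2, 1, 2), c(0.5, 0.1, 0.2, 0.5, 0.9), width = 1)
```

Inside clustergram(), the `noise` line could then become something like `noise <- stack_offsets(clusters.vec, if (is.null(Y)) PCA.1 else Y[, ncol(Y)], line.width[1])`, so the first column is ordered by PC1 score and each later column by the previous column's positions. This is an untested suggestion rather than a drop-in fix.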