Hi Hadley,

I wrapped the code into a function.
I made it so that all the lines always start from the cluster mean.
I also tried to give more meaning to the colors by assigning each
observation's color according to the order of its value on the first
principal component.

What do you think?

Tal




# -------------------------------


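clustergram <- function(Data, k.range = 2:10,
                        clustering.function = kmeans,
                        line.width = .004, add.center.points = TRUE)
{
  require(colorspace) # provides heat_hcl(); must be loaded before it is used below
  n <- dim(Data)[1]
  # first principal component of our data (Data must be a numeric matrix)
  PCA.1 <- Data %*% princomp(Data)$loadings[,1]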


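# line colors: rank(), not order(), so that observation i gets the color
# matching its own position along the first principal component
COL <- heat_hcl(n)[rank(PCA.1, ties.method = "first")]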

 line.width <- rep(line.width, n)
 Y <- NULL # Y matrix
 X <- NULL # X matrix

plot(0, 0, col = "white", xlim = range(k.range), ylim = range(PCA.1),
     xlab = "Number of clusters (k)",
     ylab = "Mean of the first principal component by clusters",
     main = "Clustergram of first principal component mean by k-mean clusters")
 axis(side =1, at = k.range)
abline(v = k.range, col = "grey")

 centers.points <- list()

for(k in k.range)
{
 cl <- clustering.function(Data, k)
 clusters.vec <- cl$cluster
 # the.centers <- apply(cl$centers,1, mean)
the.centers <- cl$centers %*% princomp(Data)$loadings[,1]

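 # Within each cluster, give every observation a small cumulative offset so its
 # line is drawn next to, not on top of, the other lines in the same bundle.
 # tapply() returns the offsets grouped by cluster; order(order(clusters.vec))
 # maps them back to the original observation order.
 noise <- unlist(tapply(line.width, clusters.vec,
                        cumsum))[order(order(clusters.vec))]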
 # noise <- noise - mean(range(noise))
y <- the.centers[clusters.vec] + noise
 Y <- cbind(Y, y)
x <- rep(k, length(y))
X <- cbind(X, x)

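 # index by name: centers.points[[k]] leaves empty slots when k.range does not start at 1
 centers.points[[as.character(k)]] <- data.frame(y = the.centers , x = rep(k , k))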
 # points(the.centers ~ rep(k , k), pch = 19, col = "red", cex = 1.5)
}

require(colorspace)
matlines(t(X), t(Y), pch = 19, col = COL, lty = 1, lwd = 1.5)

if(add.center.points)
{
 # add the cluster centers as red points, skipping any empty slots in the list
 for(xx in centers.points)
 {
  if(!is.null(xx)) with(xx, points(y ~ x, pch = 19, col = "red", cex = 1.3))
 }
}

}


set.seed(250)
Data <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
clustergram(Data, k.range = 2:8 , line.width = .004, add.center.points = T)
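
As a minimal sketch of how the per-cluster offset inside the loop works, here is that step in isolation. The toy cluster labels below are made up for illustration and are not part of the original post; the offset value is the default line.width.

```r
# Toy illustration of the per-cluster offset used inside clustergram()
clusters.vec <- c(2, 1, 2, 1, 1)                  # hypothetical cluster labels
line.width   <- rep(0.004, length(clusters.vec))  # same offset step for every line

# cumulative offsets, grouped by cluster (cluster 1 first, then cluster 2)
grouped <- unlist(tapply(line.width, clusters.vec, cumsum))

# order(order(v)) gives each observation's position in the grouped vector,
# so indexing with it puts the offsets back in the original observation order
noise <- unname(grouped[order(order(clusters.vec))])
print(noise)
```

Each observation ends up with an offset that depends only on how many members of its own cluster precede it, which is what keeps the lines in a bundle adjacent without gaps.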







----------------Contact Details:-------------------------------------------------------
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
----------------------------------------------------------------------------------------------




On Tue, Jun 15, 2010 at 4:46 PM, Hadley Wickham <had...@rice.edu> wrote:

> > The glitches are the cases where you would have a bundle of lines belonging
> > to a specific cluster, but had spaces between them (because the place of one
> > of the lines was saved for another line that in the meantime moved to
> > another cluster).
>
> I think that display looked just fine!
>
> > I just came up with a solution for how to resolve this (After showering, it
> > tends to help my thinking...) - it is attached at the bottom of this e-mail.
> > I will later cleanup the code a bit and publish it.
>
> I'd also suggest reordering the lines within each cluster mean so that
> (e.g.) all the lines going from 1a to 2a are all in the same position
> (i.e. at the top of the bundle of lines, not interspersed throughout).
>
> And again, think about using the colour for something useful, maybe
> the value of the variable that you're averaging over to get the y
> position.
>
> Hadley
>
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/
>


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
