Dear R users, I am a biochemist/bioinformatician, currently working on clustering proteins by conformational similarity.
I only started working seriously with R a couple of months ago. So far I have been able to read my way through tutorials and set up my hierarchical clusterings. My problem is that I cannot find a way to obtain information about the branching below specific nodes, i.e. below specific clusters of interest. In other words, I am trying to read off the sub-clusters of a specific cluster in the dendrogram by isolating its node and exploring the hierarchy below it locally. For reference, here is the code I have been using:

df <- read.table('mydata.txt', header=TRUE, row.names=1) #read file with distance matrix
d <- as.dist(df) #format table as distance matrix
z <- hclust(d, method="complete", members=NULL)
x <- as.dendrogram(z)
plot(x, xlab="mydata complete-LINKAGE", ylim=c(0,4)) #visualization of the dendrogram
clusters <- cutree(z, h=1.6) #obtain clusters at cutoff height=1.6
ord <- cmdscale(d, k=2) #multidimensional scaling of the data down to 2 dimensions
library(cluster) #provides clusplot()
clusplot(ord, clusters, color=TRUE, shade=TRUE, labels=4, lines=0) #visualization of the clusters in a 2D map
var1 <- var(clusters==1) #note: variance of the 0/1 indicator "is in cluster 1", not a within-cluster variance
#extract cluster memberships:
clids <- as.data.frame(clusters)
names(clids) <- c("id")
clids$cdr <- row.names(clids)
row.names(clids) <- seq_len(nrow(clids))
clstructure <- lapply(unique(clids$id), function(x){clids[clids$id == x, 'cdr']})
clstructure[[1]] #get memberships of cluster 1

From this point I could, in principle, recreate a distance matrix containing only the members of a specific cluster, re-apply hierarchical clustering and start all over again. But doing this individually for hundreds of clusters would take me ages. So I was hoping someone could point me in the right direction: how can I take advantage of the initial dendrogram, focus on specific clusters and derive their sub-clusters at a new cutoff height?
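As a side note, the membership-extraction step can probably be shortened: cutree() returns a named integer vector, so split() can group the object names by cluster id in one call. The sketch below assumes the built-in USArrests data (scaled) as a stand-in for mydata.txt; the cutoff h = 1.6 is just the value from the post and is not meaningful for this stand-in data.

```r
# Stand-in data: USArrests ships with R and replaces the poster's mydata.txt
d <- dist(scale(USArrests))
z <- hclust(d, method = "complete")

# cutree() returns a named integer vector: names are objects, values are cluster ids
clusters <- cutree(z, h = 1.6)

# split() groups the object names by cluster id in a single call
clstructure <- split(names(clusters), clusters)

clstructure[["1"]]             # members of cluster 1
sapply(clstructure, length)    # size of every cluster
```

cutree() numbers clusters in the order the observations appear, so cluster "1" always contains the first object; this should match the list built with the clids data frame above.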
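One way to work on the existing dendrogram directly, rather than re-clustering, may be cut() on the dendrogram object: it returns $upper (the tree above h) and $lower (a list of the branches hanging below h). A branch can then be inspected, plotted, or cut again at a lower height. This is a minimal sketch assuming the scaled USArrests data as a stand-in; picking the largest branch and halving its height are arbitrary illustrative choices. Note that the branches in $lower are ordered left to right in the plot, which need not match cutree()'s numbering, although the member sets should correspond.

```r
d <- dist(scale(USArrests))   # stand-in for the poster's distance matrix
z <- hclust(d, method = "complete")
x <- as.dendrogram(z)

# cut() on a dendrogram: $upper is the tree above h,
# $lower is a list of the sub-dendrograms below h (one per cluster)
pieces <- cut(x, h = 1.6)
sizes <- sapply(pieces$lower, function(b) attr(b, "members"))

# Take the largest branch (singletons cannot be sub-clustered)
sub1 <- pieces$lower[[which.max(sizes)]]
labels(sub1)                  # objects in this branch
attr(sub1, "height")          # height at which this branch was formed
plot(sub1, main = "Largest branch below h = 1.6")

# Cut that branch again; choose a height BELOW the branch's own height,
# otherwise cut() simply returns the branch's immediate children
h_new <- attr(sub1, "height") / 2
sub_pieces <- cut(sub1, h = h_new)
lapply(sub_pieces$lower, labels)   # members of each sub-cluster
```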
I recently found the following code on this page: http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual

clid <- c(1,2)
ysub <- y[names(mycl[mycl%in%clid]),]
hrsub <- hclust(as.dist(1-cor(t(ysub), method="pearson")), method="complete")
# Select sub-cluster number (here: clid=c(1,2)) and generate corresponding dendrogram.

Even with this example I am afraid I can't work my way around the problem. So I guess in my case I could grab all the members of a specific cluster using my existing code and try to reduce the distance matrix to one that only contains the distances between those members:

cluster1members <- clstructure[[1]]

Then I need to turn the distance matrix into a new one, say d1, which I can feed to a new, local hierarchical clustering:

hrsub <- hclust(d1, method="complete")

Any ideas on how I can obtain a new distance matrix containing just the distances between the members of that cluster, whose names are stored in the vector "cluster1members"? Apologies if this seems trivial, but I really can't find the right functions for this task. Thank you very much in advance. As I am really a novice with R, small chunks of example code would be a great help.

Take care all

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
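The sub-matrix question can be answered with base R: as.matrix() turns a "dist" object back into a full matrix whose row and column names are the object labels, which can be indexed by a vector of names and converted back with as.dist(). The sketch assumes scaled USArrests as a stand-in and uses the largest cluster, because hclust() needs at least two objects; clstructure[[1]] works the same way whenever cluster 1 has two or more members. The sub-cutoff 0.8 is an arbitrary example value.

```r
d <- dist(scale(USArrests))          # stand-in for the poster's d
z <- hclust(d, method = "complete")
clusters <- cutree(z, h = 1.6)
clstructure <- split(names(clusters), clusters)

# Largest cluster, so that there are at least two objects to re-cluster
sizes <- sapply(clstructure, length)
members <- clstructure[[which.max(sizes)]]

# dist -> full labelled matrix -> subset by names -> back to dist
m  <- as.matrix(d)
d1 <- as.dist(m[members, members])

hrsub <- hclust(d1, method = "complete")
plot(hrsub)
cutree(hrsub, h = 0.8)               # sub-clusters at a new cutoff
```

With complete linkage, re-clustering the members of a branch should (apart from ties) reproduce that branch of the original tree, because merges inside the branch depend only on distances between its own members.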
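To avoid doing this by hand for hundreds of clusters, the subsetting and re-clustering can be wrapped in a small function and applied to every cluster with lapply(). The helper name subcluster() is hypothetical (it is not an R function), USArrests is again only stand-in data, and h_new = 0.8 is an arbitrary example cutoff. Singletons are returned as their own sub-cluster, since hclust() refuses fewer than two objects.

```r
d <- dist(scale(USArrests))   # stand-in for the poster's d
z <- hclust(d, method = "complete")
clusters <- cutree(z, h = 1.6)
m <- as.matrix(d)             # full labelled distance matrix, computed once

# Hypothetical helper: re-cluster one set of members and cut at h_new
subcluster <- function(members, m, h_new) {
  if (length(members) < 2) {
    # hclust() needs at least two objects; a singleton is its own sub-cluster
    return(setNames(1L, members))
  }
  dsub <- as.dist(m[members, members, drop = FALSE])
  cutree(hclust(dsub, method = "complete"), h = h_new)
}

# Apply to every top-level cluster in one go
memberships <- split(names(clusters), clusters)
subclusters <- lapply(memberships, subcluster, m = m, h_new = 0.8)

subclusters[["1"]]   # sub-cluster ids for the members of cluster 1
```

Each element of subclusters is a named vector (object name to sub-cluster id), so the result can be inspected per cluster or combined with the top-level ids.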