Dear R-Users, I have another question regarding trees (dendrograms).
After exploring the various hierarchical clustering methods, it seems that some of the methods (average, single, median) sequentially add very small clusters (even a single leaf) to an increasingly larger branch. I would like to quantify this more rigorously.

I do not think that banner plots fully capture this behaviour, as they are limited to the height of the node where a leaf binds. I came up with 2 alternative measures:
- Ratio of the number of leaves on one branch (the larger branch) vs the other branch of a node (see function branch.ratios);
- Size of the other branch at a node where a single leaf binds;

The latter resembles the banner plot, and is likewise limited to nodes that have a leaf as a direct child.

Can anyone point me to such indexes in the literature and/or in other R packages? I am not an expert in the field. Searching for cluster indexes will likely generate a huge number of false positive results (i.e. indexes for the number of clusters).

An example of this functionality is given below:

  # Pre-computed Trees:
  x1 = readRDS("Tree.Full.M_ward.D.rds")
  x2 = readRDS("Tree.Full.M_average.rds")
  br1 = branch.ratios(x1)
  br2 = branch.ratios(x2)
  # Alternative: size.leafBranch(x1);

  par.old = par(mfrow = c(1,2))
  hist(br1);
  # Branch Ratio goes up to 1300!
  hist(br2);
  par(par.old)

  # Note: Median & centroid are even more extreme!

The data sets and functions are on GitHub:
https://github.com/discoleo/PeptideClassifier/tree/main/inst/examples

Functions: branch.ratios, size.leafBranch, count.nodes;
https://github.com/discoleo/PeptideClassifier/blob/main/R/Helper.Tree.R

I have attached an image to this mail with all 8 histograms. The image is also available on GitHub:
https://github.com/discoleo/PeptideClassifier/blob/main/Trees.BranchRatios.png

Many thanks in advance,

Leonard
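
P.S. For readers who want to try the idea without the GitHub code, below is a minimal sketch of the two measures described above, computed from the merge matrix of a standard hclust object. The helper names (child_sizes, branch_ratios_sketch, leaf_sibling_size_sketch) are hypothetical, and this is an assumption about one way to compute the measures, not necessarily how branch.ratios and size.leafBranch in Helper.Tree.R are implemented.

```r
# Sizes (number of leaves) of the two children of every internal node.
# In hclust's merge matrix a negative entry is a single leaf and a
# positive entry k refers to the cluster formed at merge step k.
child_sizes <- function(tree) {
  m <- tree$merge
  n_nodes <- nrow(m)
  node_size <- integer(n_nodes)      # leaves below each internal node
  sizes <- matrix(0L, n_nodes, 2)    # leaves below each child of the node
  for (i in seq_len(n_nodes)) {
    for (j in 1:2) {
      id <- m[i, j]
      sizes[i, j] <- if (id < 0) 1L else node_size[id]
    }
    node_size[i] <- sizes[i, 1] + sizes[i, 2]
  }
  sizes
}

# Measure 1 (sketch): leaves on the larger branch / leaves on the smaller one.
branch_ratios_sketch <- function(tree) {
  s <- child_sizes(tree)
  pmax(s[, 1], s[, 2]) / pmin(s[, 1], s[, 2])
}

# Measure 2 (sketch): size of the other branch at nodes where a leaf binds.
# If both children are leaves, the "other" branch has size 1.
leaf_sibling_size_sketch <- function(tree) {
  s <- child_sizes(tree)
  has_leaf <- tree$merge[, 1] < 0 | tree$merge[, 2] < 0
  pmax(s[has_leaf, 1], s[has_leaf, 2])
}

# Toy example: single linkage on geometrically spaced points yields a
# pure chain, so each merge adds one leaf to the growing branch.
hc <- hclust(dist(c(1, 2, 4, 8, 16)), method = "single")
branch_ratios_sketch(hc)      # 1 2 3 4
leaf_sibling_size_sketch(hc)  # 1 2 3 4
```

A perfectly balanced tree would give ratios of 1 at every node, whereas chaining shows up as a long right tail in hist(branch_ratios_sketch(tree)).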
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.