Dear R-Users,

I would like to extract a branch (sub-tree) from an existing tree (dendrogram) 
which fulfils the following conditions:
- it includes a specified leaf;
- it is the smallest such branch that has at least a specified number n of 
  leaves.

In other words, I want to extract (approximately) the n leaves most similar to 
a given leaf.

Does anyone know of a package that provides this functionality?

I have some working code, but it is a quick hack and not very robust. Before 
investing more time in it, I would like to check whether such functionality 
already exists.

I looked through the Cluster Task View and also explored the dendextend and ape 
packages (and a few more), but I did not spot such functionality.
https://cran.r-project.org/web/views/Cluster.html

My current code is on GitHub (see link below).

An example would look like this:

data(iris)

irisClust = iris[,-5]
d = dist(irisClust, method = "euclidean")
x = hclust(d, method="ward.D")
x$labels = paste0("L", 1:nrow(irisClust))

# 1  = Must contain leaf 1;
# 20 = Must cover at least 20 leaves;
tmp = subtree.nc(1, 20, x);
plot(tmp)

The function subtree.nc (and its dependencies count.nodes, subtree.nn and 
order.tree) are in the file on GitHub linked below; the code is a little too 
long for this post. All functions in that file are independent of other 
files/modules.
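
To make the goal concrete, here is a minimal base-R sketch of the behaviour I 
am after. It is not the code from GitHub: smallest_branch is a hypothetical 
name used only for illustration, and the sketch assumes that the tree is first 
converted with as.dendrogram() and that the leaf is given by its label rather 
than by its index.

```r
# Sketch only: descend from the root towards the requested leaf and stop
# as soon as the next step down would leave fewer than n leaves.
# Assumes 'dend' is a dendrogram and 'leaf' is one of its leaf labels.
smallest_branch = function(dend, leaf, n) {
  stopifnot(inherits(dend, "dendrogram"), n >= 1)
  if (!leaf %in% labels(dend)) stop("leaf not found in the tree")
  if (attr(dend, "members") < n) stop("tree has fewer than n leaves")
  while (!is.leaf(dend)) {
    # Identify the child branch that contains the requested leaf
    has = vapply(seq_along(dend),
                 function(i) leaf %in% labels(dend[[i]]), logical(1))
    child = dend[[which(has)[1]]]
    # Going further down would violate the size condition: stop here
    if (attr(child, "members") < n) break
    dend = child
  }
  dend
}

# Same setup as in the example above
irisClust = iris[, -5]
x = hclust(dist(irisClust, method = "euclidean"), method = "ward.D")
x$labels = paste0("L", 1:nrow(irisClust))

# Branch containing leaf "L1" and covering at least 20 leaves
br = smallest_branch(as.dendrogram(x), "L1", 20)
```

The result is again a dendrogram, so plot(br) works directly. The sketch 
recomputes the labels at every level, which is simple but presumably slow on 
large trees.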

# GitHub:
https://github.com/discoleo/PeptideClassifier/blob/main/R/Helper.Tree.R

There are a few pre-computed moderate-size trees also on GitHub (for more 
realistic exploration):
https://github.com/discoleo/PeptideClassifier/tree/main/inst/examples

Many thanks in advance for any useful pointers.

Sincerely,

Leonard


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
