Hi, I am new to clustering in R. I have a dataset with approximately 17,000 rows and 8 columns; each data point is a numeric value with three decimal places. I would like to cluster the 8 columns so that I get a dendrogram as output. I am simply creating a distance matrix of my data, running the 'hclust' function on it, and then plotting the results (see below; my data is contained in the text file).

x <- read.table('SEP_IR_1113_3.txt', header=TRUE, sep="\t")
x.dist <- dist(x)
hc <- hclust(x.dist, method="average")
plot(hc, hang=-1)

Unfortunately, the hclust function, although it produces no errors, takes a very long time to run (>4 hours), and my computer kills the program before it finishes. I don't think this data set is large enough to cause such a long computing time, and I have plenty of memory since I am running this analysis on our university computing cluster. Has anyone run into this problem before, and does anyone have any tips on how I can speed up processing? I can provide extra information regarding my problem if necessary.

Thank you!

--
View this message in context: http://old.nabble.com/hclust-too-slow--tp26395774p26395774.html
Sent from the R help mailing list archive at Nabble.com.

R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
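
A possible explanation, offered as a sketch rather than a confirmed diagnosis: dist() computes distances between the rows of its argument, so dist(x) on this data compares the roughly 17,000 rows with each other (about 144 million pairwise distances, on the order of 1.2 GB stored as doubles) instead of comparing the 8 columns. Assuming the goal really is a dendrogram of the 8 columns, transposing the data first reduces the problem to 28 pairwise distances. The data below is simulated because the original file is not available, and the names col.dist, pairs.rows and pairs.cols are illustrative only.

```r
# Simulated stand-in for the real file: 17,000 rows, 8 numeric columns,
# rounded to three decimal places (assumption: all columns are numeric).
set.seed(1)
x <- as.data.frame(matrix(round(rnorm(17000 * 8), 3), ncol = 8))

# dist() compares rows, so transpose to compare the 8 columns instead.
col.dist <- dist(t(x))

# Average-linkage clustering of the columns, as in the original call.
hc <- hclust(col.dist, method = "average")
plot(hc, hang = -1)

# Number of pairwise distances for rows versus columns.
n.rows <- nrow(x)
n.cols <- ncol(x)
pairs.rows <- n.rows * (n.rows - 1) / 2
pairs.cols <- n.cols * (n.cols - 1) / 2
```

If the intent is instead to cluster the 17,000 rows, the transpose does not apply and the size of the row-wise distance matrix is the thing to look at.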