Ferebee Tunno <ferebee.tunno <at> mathstat.astate.edu> writes:

> Hi everyone -
>
> I know that R is capable of clustering using the k-means algorithm, but can
> R do k-means++ clustering as well?

k-means++ is a seeding routine: it suggests the initial center points before
classical k-means is called. The following lines of code will do that, where
X is a matrix of data points, as required by kmeans(), and k is the number of
centers:

    library(pracma)                      # provides distmat()

    kmpp <- function(X, k) {
        n <- nrow(X)
        C <- numeric(k)                  # row indices of the chosen centers
        C[1] <- sample(1:n, 1)           # first center: uniformly at random
        for (i in seq_len(k)[-1]) {      # i = 2, ..., k (skipped when k == 1)
            # distances from every point to the centers chosen so far
            dm <- distmat(X, X[C[1:(i - 1)], , drop = FALSE])
            # k-means++ samples proportionally to the squared distance
            # to the nearest chosen center; chosen points get weight 0
            pr <- apply(dm, 1, min)^2; pr[C] <- 0
            C[i] <- sample(1:n, 1, prob = pr)
        }
        kmeans(X, X[C, , drop = FALSE])  # run k-means from these seeds
    }

Here distmat(a, b) should return the distances between the rows of two
matrices a and b. There may be several implementations in R; one is
distmat() in package pracma.

Please note that AFAIK it is not clear whether the k-means++ approach is
really better than, e.g., kmeans with several restarts.

Hans Werner

> Thanks,
>
> --
> Dr. Ferebee Tunno
> Assistant Professor
> Department of Mathematics and Statistics
> Arkansas State University
> P.O. Box 70
> State University, AR. 72467
> ftu...@astate.edu
> (870) 329-7710
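
P.S. If you would rather not depend on an extra package for distmat(a, b), the row-to-row Euclidean distances can be sketched in base R as below. The name rowdist() is hypothetical (it is not from any package), and the toy data and nstart = 25 are arbitrary choices for illustration. The last lines show the "several restarts" baseline mentioned above, using the nstart argument of kmeans(). Treat it as a sketch, not a tuned implementation.

```r
# rowdist(): hypothetical helper, an assumption of this sketch.
# Returns the nrow(a) x nrow(b) matrix of Euclidean distances
# between the rows of a and the rows of b, using base R only.
rowdist <- function(a, b) {
  a <- if (is.matrix(a)) a else matrix(a, nrow = 1)  # a vector is one point
  b <- if (is.matrix(b)) b else matrix(b, nrow = 1)
  stopifnot(ncol(a) == ncol(b))
  # ||a_i - b_j||^2 = ||a_i||^2 + ||b_j||^2 - 2 * <a_i, b_j>
  d2 <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * tcrossprod(a, b)
  sqrt(pmax(d2, 0))  # clamp tiny negative values caused by rounding
}

# Small check: distances from (0,0) and (3,4) to the origin
a <- rbind(c(0, 0), c(3, 4))
d <- rowdist(a, c(0, 0))

# Baseline for comparison: plain k-means with several random restarts
set.seed(1)                                   # arbitrary seed, toy data
X <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))
fit <- kmeans(X, centers = 2, nstart = 25)    # keeps the best of 25 starts
fit$tot.withinss                              # compare with kmpp(X, 2)$tot.withinss
```

Inside kmpp() one could then call rowdist() wherever distmat() appears. Comparing tot.withinss from both approaches on your own data is probably the most direct way to see whether the seeding helps.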