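You are using the wrong algorithm. kmeans() expects a matrix of observations by variables, so it treats each row of your distance matrix as a vector of coordinates and clusters the rows by Euclidean distance, instead of using the string distances themselves. You want Partitioning Around Medoids (PAM, function pam in package cluster), not k-means. PAM is also known as k-medoids, which is where the confusion may come from.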
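A minimal sketch of the difference on your example follows. It assumes base R's utils::adist() as a stand-in for stringdist::stringdistmatrix(..., method = "lv"), since both compute plain Levenshtein distance, so it needs no package beyond cluster (a recommended package shipped with R). Note diss = TRUE: pam() only treats its input as dissimilarities automatically for "dist" or "dissimilarity" objects, and otherwise reads a plain matrix as observations by variables, just as kmeans() does. The PAM grouping is not guaranteed to match the JMP/MINITAB output, so compare the printed groups yourself.

```r
# Sketch: cluster misspelled terms with PAM on a Levenshtein distance matrix.
# Assumption: utils::adist() stands in for stringdistmatrix(..., method = "lv").
library(cluster)  # provides pam()

test <- c('hematolgy', 'hemtology', 'oncology', 'onclogy',
          'oncolgy', 'dermatolgy', 'dermatoloy', 'dematology',
          'neurolog', 'nerology', 'neurolgy', 'nerology')

# Pairwise Levenshtein distances (unit cost for insertion, deletion, substitution)
dis <- adist(test)

# kmeans() reads 'dis' as 12 observations on 12 numeric variables and
# groups the rows by Euclidean distance between them.
set.seed(123)
km <- kmeans(dis, centers = 4)

# pam() with diss = TRUE uses the entries of 'dis' directly as dissimilarities;
# each cluster is represented by one of the input words (its medoid).
pm <- pam(dis, k = 4, diss = TRUE)

# Compare the two groupings
print(split(test, km$cluster))
print(split(test, pm$clustering))
print(test[pm$id.med])  # the medoid word of each PAM cluster
```

The medoids are actual words from the input, which is why PAM fits this problem: it only needs pairwise dissimilarities and never has to average rows of a distance matrix the way k-means centroids do.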
Use

  library(cluster)
  cl = pam(dis, 4, diss = TRUE)  # dis holds distances, not coordinates

and see if you get what you want.

HTH,
Peter

On Mon, Apr 28, 2014 at 9:15 PM, cassie jones <cassiejone...@gmail.com> wrote:
> Dear R-users,
>
> I am trying to run kmeans on a set of 100 observations, but R
> cannot figure out the true underlying groups, although other
> software such as JMP and MINITAB produces the desired result.
>
> The following is a brief example of what I am doing.
>
> library(stringdist)
> test=c('hematolgy','hemtology','oncology','onclogy',
>        'oncolgy','dermatolgy','dermatoloy','dematology',
>        'neurolog','nerology','neurolgy','nerology')
>
> dis=stringdistmatrix(test,test, method = "lv")
>
> set.seed(123)
> cl=kmeans(dis,4)
>
> grp_cl=vector('list',4)
>
> for(i in 1:4)
> {
>   grp_cl[[i]]=test[which(cl$cluster==i)]
> }
> grp_cl
>
> [[1]]
> [1] "oncology" "onclogy"
>
> [[2]]
> [1] "neurolog" "nerology" "neurolgy" "nerology"
>
> [[3]]
> [1] "oncolgy"
>
> [[4]]
> [1] "hematolgy" "hemtology" "dermatolgy" "dermatoloy" "dematology"
>
> In the above example, the 'test' variable consists of a set of
> terms with various typos, and I am trying to group similar
> words based on their string distance. Unfortunately kmeans is not able
> to replicate the following result, which the other software produces:
>
> [[1]]
> [1] "oncology" "onclogy" "oncolgy"
>
> [[2]]
> [1] "neurolog" "nerology" "neurolgy" "nerology"
>
> [[3]]
> [1] "dermatolgy" "dermatoloy" "dematology"
>
> [[4]]
> [1] "hematolgy" "hemtology"
>
> Does anyone know a way out? I have heard from a lot of people
> that multivariate analysis in R does not produce the desired result most of
> the time. Any help is really appreciated.
>
> Thanks in advance.
>
> Cassie
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.