Cassie, I am sorry, but do you even know what k-means does? That it is a locally optimal algorithm, so its answer depends on the initial cluster centers? That different software packages implement the same algorithm differently?
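The local-optimum point can be seen on a small, made-up data set. The points and starting centers below are hypothetical, chosen only for illustration: three well-separated pairs of points, started from a deliberately poor set of initial centers. stats::kmeans() accepts a matrix of initial centers, which makes the run deterministic, and its algorithm argument selects among "Hartigan-Wong" (the default), "Lloyd", "Forgy" and "MacQueen". A minimal sketch, assuming only base R:

```r
# Hypothetical toy data: three well-separated pairs of points in 2-D.
x <- rbind(c(0, 0), c(0, 1),    # pair A
           c(9, 0), c(9, 1),    # pair B
           c(20, 0), c(20, 1))  # pair C

# A deliberately poor start: two initial centers inside pair A, one in pair C.
init <- x[c(1, 2, 5), ]

# Lloyd's algorithm (assign to nearest center, recompute means) stops at a
# local optimum that mixes pairs A and B into a "bottom" and a "top" cluster.
fit_lloyd <- kmeans(x, centers = init, algorithm = "Lloyd")

# Hartigan-Wong, R's default, starts from the same centers but also moves
# single points whenever that lowers the within-cluster sum of squares.
fit_hw <- kmeans(x, centers = init, algorithm = "Hartigan-Wong")

fit_lloyd$tot.withinss  # 81.5
fit_hw$tot.withinss     # 1.5
```

From this start, Lloyd's iterations settle on clusters {(0,0), (9,0)} and {(0,1), (9,1)}, while Hartigan-Wong's single-point transfers recover the three pairs. Neither method is guaranteed to reach the global optimum in general, which is why several random starts are advisable.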
FYI, R uses the Hartigan-Wong (1979) algorithm by default, which is probably the most efficient one out there. I suggest you first take a multivariate statistics class before making such sweeping statements. (Btw, did these same "some people" tell you that most other software packages do not provide the breadth of functionality that R provides, and therefore are not even comparable?) And then, please read the help page (?kmeans) for how to "improve" your run of k-means in R.

HTH,
Ranjan

On Tue, 29 Apr 2014 09:45:18 +0530 cassie jones <cassiejone...@gmail.com> wrote:

> Dear R-users,
>
> I am trying to run kmeans on a set comprising 100 observations. But R
> somehow cannot figure out the true underlying groups, although other
> software such as JMP and MINITAB produce the desired result.
>
> Following is a brief example of what I am doing.
>
> library(stringdist)
> test=c('hematolgy','hemtology','oncology','onclogy',
>        'oncolgy','dermatolgy','dermatoloy','dematology',
>        'neurolog','nerology','neurolgy','nerology')
>
> dis=stringdistmatrix(test,test, method = "lv")
>
> set.seed(123)
> cl=kmeans(dis,4)
>
> grp_cl=vector('list',4)
>
> for(i in 1:4)
> {
>   grp_cl[[i]]=test[which(cl$cluster==i)]
> }
> grp_cl
>
> [[1]]
> [1] "oncology" "onclogy"
>
> [[2]]
> [1] "neurolog" "nerology" "neurolgy" "nerology"
>
> [[3]]
> [1] "oncolgy"
>
> [[4]]
> [1] "hematolgy" "hemtology" "dermatolgy" "dermatoloy" "dematology"
>
> In the above example, the 'test' variable consists of a set of
> terms with various typos, and I am trying to group similar words
> based on their string distance. Unfortunately kmeans is not able
> to replicate the following result, which the other software packages
> are able to produce.
> [[1]]
> [1] "oncology" "onclogy" "oncolgy"
>
> [[2]]
> [1] "neurolog" "nerology" "neurolgy" "nerology"
>
> [[3]]
> [1] "dermatolgy" "dermatoloy" "dematology"
>
> [[4]]
> [1] "hematolgy" "hemtology"
>
> Does anyone know if there is a way out? I have heard from a lot of people
> that multivariate analysis in R does not produce the desired result most of
> the time. Any help is really appreciated.
>
> Thanks in advance.
>
> Cassie
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Important Notice: This mailbox is ignored: e-mails are set to be deleted on receipt. Please respond to the mailing list if appropriate. For those needing to send personal or professional e-mail, please use appropriate addresses.
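The pointer to ?kmeans presumably refers to the nstart argument: the help page recommends trying several random starts (nstart > 1), and when centers is a number, kmeans() keeps the start with the lowest total within-cluster sum of squares. The sketch below reruns Cassie's example that way, under two assumptions: it swaps stringdist::stringdistmatrix() for base R's utils::adist(), whose default unit costs give plain Levenshtein distances (what method = "lv" computes), and nstart = 25 is an arbitrary choice.

```r
# Cassie's example, with base R's adist() standing in for
# stringdist::stringdistmatrix(test, test, method = "lv").
# Assumption: both give the same Levenshtein distance matrix here.
test <- c("hematolgy", "hemtology", "oncology", "onclogy",
          "oncolgy", "dermatolgy", "dermatoloy", "dematology",
          "neurolog", "nerology", "neurolgy", "nerology")
dis <- adist(test)

# Each row of the distance matrix is used as a 12-dimensional feature
# vector, exactly as in the original post.
set.seed(123)
cl <- kmeans(dis, centers = 4, nstart = 25)  # keep the best of 25 random starts

# split() replaces the original for-loop over cluster labels.
grp_cl <- split(test, cl$cluster)
grp_cl
cl$tot.withinss
```

This does not guarantee the grouping Cassie expected; it only makes it less likely that a single unlucky start decides the answer.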