Re: K-Means final cluster centers

2015-02-05 Thread Suneel Marthi
There's a kMeansModel.clusterCenters() available if you are looking to get the centers from KMeansModel.
From: SK
To: user@spark.apache.org
Sent: Thursday, February 5, 2015 5:35 PM
Subject: K-Means final cluster centers
Hi, I am trying to get the final cluster centers after running th

Re: getting the cluster elements from kmeans run

2015-02-11 Thread Suneel Marthi
KMeansModel only returns the "cluster centroids". To get the number of elements in each cluster, try calling kmeans.predict() on each of the points in the data used to build the model, then count how many points are assigned to each cluster. See https://github.com/OryxProject/oryx/blob/master/oryx-app-mllib/src/main/java/com/cloudera/oryx/app/mllib/kmeans/K
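The predict-then-count idea above can be sketched in plain Python. This is only an illustration under stated assumptions: the `predict` and `cluster_sizes` helpers are hypothetical stand-ins written for this example, not Spark's API, and the centers and points are made-up sample data. In Spark, the assignment would come from the trained model and the counting would run on the distributed data set.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def predict(centers, point):
    """Return the index of the center closest to `point` (hypothetical helper)."""
    return min(range(len(centers)), key=lambda i: dist(centers[i], point))


def cluster_sizes(centers, points):
    """Assign every point to its nearest center and count points per cluster."""
    counts = Counter(predict(centers, p) for p in points)
    # Report empty clusters explicitly as 0 instead of omitting them.
    return {i: counts.get(i, 0) for i in range(len(centers))}


# Made-up sample data: two centers and five points.
centers = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.5, 0.2), (9.0, 11.0), (1.0, -1.0), (10.5, 9.5), (0.1, 0.1)]

sizes = cluster_sizes(centers, points)
print(sizes)
```

Grouping the points by the predicted index instead of counting them would give the cluster members themselves.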

Re: Clustering text data with MLlib

2014-12-29 Thread Suneel Marthi
Here's the Streaming KMeans from Spark 1.2: http://spark.apache.org/docs/latest/mllib-clustering.html#examples-1
Streaming KMeans still needs an initial 'k' to be specified; it then progresses to come up with an optimal 'k', IIRC.
From: Sean Owen
To: jatinpreet
Cc: "user@spark.apache.org"

Re: Row similarities

2015-01-17 Thread Suneel Marthi
Andrew, you would be better off using Mahout's RowSimilarityJob for what you are trying to accomplish.
1. It does give you pair-wise distances.
2. You can specify the distance measure you are looking to use.
3. There's the old MapReduce impl and the Spark DSL impl, per your preference.
From: Andrew Mus
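To make "pair-wise distances with a selectable distance measure" concrete, here is a minimal plain-Python sketch. It does not use Mahout's RowSimilarityJob or any Mahout API; `pairwise_distances` and `cosine_distance` are hypothetical helpers written for this example, and the rows are made-up sample data.

```python
from itertools import combinations
from math import dist, sqrt  # dist is Euclidean distance, Python 3.8+


def cosine_distance(a, b):
    """1 - cosine similarity. Assumes neither row is the zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def pairwise_distances(rows, measure):
    """Apply `measure` to every unordered pair of rows, keyed by (i, j) with i < j."""
    return {
        (i, j): measure(rows[i], rows[j])
        for i, j in combinations(range(len(rows)), 2)
    }


# Made-up sample matrix with three rows.
rows = [(1.0, 0.0), (0.0, 1.0), (2.0, 0.0)]

euclidean = pairwise_distances(rows, dist)
cosine = pairwise_distances(rows, cosine_distance)
```

Passing the measure as a function is the design choice that mirrors point 2: the pairing logic stays the same while the distance definition is swapped. This sketch compares all pairs in memory, so it only suits small matrices.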

Re: K-means faster on Mahout then on Spark

2014-03-25 Thread Suneel Marthi
Mahout does have a kmeans which can be executed in MapReduce and iterative modes.
> On Mar 25, 2014, at 9:25 AM, Prashant Sharma wrote:
>
> I think Mahout uses FuzzyKmeans, which is a different algorithm and it is not
> iterative.
>
> Prashant Sharma
>
>
>> On Tue, Mar 2