Re: K Means Clustering Explanation

2018-03-04 Thread Alessandro Solimando
Hi Matt, unfortunately I have no code pointer at hand, but I will sketch how to accomplish this via the API; it should at least help you get started. 1) ETL + vectorization (I assume your feature vector is named "features") 2) You run a clustering algorithm (say KMeans: https://spark.a

Re: K Means Clustering Explanation

2018-03-02 Thread Matt Hicks
Thanks Alessandro and Christoph. I appreciate the feedback, but I'm still having issues determining how to actually accomplish this with the API. Can anyone point me to a code example showing how to do this? On Fri, Mar 2, 2018 2:37 AM, Alessandro Solimando alessandro.solima...

Re: K Means Clustering Explanation

2018-03-02 Thread Alessandro Solimando
Hi Matt, similarly to what Christoph does, I first derive the cluster id for the elements of my original dataset, and then I use a classification algorithm (the cluster ids being the classes here). For this method to be useful you need a "human-readable" model; tree-based models are generally a good c

Re: K Means Clustering Explanation

2018-03-01 Thread Christoph Brücke
Hi Matt, I see. You could use the trained model to predict the cluster id for each training point. You should then be able to create a dataset with your original input data and the associated cluster id for each data point. Finally, you can group this dataset by cluster id and aggregat

Re: K means clustering in spark

2015-12-31 Thread Yanbo Liang
Hi Anjali, The main output of KMeansModel is clusterCenters which is Array[Vector]. It has k elements, where k is the number of clusters, and each element is the center of the corresponding cluster. Yanbo 2015-12-31 12:52 GMT+08:00 : > Hi, > > I am trying to use kmeans for clustering in spark using

Re: k-means clustering

2014-11-25 Thread Yanbo Liang
Pre-processing is a major part of the workload before training the model. MLlib provides TF-IDF calculation, StandardScaler and Normalizer, which are essential for preprocessing and of great help to model training. Take a look at this http://spark.apache.org/docs/latest/mllib-feature-extraction.html 2014-11

Re: K-means clustering

2014-11-25 Thread Xiangrui Meng
There is a simple example here: https://github.com/apache/spark/blob/master/examples/src/main/python/kmeans.py . You can take advantage of sparsity by computing the distance via inner products: http://spark-summit.org/2014/talk/sparse-data-support-in-mllib-2 -Xiangrui On Tue, Nov 25, 2014 at 2:39
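The sparsity trick Xiangrui points to can be sketched in plain NumPy: expand the squared distance as ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, precompute the squared norms, and then each distance costs only one inner product over the non-zero entries. The vectors below are invented for illustration:

```python
import numpy as np

# sparse point stored as (indices, values); only non-zeros are kept
idx = np.array([1, 3])
val = np.array([3.0, 4.0])
center = np.array([1.0, 0.0, 0.0, 0.0])  # a (dense) cluster center

x_sq = float(val @ val)            # ||x||^2, computed from non-zeros only
c_sq = float(center @ center)      # ||c||^2, precomputed once per center
dot = float(val @ center[idx])     # inner product touching only non-zero indices
dist_sq = x_sq - 2.0 * dot + c_sq  # == ||x - center||^2
```

For high-dimensional sparse data this avoids ever materializing the dense difference vector, which is the point of the talk linked above.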

Re: k-means clustering

2014-11-20 Thread Jun Yang
Guys, As to the question of pre-processing, you could just migrate your logic to Spark before using K-means. I have only used Scala on Spark, not the Python bindings, but I think the basic steps must be the same. BTW, if your data set is big with huge sparse dimension feature vector