Hi Sean,

Thanks for the quick answer. I had not realized that I can make an RDD[Vector] with, e.g.:

    val dataSet = sparkContext.makeRDD(List(Vectors.dense(10.0, 20.0), Vectors.dense(20.0, 30.0)))

Using this, KMeans.train works as it should. So my bad.

Thanks again!
Attila

2015-02-15 17:29 GMT+01:00 Sean Owen <so...@cloudera.com>:
> Clustering operates on a large number of n-dimensional vectors. That
> seems to be what you are describing, and that is what the MLlib API
> accepts. What are you expecting that you don't find?
>
> Did you have a look at the KMeansModel that this method returns? It
> has a "clusterCenters" method that gives you what you're looking for.
> Explore the API a bit more first.
>
> On Sun, Feb 15, 2015 at 4:26 PM, Attila Tóth <atez...@gmail.com> wrote:
> > Dear Spark User List,
> >
> > I'm fairly new to Spark, trying to use it for multi-dimensional clustering
> > (using the k-means clustering from MLlib). However, based on the examples,
> > the clustering seems to work only for a single dimension (KMeans.train()
> > accepts an RDD[Vector], which is a vector of doubles - I have a list of
> > arrays of doubles, e.g. a list of n-dimensional coordinates).
> >
> > Is there any way with which, given a list of arrays (or vectors) of doubles,
> > I can get out the list of cluster centres (as a list of n-dimensional
> > coordinates) in Spark?
> >
> > I'm using Scala.
> >
> > Thanks in advance,
> > Attila
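For anyone finding this thread later, here is a minimal sketch of the round trip it settles on: each Array[Double] point becomes one element of an RDD[Vector] (so the RDD holds many n-dimensional vectors, not one vector), KMeans.train clusters them, and KMeansModel.clusterCenters gives the centres back. It assumes the RDD-based org.apache.spark.mllib API discussed above; the object name KMeansCentres, the helper clusterCentres, the sample points, k = 2, maxIterations = 20 and the local[*] master are hypothetical choices for illustration only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, not part of Spark.
object KMeansCentres {
  // Cluster a list of n-dimensional points and return the centres
  // as plain Array[Double]s of the same dimension.
  def clusterCentres(sc: SparkContext,
                     points: Seq[Array[Double]],
                     k: Int,
                     maxIterations: Int): Array[Array[Double]] = {
    // One dense MLlib Vector per point: the RDD holds many n-dimensional vectors.
    val data: RDD[Vector] = sc.makeRDD(points.map(p => Vectors.dense(p)))
    // k-means makes several passes over the input, so keep it in memory.
    data.cache()
    val model: KMeansModel = KMeans.train(data, k, maxIterations)
    // clusterCenters has one Vector per cluster; convert back to arrays.
    model.clusterCenters.map(_.toArray)
  }

  def main(args: Array[String]): Unit = {
    // Local mode, purely for trying this out on one machine.
    val conf = new SparkConf().setAppName("KMeansCentres").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Two well-separated groups of 2-D points (made-up sample data).
      val points = Seq(
        Array(10.0, 20.0), Array(11.0, 21.0),
        Array(50.0, 60.0), Array(51.0, 61.0))
      val centres = clusterCentres(sc, points, k = 2, maxIterations = 20)
      centres.foreach(c => println(c.mkString("(", ", ", ")")))
    } finally {
      sc.stop()
    }
  }
}
```

With groups this well separated, the centres should end up at the per-group means, roughly (10.5, 20.5) and (50.5, 60.5). MLlib does not promise any particular order for clusterCenters, so sort them yourself if the order matters.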