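Clustering operates on a large number of n-dimensional vectors. That seems to be what you are describing, and that is what the MLlib API accepts: each Vector in the RDD[Vector] is one n-dimensional point, so the RDD as a whole is your list of points. What are you expecting that you don't find?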
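For illustration, here is a minimal sketch of going from a list of arrays of doubles to a list of n-dimensional cluster centres. It assumes the Spark 1.x MLlib API (org.apache.spark.mllib); the helper name findCentres, the sample points, and the values of k and maxIterations are made up for the example and are not from your code.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object KMeansSketch {
  // Hypothetical helper: takes n-dimensional points as arrays of doubles and
  // returns the cluster centres, again as arrays of doubles.
  def findCentres(sc: SparkContext,
                  points: Seq[Array[Double]],
                  k: Int,
                  maxIterations: Int): Array[Array[Double]] = {
    // Each array becomes one dense Vector, i.e. one n-dimensional point.
    val data = sc.parallelize(points).map(p => Vectors.dense(p)).cache()
    // KMeans.train returns a KMeansModel.
    val model = KMeans.train(data, k, maxIterations)
    // clusterCenters is an Array[Vector]; convert back to plain arrays.
    model.clusterCenters.map(_.toArray)
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so that the sketch runs standalone.
    val conf = new SparkConf().setAppName("KMeansSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Made-up 3-dimensional sample data forming two obvious groups.
      val points = Seq(
        Array(0.0, 0.0, 0.0),
        Array(0.1, 0.2, 0.1),
        Array(9.0, 9.0, 9.0),
        Array(9.1, 8.9, 9.2)
      )
      val centres = findCentres(sc, points, k = 2, maxIterations = 20)
      centres.foreach(c => println(c.mkString("[", ", ", "]")))
    } finally {
      sc.stop()
    }
  }
}
```

Every centre has the same dimension as the input points, so nothing in this path is limited to a single dimension.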
Did you have a look at the KMeansModel that KMeans.train() returns? It has a "clusterCenters" method that gives you what you're looking for. Explore the API a bit more first.

On Sun, Feb 15, 2015 at 4:26 PM, Attila Tóth <[email protected]> wrote:
> Dear Spark User List,
>
> I'm fairly new to Spark, trying to use it for multi-dimensional clustering
> (using the k-means clustering from MLlib). However, based on the examples the
> clustering seems to work only for a single dimension (KMeans.train() accepts
> an RDD[Vector], which is a vector of doubles - I have a list of array of
> doubles, eg. a list of n-dimensional coordinates).
>
> Is there any way with which, given a list of arrays (or vectors) of doubles,
> I can get out the list of cluster centres (as a list of n-dimensional
> coordinates) in Spark?
>
> I'm using Scala.
>
> Thanks in advance,
> Attila
