Re: Spark Mllib kmeans execution

2016-03-02 Thread Sonal Goyal
It will run distributed.

On Mar 2, 2016 3:00 PM, "Priya Ch" wrote:
> Hi All,
>
> I am running k-means clustering algorithm. Now, when I am running the
> algorithm as -
>
> val conf = new SparkConf
> val sc = new SparkContext(conf)
> .
> .
> val kmeans = new KMeans()
> val model = kmeans.run(RDD[

Re: Spark MLLib KMeans Performance on Amazon EC2 M3.2xlarge

2016-01-01 Thread Yanbo Liang
Hi Jia, I think the example you provided is not well suited to illustrating what the driver and executors do, because it does not show the internal implementation of the KMeans algorithm. You can refer to the source code of MLlib KMeans ( https://github.com/apache/spark/blob/master/mllib/src/main/scala/org

Re: Spark MLLib KMeans Performance on Amazon EC2 M3.2xlarge

2015-12-31 Thread Jia Zou
Thanks, Yanbo. The results became much more reasonable after I set driver memory to 5GB and increased worker memory to 25GB. So, my question is: for the following code snippet, extracted from the main method of JavaKMeans.java in the examples, what will the driver do, and what will the worker do? I didn't unde

Re: Spark MLLib KMeans Performance on Amazon EC2 M3.2xlarge

2015-12-30 Thread Yanbo Liang
Hi Jia, You can try inputRDD.persist(MEMORY_AND_DISK) and verify whether it produces stable performance. The MEMORY_AND_DISK storage level keeps partitions in memory, stores the partitions that don't fit in memory on disk, and reads them from there when they are needed. Actually, it's not necessary to set so large drive

Re: spark mllib kmeans

2015-05-21 Thread Pa Rö
I want to evaluate several different distance measures for time-space clustering, so I need an API for implementing my own function in Java.

2015-05-19 22:08 GMT+02:00 Xiangrui Meng:
> Just curious, what distance measure do you need? -Xiangrui
>
> On Mon, May 11, 2015 at 8:28 AM, Jaonary Rabarisoa wrote

Re: spark mllib kmeans

2015-05-19 Thread Xiangrui Meng
Just curious, what distance measure do you need?

-Xiangrui

On Mon, May 11, 2015 at 8:28 AM, Jaonary Rabarisoa wrote:
> take a look at this
> https://github.com/derrickburns/generalized-kmeans-clustering
>
> Best,
>
> Jao
>
> On Mon, May 11, 2015 at 3:55 PM, Driesprong, Fokko wrote:
>>
>> Hi Pa

Re: spark mllib kmeans

2015-05-11 Thread Jaonary Rabarisoa
Take a look at this:
https://github.com/derrickburns/generalized-kmeans-clustering

Best,

Jao

On Mon, May 11, 2015 at 3:55 PM, Driesprong, Fokko wrote:
> Hi Paul,
>
> I would say that it should be possible, but you'll need a different
> distance measure which conforms to your coordinate system.

Re: spark mllib kmeans

2015-05-11 Thread Driesprong, Fokko
Hi Paul,

I would say that it should be possible, but you'll need a different distance measure which conforms to your coordinate system.

2015-05-11 14:59 GMT+02:00 Pa Rö:
> Hi,
>
> Is it possible to use a custom distance measure and another data type as
> vector?
> I want to cluster temporal geo dat
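
For anyone landing on this thread with the same question, here is a minimal sketch of what a distance measure for temporal geo data could look like. It is plain Scala and is not wired into MLlib's KMeans. Every name in it (SpaceTimeDistance, Event, haversineKm, distance, kmPerHour) is hypothetical and chosen only for illustration, and the way space and time are combined is just one possible assumption, not something proposed in the messages above.

```scala
// Hypothetical standalone sketch: none of these names come from Spark MLlib.
object SpaceTimeDistance {
  // One observation: position in degrees plus a timestamp in seconds.
  final case class Event(latDeg: Double, lonDeg: Double, epochSeconds: Long)

  // Mean Earth radius in kilometres (an approximation; the Earth is not a sphere).
  private val EarthRadiusKm = 6371.0

  // Great-circle distance between two events in kilometres (haversine formula).
  def haversineKm(a: Event, b: Event): Double = {
    val lat1 = math.toRadians(a.latDeg)
    val lat2 = math.toRadians(b.latDeg)
    val dLat = lat2 - lat1
    val dLon = math.toRadians(b.lonDeg - a.lonDeg)
    val h = math.pow(math.sin(dLat / 2), 2) +
      math.cos(lat1) * math.cos(lat2) * math.pow(math.sin(dLon / 2), 2)
    2 * EarthRadiusKm * math.atan2(math.sqrt(h), math.sqrt(1 - h))
  }

  // Combined space-time distance. kmPerHour is an assumed scaling factor that
  // states how many kilometres one hour of separation should count as.
  def distance(a: Event, b: Event, kmPerHour: Double): Double = {
    val spaceKm = haversineKm(a, b)
    val hours = math.abs(a.epochSeconds - b.epochSeconds) / 3600.0
    math.hypot(spaceKm, hours * kmPerHour)
  }
}
```

The kmPerHour factor is the part that needs tuning for your data, since it decides how strongly time separation pushes points apart. Keep in mind that k-means also needs a centroid update that is consistent with the chosen distance, which this sketch does not address.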