Hey Meethu - what are you setting "K" to in the benchmarks you show? This can greatly affect the runtime.
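For anyone following along, here is a rough sketch of what one EM iteration for a K-component GMM with diagonal covariances might look like over an RDD. This is just my own illustration for discussion, not code from the patch; the names (em_step, responsibilities, weights/means/variances) and the single-pass sufficient-statistics layout are assumptions, not the actual implementation.

    # Illustrative sketch only -- not the patch's API. One EM iteration for a
    # K-component GMM with diagonal covariances, over an RDD of 1-D NumPy arrays.
    import numpy as np
    from pyspark import SparkContext

    def log_gaussian_diag(x, mean, var):
        # log N(x | mean, diag(var)) for a single point
        return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

    def responsibilities(x, weights, means, variances):
        # E-step for one point: posterior probability of each component
        log_p = np.array([np.log(w) + log_gaussian_diag(x, m, v)
                          for w, m, v in zip(weights, means, variances)])
        log_p -= log_p.max()          # stabilize before exponentiating
        p = np.exp(log_p)
        return p / p.sum()

    def em_step(data, weights, means, variances):
        # data: RDD of 1-D NumPy arrays; weights: (K,), means/variances: (K, D)
        n = data.count()

        # E-step + sufficient statistics in one pass: for each point emit
        # (gamma, gamma * x, gamma * x^2), one entry per component.
        def stats(x):
            g = responsibilities(x, weights, means, variances)
            return (g, g[:, None] * x, g[:, None] * x ** 2)

        sum_g, sum_gx, sum_gx2 = data.map(stats).reduce(
            lambda a, b: (a[0] + b[0], a[1] + b[1], a[2] + b[2]))

        # M-step: closed-form updates from the aggregated statistics
        new_weights = sum_g / n
        new_means = sum_gx / sum_g[:, None]
        new_variances = sum_gx2 / sum_g[:, None] - new_means ** 2
        return new_weights, new_means, new_variances

Aggregating the per-point sufficient statistics in a single reduce keeps each iteration to one pass over the data, so the per-iteration time should stay roughly linear in the number of instances, which is consistent with the kind of numbers in your table. The choice of K directly multiplies the per-point work, hence my question above.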
On Thu, Sep 18, 2014 at 10:38 PM, Meethu Mathew <meethu.mat...@flytxt.com> wrote:
> Hi all,
>
> Please find attached the image of the benchmark results. The table in the
> previous mail got messed up. Thanks.
>
> On Friday 19 September 2014 10:55 AM, Meethu Mathew wrote:
>
> Hi all,
>
> We have come up with an initial distributed implementation of Gaussian
> Mixture Model in pyspark, where the parameters are estimated using the
> Expectation-Maximization algorithm. Our current implementation uses a
> diagonal covariance matrix for each component.
>
> We did an initial benchmark study on a 2-node Spark standalone cluster,
> where each node has 8 cores and 8 GB RAM; the Spark version used is 1.0.0.
> We also evaluated the Python version of k-means available in Spark on the
> same datasets. Below are the results from this benchmark study. The
> reported stats are averages over 10 runs. Tests were done on multiple
> datasets with varying numbers of features and instances.
>
>                               Gaussian mixture model          Kmeans (Python)
> Instances    Dimensions    Avg time/iter   100 iterations   Avg time/iter   100 iterations
> 0.7 million      13             7 s            12 min            13 s           26 min
> 1.8 million      11            17 s            29 min            33 s           53 min
> 10 million       16           1.6 min          2.7 hr           1.2 min          2 hr
>
> We are interested in contributing this implementation as a patch to
> SPARK. Does MLlib accept Python implementations? If not, can we
> contribute to the pyspark component?
>
> I have created a JIRA for the same:
> https://issues.apache.org/jira/browse/SPARK-3588 . How do I get the
> ticket assigned to myself?
>
> Please review and suggest how to take this forward.
>
> --
> Regards,
> Meethu Mathew
> Engineer
> Flytxt