Hey Meethu - what are you setting "K" to in the benchmarks you show? This
can greatly affect the runtime.
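
To illustrate why K matters, here is a minimal sketch of one EM iteration for a
diagonal-covariance Gaussian mixture. It is plain single-machine Python, not
the PySpark implementation from the benchmark. The function name `em_step` and
its argument layout are assumptions made for illustration. The E-step visits
every (point, component) pair, so the per-iteration cost grows roughly
linearly with K (about N * K * D operations).

```python
import math


def em_step(points, weights, means, variances):
    """One EM iteration for a Gaussian mixture with diagonal covariances.

    points: list of N points, each a list of D floats
    weights, means, variances: current parameters of the K components
    Returns the updated (weights, means, variances).
    Illustrative sketch only; assumes no component ends up with zero mass.
    """
    n, k, d = len(points), len(weights), len(points[0])

    # E-step: resp[i][j] = P(component j | point i).
    # The nested loops over points, components and dimensions cost O(N * K * D).
    resp = []
    for x in points:
        log_p = []
        for j in range(k):
            lp = math.log(weights[j])
            for t in range(d):
                var = variances[j][t]
                diff = x[t] - means[j][t]
                lp -= 0.5 * (math.log(2 * math.pi * var) + diff * diff / var)
            log_p.append(lp)
        # Normalise in log space for numerical stability.
        top = max(log_p)
        p = [math.exp(v - top) for v in log_p]
        total = sum(p)
        resp.append([v / total for v in p])

    # M-step: re-estimate each component from its weighted points.
    new_weights, new_means, new_variances = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)
        mu = [sum(r[j] * x[t] for r, x in zip(resp, points)) / nj
              for t in range(d)]
        # Floor the variance so a component cannot collapse onto one point.
        var = [max(sum(r[j] * (x[t] - mu[t]) ** 2
                       for r, x in zip(resp, points)) / nj, 1e-6)
               for t in range(d)]
        new_weights.append(nj / n)
        new_means.append(mu)
        new_variances.append(var)
    return new_weights, new_means, new_variances
```

Doubling K in this sketch doubles the work in both the E-step and the M-step,
which is why the benchmark numbers are hard to interpret without knowing K.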

On Thu, Sep 18, 2014 at 10:38 PM, Meethu Mathew <meethu.mat...@flytxt.com>
wrote:

>  Hi all,
> Please find attached the image of benchmark results. The table in the
> previous mail got messed up. Thanks.
>
>
>
> On Friday 19 September 2014 10:55 AM, Meethu Mathew wrote:
>
> Hi all,
>
> We have come up with an initial distributed implementation of a Gaussian
> Mixture Model in PySpark, where the parameters are estimated using the
> Expectation-Maximization algorithm. Our current implementation assumes a
> diagonal covariance matrix for each component.
> We did an initial benchmark study on a 2-node Spark standalone cluster,
> where each node has 8 cores and 8 GB RAM; the Spark version used
> is 1.0.0. We also evaluated the Python version of k-means available in Spark
> on the same datasets. Below are the results from this benchmark study.
> The reported stats are averages over 10 runs. Tests were done on multiple
> datasets with varying numbers of features and instances.
>
>
>        Dataset          |      Gaussian mixture model       |          Kmeans (Python)
> Instances    Dimensions | Avg time/iter.  Time for 100 iter. | Avg time/iter.  Time for 100 iter.
> 0.7 million  13         | 7 s             12 min             | 13 s            26 min
> 1.8 million  11         | 17 s            29 min             | 33 s            53 min
> 10 million   16         | 1.6 min         2.7 hr             | 1.2 min         2 hr
>
>
> We are interested in contributing this implementation as a patch to
> Spark. Does MLlib accept Python implementations? If not, can we
> contribute to the PySpark component?
> I have created a JIRA for the same:
> https://issues.apache.org/jira/browse/SPARK-3588
> How do I get the ticket assigned to myself?
>
> Please review and suggest how to take this forward.
>
>
>
>
>
> --
>
> Regards,
>
>
>
> *Meethu Mathew*
>
> *Engineer*
>
> *Flytxt*
>
> Skype: meethu.mathew7
>
>  F:  +91 471.2700202
>
>
>
>
>
