Spark has much more overhead, since it's set up to distribute the computation. Julia isn't distributed, and so has no such overhead in a completely in-core implementation. You generally use Spark when you have a problem large enough to warrant distributing, or when your data already lives in a distributed store like HDFS.
But it's also possible you're not configuring the two implementations the same way, yes. There's not really enough info here to say; a minimal sketch of the MLlib setup I'd sanity-check is below the quoted message.

On Fri, Dec 5, 2014 at 9:50 AM, Jaonary Rabarisoa <jaon...@gmail.com> wrote:

> Hi all,
>
> I'm trying to run clustering with the k-means algorithm. My data set is
> about 240k vectors of dimension 384.
>
> Solving the problem with the kmeans available in Julia (kmeans++),
>
> http://clusteringjl.readthedocs.org/en/latest/kmeans.html
>
> takes about 8 minutes on a single core.
>
> Solving the same problem with Spark's kmeans|| takes more than 1.5 hours
> with 8 cores!!!!
>
> Either they don't implement the same algorithm, or I don't understand how
> the kmeans in Spark works. Is my data not big enough to take full
> advantage of Spark? At the least, I expected the same runtime.
>
> Cheers,
>
> Jao
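For reference, here's roughly what a comparable MLlib run looks like in Scala, under some assumptions I'm making since the thread doesn't say: the input is a whitespace-delimited text file (the path "features.txt" and the k value are placeholders; use whatever k the Julia run used), and this is the Spark 1.x MLlib API. The cache() call and a runs value of 1 are the two settings most often behind this kind of gap:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object KMeansCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("kmeans-check"))

    // Hypothetical input: one 384-dim vector per line, space-separated.
    val vectors = sc.textFile("features.txt")
      .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
      .cache() // without this, every k-means pass re-reads and re-parses the file

    vectors.count() // materialize the cache so load time isn't billed to k-means

    val model = KMeans.train(
      vectors,
      100, // k: placeholder -- use the same number of clusters as the Julia run
      20,  // maxIterations
      1,   // runs: MLlib can do several runs at once; Julia does one
      KMeans.K_MEANS_PARALLEL) // the k-means|| initialization being compared

    println("Cost: " + model.computeCost(vectors))
    sc.stop()
  }
}

k-means makes a full pass over the data per iteration, so an uncached input RDD alone can turn minutes into hours. Also note that k-means|| initialization itself takes several passes over the data, which is extra work a single-machine kmeans++ doesn't do.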