I already use Breeze; in fact, my current implementation of sqDist uses it:
https://github.com/danielkorzekwa/bayes-scala-gp/blob/master/src/main/scala/dk/gp/math/sqDist.scala

It is still 3 times slower than sq_dist from gpml.

Thanks for the BID Data Project info.
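
For anyone following the thread, here is a minimal plain-Scala sketch of the computation being benchmarked: pairwise squared Euclidean distances between the rows of a matrix, using the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b. This is not the code behind the link above and not gpml's sq_dist; the object name SqDistSketch and the Array[Array[Double]] layout are assumptions made only for illustration, and a tuned version would hand the dot products to a BLAS-backed matrix multiply.

```scala
// Hypothetical illustration only; not the linked sqDist and not gpml's sq_dist.
object SqDistSketch {
  /** Pairwise squared Euclidean distances between the rows of `points`. */
  def sqDist(points: Array[Array[Double]]): Array[Array[Double]] = {
    val n = points.length
    // Squared norm of every row, computed once up front.
    val norms = points.map(row => row.map(v => v * v).sum)
    val out = Array.ofDim[Double](n, n)
    var i = 0
    while (i < n) {
      val a = points(i)
      var j = i + 1
      while (j < n) {
        val b = points(j)
        // Dot product of rows i and j.
        var dot = 0.0
        var k = 0
        while (k < a.length) { dot += a(k) * b(k); k += 1 }
        // ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, clamped at 0 against rounding.
        val d = math.max(0.0, norms(i) + norms(j) - 2.0 * dot)
        out(i)(j) = d
        out(j)(i) = d // the result is symmetric, so fill both halves
        j += 1
      }
      i += 1
    }
    out
  }
}
```

With one-dimensional points 1, 2, 3, 4 passed as `Array.range(1, 5).map(i => Array(i.toDouble))`, entry (0)(3) of the result is (1 - 4)^2 = 9.0. The plain loops keep the sketch dependency-free; they are not meant to compete with an optimized native library.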

On 9 September 2015 at 18:45, Dmitriy Lyubimov <[email protected]> wrote:

> Hi Daniel,
>
> you mean, for dense algebra, single-threaded Java vs. cache-aware,
> multithreaded, SSE4-optimized Intel MKL? I am actually surprised the gap
> is not at least 10x.
>
> Mahout focuses on ease of distributed implementations (i.e. the dsq_dist
> variant of the routine) but has been somewhat lazy about marrying
> mahout-math with hardware-optimized in-core libraries. That much is true.
>
> The things that have somewhat lowered the priority of in-core CPU-bound
> algebra optimizations are:
>
> (1) in distributed operations, multithreading plays a significantly smaller
> role (well-behaved tasks should assume they are allocated only 1 core and
> rely on the resource manager to allocate CPU resources)
> (2) for distributed algorithms, unless they are naive power-law ports of
> in-core algorithms, I/O and data serialization expenses start to play a
> significant role in overall algorithm performance compared to shared-memory
> single-machine algorithms.
> (3) a lot of algorithms require non-BLAS kernel operators anyway
> (4) most importantly, standard BLAS is somewhat unsatisfactory in the
> sparse algebra department; I would seek a better solution than just the
> BLAS API. There are some emerging technologies that are sparse/dense
> balanced libraries, but the jury is still out as to what the best pathway
> here is. Or maybe the best path is to do what Theano and BidMat did, i.e.
> develop a new set of algebraic kernel routines, but that's probably too
> heavy for this project at the moment.
>
> If you need a good CPU-bound shared-memory environment for dense algebra,
> I'd suggest trying either Breeze or BidMat. Perhaps even the latter, as it
> does support sparse subroutines, somewhat anyway, and also has a
> GPU-enabled set of matrix implementations.
>
> On Wed, Sep 9, 2015 at 12:21 AM, Daniel Korzekwa <
> [email protected]>
> wrote:
>
> > Hello,
> >
> > I'm comparing the efficiency of sq_dist() from Mahout to sq_dist() from
> > the gpml library, which is based on bsxfun in Octave/MATLAB.
> >
> > It seems that computing the distance matrix in Octave is 5 times faster
> > than in Mahout. Why is that? Can we make it faster?
> >
> > Octave:
> >  x = [1:4000]
> >  sq_dist(x)
> >
> > Scala (Mahout):
> >   val x = Array.range(1, 4001).map(i => i.toDouble)  // 1..4000, as in the Octave snippet
> >   val A =  new DenseMatrix(Array(x)).transpose()
> >   val dM = sqDist(A)
> >
> > --
> > Daniel Korzekwa
> > Machine Learning Engineer
> > https://www.linkedin.com/in/danielkorzekwa <http://danmachine.com/>
> >
>



-- 
Daniel Korzekwa
Machine Learning Engineer
https://www.linkedin.com/in/danielkorzekwa <http://danmachine.com/>
