Re: [MLLib]: Executor OutOfMemory in BlockMatrix Multiplication

2017-06-14 Thread John Compitello
No problem. It was a big headache for my team as well. One of us has already reimplemented it from scratch, as seen in this pending PR for our project: https://github.com/hail-is/hail/pull/1895 Hopefully you find that useful. We'll try to PR that into Spark at some point. Best, John S

Re: [MLLib]: Executor OutOfMemory in BlockMatrix Multiplication

2017-06-14 Thread Anthony Thomas
Interesting, thanks! That probably also explains why there seems to be so much shuffling for this operation. So what's the best option for truly scalable matrix multiplication on Spark, then? Implementing it from scratch using the coordinate matrix ((i,j), k) format?
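A minimal sketch of what "from scratch in coordinate format" could look like, assuming `k` in `((i,j), k)` is the entry's value. Multiplication becomes a join on the shared inner index followed by a per-cell sum, the same shape a Spark job would take with `join` and `reduceByKey` on RDDs. This is plain Python with no Spark, and `coo_multiply` is a hypothetical name, not an MLlib API.

```python
# Plain-Python sketch of coordinate-format (COO) multiplication as a
# join on the shared inner index followed by a sum per output cell.
# Entries are ((row, col), value); coo_multiply is a hypothetical helper.

def coo_multiply(a_entries, b_entries):
    # "Map": key A's entries by column index k and B's entries by row index k.
    a_by_k = {}
    for (i, k), v in a_entries:
        a_by_k.setdefault(k, []).append((i, v))
    b_by_k = {}
    for (k, j), v in b_entries:
        b_by_k.setdefault(k, []).append((j, v))

    # "Join" on k: every (i, k) x (k, j) pairing emits one partial product;
    # "reduceByKey": sum the partial products for each output cell (i, j).
    out = {}
    for k, a_list in a_by_k.items():
        for i, av in a_list:
            for j, bv in b_by_k.get(k, []):
                out[(i, j)] = out.get((i, j), 0) + av * bv
    return out


# Usage: [[1, 2], [3, 4]] times [[5, 6], [7, 8]] as sparse entries.
a = [((0, 0), 1), ((0, 1), 2), ((1, 0), 3), ((1, 1), 4)]
b = [((0, 0), 5), ((0, 1), 6), ((1, 0), 7), ((1, 1), 8)]
c = coo_multiply(a, b)
print(sorted(c.items()))
# [((0, 0), 19), ((0, 1), 22), ((1, 0), 43), ((1, 1), 50)]
```

Each pairing becomes its own intermediate record, so for dense n×n inputs the join emits n³ partial products. This approach mainly pays off when the inputs are sparse.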

Re: [MLLib]: Executor OutOfMemory in BlockMatrix Multiplication

2017-06-14 Thread John Compitello
Hey Anthony, you're the first person besides myself I've seen mention this. BlockMatrix multiply is not the best method. As far as my team and I can tell, the memory problem stems from the fact that when Spark tries to compute block (i, j) of the matrix, it tries to materialize all of row i from
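To make the dependency concrete: in block form, output block (i, j) is the sum over k of A(i, k)·B(k, j), so computing it needs every block in block-row i of the left matrix and every block in block-column j of the right matrix. The plain-Python sketch below (no Spark; `block_multiply` and `blocks_needed` are hypothetical helpers) shows that structure only. It is not MLlib's BlockMatrix implementation.

```python
# Plain-Python sketch of block-partitioned matrix multiplication (no Spark).
# Matrices are dicts mapping (block_row, block_col) -> dense block (list of lists).
# Helper names are hypothetical; this is not MLlib's BlockMatrix code.

def matmul_dense(x, y):
    """Multiply two dense blocks stored as lists of lists."""
    inner, cols = len(y), len(y[0])
    return [[sum(row[t] * y[t][c] for t in range(inner)) for c in range(cols)]
            for row in x]


def add_dense(x, y):
    """Element-wise sum of two equally shaped dense blocks."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def blocks_needed(i, j, k_blocks):
    """Input blocks that must be co-located to compute output block (i, j):
    all of block-row i of A and all of block-column j of B."""
    return [("A", i, k) for k in range(k_blocks)] + [("B", k, j) for k in range(k_blocks)]


def block_multiply(a_blocks, b_blocks, row_blocks, k_blocks, col_blocks):
    """C(i, j) = sum over k of A(i, k) @ B(k, j)."""
    c_blocks = {}
    for i in range(row_blocks):
        for j in range(col_blocks):
            acc = None
            for k in range(k_blocks):
                prod = matmul_dense(a_blocks[(i, k)], b_blocks[(k, j)])
                acc = prod if acc is None else add_dense(acc, prod)
            c_blocks[(i, j)] = acc
    return c_blocks


# Usage: 2x2 matrices split into a 2x2 grid of 1x1 blocks.
a = {(0, 0): [[1]], (0, 1): [[2]], (1, 0): [[3]], (1, 1): [[4]]}
b = {(0, 0): [[5]], (0, 1): [[6]], (1, 0): [[7]], (1, 1): [[8]]}
c = block_multiply(a, b, row_blocks=2, k_blocks=2, col_blocks=2)
print(c)  # {(0, 0): [[19]], (0, 1): [[22]], (1, 0): [[43]], (1, 1): [[50]]}
print(blocks_needed(0, 1, k_blocks=2))
# [('A', 0, 0), ('A', 0, 1), ('B', 0, 1), ('B', 1, 1)]
```

With K blocks along the shared dimension, each output block pulls in 2K input blocks, so the per-task working set grows with the matrix's inner dimension rather than staying at one block.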

[MLLib]: Executor OutOfMemory in BlockMatrix Multiplication

2017-06-14 Thread Anthony Thomas
I've been experimenting with MLlib's BlockMatrix for distributed matrix multiplication but consistently run into problems with executors being killed due to memory constraints. The linked gist (here) has a short example of multiplyi