No problem. It was a big headache for my team as well. One of us already
reimplemented it from scratch, as seen in this pending PR for our project.
https://github.com/hail-is/hail/pull/1895
Hopefully you find that useful. We'll try to upstream it into Spark at
some point.
Best,
John
S
Interesting, thanks! That probably also explains why there seems to be a
ton of shuffle for this operation. So what's the best option for truly
scalable matrix multiplication on Spark, then: implementing it from scratch
using the coordinate matrix ((i, j), k) format?
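(For anyone reading along: the coordinate-format multiply being alluded to here is essentially a join on the shared inner index followed by a reduce on the output coordinate (i, j). Below is a minimal sketch of that dataflow in plain Python, with dictionaries standing in for Spark's join/reduceByKey; the function name and data layout are illustrative, not from any actual implementation.)

```python
from collections import defaultdict

def coord_matmul(a_entries, b_entries):
    """Multiply sparse matrices given as (row, col, value) triples.

    Mimics the Spark coordinate-format dataflow: group B by its row
    index (the shared inner index k), "join" each A entry (i, k, v)
    against matching B entries (k, j, w), then reduceByKey-style sum
    the partial products keyed by the output coordinate (i, j).
    """
    # Group B's entries by their row index k (the join key).
    b_by_row = defaultdict(list)
    for k, j, w in b_entries:
        b_by_row[k].append((j, w))

    # Emit partial products v * w keyed by (i, j) and sum them.
    out = defaultdict(float)
    for i, k, v in a_entries:
        for j, w in b_by_row.get(k, []):
            out[(i, j)] += v * w
    return dict(out)

# A = [[1, 2], [0, 3]], B = [[4, 0], [5, 6]] as (row, col, value) triples.
A = [(0, 0, 1.0), (0, 1, 2.0), (1, 1, 3.0)]
B = [(0, 0, 4.0), (1, 0, 5.0), (1, 1, 6.0)]
print(coord_matmul(A, B))  # A @ B = [[14, 12], [15, 18]]
```

On a real cluster, the grouping and summation become a shuffle, but only the nonzero partial products for each (i, j) ever need to be co-resident, rather than whole rows and columns of blocks.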
On Wed, Jun 14, 2017 at 4:29 PM, J
Hey Anthony,
You're the first person besides myself I've seen mention this. BlockMatrix
multiply is not the best method. As far as my team and I can tell, the memory
problem stems from the fact that when Spark tries to compute block (i, j) of
the matrix, it tries to manifest all of row i from