Interesting, thanks! That probably also explains why there seems to be a ton of shuffle for this operation. So what's the best option for truly scalable matrix multiplication on Spark then - implementing it from scratch on top of RDDs in coordinate format, ((i, j), value)?
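To make the question concrete, something like this is what I have in mind - an untested sketch over plain RDDs with placeholder names, though I suspect the join itself would shuffle heavily:

import org.apache.spark.rdd.RDD

// A stored as ((i, k), value), B stored as ((k, j), value).
def multiplyCoordinate(
    a: RDD[((Long, Long), Double)],
    b: RDD[((Long, Long), Double)]): RDD[((Long, Long), Double)] = {
  // Re-key both sides by the shared inner dimension k, join on it,
  // emit a partial product for each (i, j) pair, then sum the partials.
  val aByK = a.map { case ((i, k), v) => (k, (i, v)) }
  val bByK = b.map { case ((k, j), v) => (k, (j, v)) }
  aByK.join(bByK)
    .map { case (_, ((i, av), (j, bv))) => ((i, j), av * bv) }
    .reduceByKey(_ + _)
}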
On Wed, Jun 14, 2017 at 4:29 PM, John Compitello <jo...@broadinstitute.org> wrote:

> Hey Anthony,
>
> You're the first person besides myself I've seen mention this. BlockMatrix
> multiply is not the best method. As far as my team and I can tell, the
> memory problem stems from the fact that when Spark tries to compute block
> (i, j) of the result, it tries to manifest all of row i from matrix 1 and
> all of column j from matrix 2 in memory at once on one executor. It then
> combines them with a functional reduce, creating one additional block for
> each pair. So you end up manifesting 3n + log n matrix blocks in memory at
> once, which is why it sucks so much.
>
> Sent from my iPhone
>
> On Jun 14, 2017, at 7:07 PM, Anthony Thomas <ahtho...@eng.ucsd.edu> wrote:
>
> I've been experimenting with MLlib's BlockMatrix for distributed matrix
> multiplication but consistently run into problems with executors being
> killed due to memory constraints. The linked gist
> (https://gist.github.com/thomas9t/3cf914e4d4609df35ee60ce08f36421b) has a
> short example of multiplying a 25,000 x 25,000 square matrix (approximately
> 5G on disk) with a vector (also stored as a BlockMatrix). I am running this
> on a 3-node (1 master + 2 workers) cluster on Amazon EMR using the
> m4.xlarge instance type. Each instance has 16GB of RAM and 4 CPUs. The gist
> has detailed information about the Spark environment.
>
> I have tried reducing the block size of the matrix, increasing the number
> of partitions in the underlying RDD, increasing defaultParallelism, and
> increasing spark.yarn.executor.memoryOverhead (up to 3GB) - all without
> success. The input matrix should fit comfortably in distributed memory and
> the resulting matrix should be quite small (25,000 x 1), so I'm confused as
> to why Spark seems to want so much memory for this operation, and why
> Spark isn't spilling to disk here if it wants more memory. The job does
> eventually complete successfully, but for larger matrices stages have to
> be repeated several times, which leads to long run times. I don't
> encounter any issues if I reduce the matrix size to about 3GB. Can anyone
> with experience using MLlib's matrix operators suggest what settings to
> look at, or what the hard constraints on memory for BlockMatrix
> multiplication are?
>
> Thanks,
>
> Anthony
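P.S. For anyone who doesn't want to click through to the gist, a minimal way to reproduce the shape of this workload is roughly the following. This is a sketch, not the gist itself: it assumes a spark-shell `sc`, the dimensions and block size come from the description above, and the banded entry generation is just a placeholder for the real ~5G input.

import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, CoordinateMatrix, MatrixEntry}
import org.apache.spark.rdd.RDD

val n = 25000L

// Placeholder data: a narrow band instead of the real input matrix.
val matEntries: RDD[MatrixEntry] = sc.parallelize(0L until n).flatMap { i =>
  (math.max(0L, i - 50) until math.min(n, i + 50)).map(j => MatrixEntry(i, j, 1.0))
}
val mat: BlockMatrix =
  new CoordinateMatrix(matEntries, n, n).toBlockMatrix(1024, 1024).cache()

// The 25,000 x 1 vector, also stored as a BlockMatrix.
val vecEntries: RDD[MatrixEntry] = sc.parallelize(0L until n).map(i => MatrixEntry(i, 0, 1.0))
val vec: BlockMatrix =
  new CoordinateMatrix(vecEntries, n, 1).toBlockMatrix(1024, 1).cache()

// Per John's explanation, computing result block (i, 0) pulls the whole
// block-row i of `mat` plus all of `vec` onto one executor before reducing,
// hence the memory pressure.
val result: BlockMatrix = mat.multiply(vec)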