Interesting, thanks! That probably also explains why there seems to be a
ton of shuffle for this operation. So what's the best option for truly
scalable matrix multiplication on Spark, then - implementing it from
scratch on top of the coordinate ((i, j), value) format?
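
Something like the standard map/join/reduce formulation is what I had in
mind - a rough, untested sketch (all names here are mine, nothing from
MLlib):

    import org.apache.spark.rdd.RDD

    // A: entries (i, j, v); B: entries (j, k, w); result C: ((i, k), sum)
    def multiplyCoo(a: RDD[(Long, Long, Double)],
                    b: RDD[(Long, Long, Double)]): RDD[((Long, Long), Double)] = {
      val aByCol = a.map { case (i, j, v) => (j, (i, v)) }  // key A by column
      val bByRow = b.map { case (j, k, w) => (j, (k, w)) }  // key B by row
      aByCol.join(bByRow)                                   // all pairs sharing j
        .map { case (_, ((i, v), (k, w))) => ((i, k), v * w) }
        .reduceByKey(_ + _)                                 // sum partial products
    }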

On Wed, Jun 14, 2017 at 4:29 PM, John Compitello <jo...@broadinstitute.org>
wrote:

> Hey Anthony,
>
> You're the first person besides myself I've seen mention this. BlockMatrix
> multiply is not the best method. As far as my team and I can tell, the
> memory problem stems from the fact that when Spark computes block (i, j)
> of the product, it manifests all of row i from matrix 1 and all of column
> j from matrix 2 in memory at once on one executor. It then combines them
> with a functional reduce, creating one additional block for each pair. So
> you end up manifesting about 3n + log n matrix blocks in memory at once
> (for an n x n grid of blocks), which is why it sucks so much.
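>
> In effect, each output block is computed like this (my own Breeze sketch
> of the pattern, not the actual MLlib source):
>
>     import breeze.linalg._
>
>     // Output block C(i, j): all n blocks of row i of A and all n blocks
>     // of column j of B sit on one executor, then the n pairwise products
>     // are summed - roughly 3n blocks live at once, plus the temporaries
>     // created by the reduce.
>     def blockCij(rowI: Array[DenseMatrix[Double]],
>                  colJ: Array[DenseMatrix[Double]]): DenseMatrix[Double] =
>       rowI.zip(colJ).map { case (a, b) => a * b }.reduce(_ + _)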
>
> Sent from my iPhone
>
> On Jun 14, 2017, at 7:07 PM, Anthony Thomas <ahtho...@eng.ucsd.edu> wrote:
>
> I've been experimenting with MLlib's BlockMatrix for distributed matrix
> multiplication but consistently run into problems with executors being
> killed due to memory constraints. The linked gist (here
> <https://gist.github.com/thomas9t/3cf914e4d4609df35ee60ce08f36421b>) has
> a short example of multiplying a 25,000 x 25,000 square matrix (taking
> approximately 5GB on disk) by a vector (also stored as a BlockMatrix). I
> am running this on a 3-node (1 master + 2 workers) cluster on Amazon EMR
> using the m4.xlarge instance type. Each instance has 16GB of RAM and 4
> vCPUs. The gist has detailed information about the Spark environment.
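>
> The core of it boils down to something like this (a simplified sketch,
> not the exact gist code - paths and block sizes are illustrative):
>
>     import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix,
>       MatrixEntry}
>
>     val matEntries = sc.textFile("s3://.../matrix.csv").map { line =>
>       val Array(i, j, v) = line.split(",")
>       MatrixEntry(i.toLong, j.toLong, v.toDouble)
>     }
>     val vecEntries = sc.parallelize(0L until 25000L)
>       .map(i => MatrixEntry(i, 0L, 1.0))
>
>     val A = new CoordinateMatrix(matEntries, 25000L, 25000L)
>       .toBlockMatrix(1024, 1024).cache()
>     val x = new CoordinateMatrix(vecEntries, 25000L, 1L)
>       .toBlockMatrix(1024, 1024).cache()
>
>     val result = A.multiply(x)   // executors get killed during this stage
>     result.blocks.count()        // force evaluation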
>
> I have tried reducing the block size of the matrix, increasing the number
> of partitions in the underlying RDD, increasing spark.default.parallelism,
> and increasing spark.yarn.executor.memoryOverhead (up to 3GB) - all
> without success (representative settings are sketched below). The input
> matrix should fit comfortably in distributed memory and the resulting
> matrix should be tiny (25,000 x 1), so I'm confused about why Spark seems
> to want so much memory for this operation, and why it isn't spilling to
> disk when it runs short. The job does eventually complete successfully,
> but for larger matrices stages have to be repeated several times, which
> leads to long run times. I don't encounter any issues if I reduce the
> matrix size to about 3GB. Can anyone with experience using MLlib's matrix
> operators suggest what settings to look at, or what the hard memory
> requirements of BlockMatrix multiplication are?
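>
> To be concrete, the knobs I've been turning look like this
> (representative values, not one exact configuration; matEntries and
> imports as in the sketch above):
>
>     import org.apache.spark.SparkConf
>
>     val conf = new SparkConf()
>       .set("spark.default.parallelism", "256")            // more tasks
>       .set("spark.yarn.executor.memoryOverhead", "3072")  // 3GB, in MB
>
>     // smaller blocks, more partitions in the underlying RDD
>     val A = new CoordinateMatrix(matEntries.repartition(256), 25000L, 25000L)
>       .toBlockMatrix(512, 512)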
>
> Thanks,
>
> Anthony
>
>
