Re: [MLLib]: Executor OutOfMemory in BlockMatrix Multiplication

2017-06-14 Thread John Compitello
> then - implementing from scratch using the
> coordinate matrix ((i,j), k) format?
>
>> On Wed, Jun 14, 2017 at 4:29 PM, John Compitello wrote:
>> Hey Anthony,
>>
>> You're the first person besides myself I've seen mention this. BlockMatrix

Re: [MLLib]: Executor OutOfMemory in BlockMatrix Multiplication

2017-06-14 Thread John Compitello
Hey Anthony, You're the first person besides myself I've seen mention this. BlockMatrix multiply is not the best method. As far as my team and I can tell, the memory problem stems from the fact that when Spark tries to compute block (i, j) of the matrix, it tries to manifest all of row i from
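For readers unfamiliar with blocked products: block (i, j) of A x B is the sum over k of A[i][k] x B[k][j], so it depends on every block in block row i of A and block column j of B. The sketch below is a plain-Python illustration of that dependency, accumulating one partial product at a time. It is not Spark's BlockMatrix code; the helper names (matmul, add, product_block) and the nested-list block layout are hypothetical.

```python
# Illustrative only: nested Python lists stand in for distributed blocks.
# a_blocks[i][k] and b_blocks[k][j] are small dense matrices (lists of rows).

def matmul(a, b):
    """Dense product of two small matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def product_block(a_blocks, b_blocks, i, j):
    """Compute block (i, j) of A x B, holding one partial product at a time."""
    acc = None
    for k in range(len(b_blocks)):  # walk the shared (inner) block dimension
        partial = matmul(a_blocks[i][k], b_blocks[k][j])
        acc = partial if acc is None else add(acc, partial)
    return acc
```

Accumulating over k keeps only one pair of input blocks and one running sum in memory at once, whereas gathering the whole block row first needs memory proportional to the number of blocks in that row.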

Re: Impact of coalesce operation before writing dataframe

2017-05-23 Thread John Compitello
Spark is doing operations on each partition in parallel. If you decrease the number of partitions, you’re potentially doing less work in parallel, depending on your cluster setup.

> On May 23, 2017, at 4:23 PM, Andrii Biletskyi wrote:
>
> > No, I didn't try to use repartition, how exactly it
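A rough way to see the point about partitions and parallelism, under a simplified model that assumes one task per partition and one core per task. The function name task_waves and the example numbers are hypothetical, not taken from the thread.

```python
import math

def task_waves(num_partitions, total_cores):
    """Return (cores kept busy, scheduling waves) under the simplified model."""
    busy = min(num_partitions, total_cores)           # idle cores if partitions < cores
    waves = math.ceil(num_partitions / total_cores)   # rounds needed to run all tasks
    return busy, waves

# 200 partitions on 16 cores keep every core busy;
# coalescing to 4 partitions leaves 12 of the 16 cores idle.
before = task_waves(200, 16)
after = task_waves(4, 16)
```

Whether fewer partitions actually slows a job down depends on the cluster: if the partition count still meets or exceeds the available cores, the loss in parallelism may be negligible.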

Matrix multiplication and cluster / partition / blocks configuration

2017-05-11 Thread John Compitello
Hey all, I’ve found myself in a position where I need to do a relatively large matrix multiply (at least, compared to what I normally have to do). I’m looking to multiply a 100k by 500k dense matrix by its transpose to yield a 100k by 100k matrix. I’m trying to do this on Google Cloud, so I don’
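To put those dimensions in perspective, here is a back-of-the-envelope size estimate. It assumes 8-byte double-precision entries and ignores any storage overhead; the helper name dense_bytes is hypothetical.

```python
def dense_bytes(rows, cols, bytes_per_entry=8):
    """Raw size in bytes of a dense rows x cols matrix (assumes 8-byte doubles)."""
    return rows * cols * bytes_per_entry

input_gb = dense_bytes(100_000, 500_000) / 1e9    # the 100k by 500k input
output_gb = dense_bytes(100_000, 100_000) / 1e9   # the 100k by 100k product
```

Under these assumptions the input is about 400 GB and the product about 80 GB, before counting the transpose or any intermediate copies.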