Optimized toIndexedRowMatrix

Driesprong, Fokko Wed, 20 Jan 2016 04:11:49 -0800

Hi guys,

I've been working on an optimized implementation of the toIndexedRowMatrix
<https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala#L271>
method of BlockMatrix. I have already created a ticket
<https://issues.apache.org/jira/browse/SPARK-12869> and submitted a pull request
<https://github.com/apache/spark/pull/10839> on GitHub. What has to
be done to get this accepted? All the tests are passing.


On my own GitHub account I created a project
<https://github.com/Fokko/BlockMatrixToIndexedRowMatrix> to measure how
performance is affected. For dense matrices the speedup is almost 19
times. For sparse matrices it will most likely also be faster, as the
current implementation requires a lot of shuffling and creates a large number
of intermediate objects (unless the matrix is extremely sparse, but in that
case a BlockMatrix would not be a good fit anyway).
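
To illustrate why the number of intermediate objects matters, here is a minimal sketch in plain Python. It is not the Spark code from the pull request. It assumes the slower path emits one record per matrix entry before grouping by row, while the faster path emits one record per row slice of each block. The function names and the dict-of-blocks layout are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def rows_via_entries(blocks, rows_per_block, cols_per_block, num_cols):
    """Assumed baseline: one (row, col, value) record per matrix entry."""
    records = []
    for (bi, bj), block in blocks.items():
        for r, row in enumerate(block):
            for c, value in enumerate(row):
                records.append((bi * rows_per_block + r,
                                bj * cols_per_block + c,
                                value))
    # Group the records by global row index to rebuild full rows.
    rows = defaultdict(lambda: [0.0] * num_cols)
    for i, j, value in records:
        rows[i][j] = value
    return dict(rows), len(records)


def rows_via_slices(blocks, rows_per_block, cols_per_block, num_cols):
    """Sketched alternative: one (row, column offset, slice) record per block row."""
    records = []
    for (bi, bj), block in blocks.items():
        for r, row in enumerate(block):
            records.append((bi * rows_per_block + r,
                            bj * cols_per_block,
                            list(row)))
    # Group by global row index and copy each slice into place.
    rows = defaultdict(lambda: [0.0] * num_cols)
    for i, offset, row_slice in records:
        rows[i][offset:offset + len(row_slice)] = row_slice
    return dict(rows), len(records)


# A 4x4 dense matrix stored as four 2x2 blocks, keyed by (block row, block column).
example_blocks = {
    (0, 0): [[1, 2], [5, 6]],
    (0, 1): [[3, 4], [7, 8]],
    (1, 0): [[9, 10], [13, 14]],
    (1, 1): [[11, 12], [15, 16]],
}

entry_rows, entry_records = rows_via_entries(example_blocks, 2, 2, 4)
slice_rows, slice_records = rows_via_slices(example_blocks, 2, 2, 4)
```

Both functions rebuild the same rows, but the per-slice variant creates one record per block row instead of one per entry, so the number of intermediate records shrinks by a factor equal to the block width. In a distributed setting those records are what gets shuffled.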

I would appreciate suggestions or tips to get this accepted.

Cheers, Fokko Driesprong.
