Hi guys, I've been working on an optimized implementation of the toIndexedRowMatrix <https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala#L271> method of BlockMatrix. I have already created a ticket <https://issues.apache.org/jira/browse/SPARK-12869> and submitted a pull request <https://github.com/apache/spark/pull/10839> on GitHub. What needs to be done to get this accepted? All the tests are passing.
On my own GitHub I created a project <https://github.com/Fokko/BlockMatrixToIndexedRowMatrix> to measure how performance is affected; for dense matrices the speedup is almost 19 times. For sparse matrices it will most likely be faster as well, since the current implementation requires a lot of shuffling and creates a high volume of intermediate objects (unless the matrix is extremely sparse, but in that case a BlockMatrix would not be a good fit anyway). I would appreciate any suggestions or tips to get this accepted.
Cheers,
Fokko Driesprong
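
P.S. For anyone who would rather not open the pull request, the general idea can be sketched roughly as follows. This is a plain-Python toy on local lists, not the Scala code from the PR: the names `row_slices` and `to_indexed_rows` and the dict-of-blocks layout are made up for illustration, and the grouping step only stands in for the shuffle. It assumes the gain comes from moving one whole row slice per block row instead of one record per matrix entry.

```python
from collections import defaultdict


def row_slices(blocks, rows_per_block, cols_per_block):
    """Yield (global_row, (col_offset, values)) once per row of every block."""
    for (block_row, block_col), block in blocks.items():
        for local_row, values in enumerate(block):
            global_row = block_row * rows_per_block + local_row
            yield global_row, (block_col * cols_per_block, values)


def to_indexed_rows(blocks, rows_per_block, cols_per_block, n_cols):
    """Group the slices by row index and stitch each group into a full row."""
    grouped = defaultdict(list)  # hypothetical stand-in for a single shuffle
    for global_row, piece in row_slices(blocks, rows_per_block, cols_per_block):
        grouped[global_row].append(piece)

    rows = {}
    for global_row, pieces in grouped.items():
        row = [0.0] * n_cols
        for offset, values in pieces:
            # Copy the slice into its column range of the full row.
            row[offset:offset + len(values)] = values
        rows[global_row] = row
    return rows


# A 4x4 matrix stored as four 2x2 blocks, keyed by (block_row, block_col).
blocks = {
    (0, 0): [[1.0, 2.0], [5.0, 6.0]],
    (0, 1): [[3.0, 4.0], [7.0, 8.0]],
    (1, 0): [[9.0, 10.0], [13.0, 14.0]],
    (1, 1): [[11.0, 12.0], [15.0, 16.0]],
}

rows = to_indexed_rows(blocks, rows_per_block=2, cols_per_block=2, n_cols=4)

# Records that would be moved: one per row slice versus one per entry.
n_slices = sum(1 for _ in row_slices(blocks, 2, 2))
n_entries = sum(len(r) for b in blocks.values() for r in b)
```

In this toy case 8 row slices are grouped instead of 16 individual entries; for dense blocks the ratio grows with the number of columns per block, which is the effect the benchmark project is meant to measure on real data.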