Hi,

Matrix computation is critical to the efficiency of algorithms such as least squares and the Kalman filter. At present, the MLlib module offers only limited linear algebra operations on matrices, especially on distributed matrices.
We have been working on distributed matrix computation APIs built on the data structures in MLlib. The main idea is to partition the matrix into sub-blocks, following the strategy in this paper:
http://www.cs.berkeley.edu/~odedsc/papers/bfsdfs-mm-ipdps13.pdf

In our experiments, this strategy is communication-optimal. However, operations such as factorization may not be well suited to a block-wise implementation.

Any suggestions and guidance are welcome.

Thanks,
Yuxi
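
P.S. To make the sub-block idea concrete, here is a minimal single-machine sketch in plain Python. It only illustrates partitioning two matrices into blocks and multiplying them block by block. It is not the BFS/DFS algorithm from the paper and not an MLlib API, and all function names (split_into_blocks, block_multiply, assemble) are hypothetical. In a distributed setting each keyed block would live on some worker, and the matching of A(i, k) with B(k, j) would be done with a join rather than a nested loop.

```python
def split_into_blocks(matrix, block_size):
    """Partition a dense matrix (list of rows) into blocks keyed by (block_row, block_col)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    blocks = {}
    for i in range(0, n_rows, block_size):
        for j in range(0, n_cols, block_size):
            blocks[(i // block_size, j // block_size)] = [
                row[j:j + block_size] for row in matrix[i:i + block_size]
            ]
    return blocks


def multiply_dense(a, b):
    """Plain triple-loop product of two small dense matrices."""
    inner = len(b)
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def add_dense(a, b):
    """Element-wise sum of two dense matrices of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def block_multiply(a_blocks, b_blocks):
    """Compute C(i, j) = sum over k of A(i, k) * B(k, j) on block dictionaries."""
    c_blocks = {}
    for (i, k), a_blk in a_blocks.items():
        for (k2, j), b_blk in b_blocks.items():
            if k != k2:
                continue  # only blocks sharing the inner index contribute
            partial = multiply_dense(a_blk, b_blk)
            if (i, j) in c_blocks:
                c_blocks[(i, j)] = add_dense(c_blocks[(i, j)], partial)
            else:
                c_blocks[(i, j)] = partial
    return c_blocks


def assemble(blocks):
    """Stitch a block dictionary back into one dense matrix."""
    n_block_rows = 1 + max(i for i, _ in blocks)
    n_block_cols = 1 + max(j for _, j in blocks)
    rows = []
    for i in range(n_block_rows):
        for r in range(len(blocks[(i, 0)])):
            row = []
            for j in range(n_block_cols):
                row.extend(blocks[(i, j)][r])
            rows.append(row)
    return rows
```

For two matrices a and b with compatible shapes, assemble(block_multiply(split_into_blocks(a, 2), split_into_blocks(b, 2))) should agree with multiply_dense(a, b), as long as both are split with the same block size so the inner block dimensions line up.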