Hi, I wrote my own centralized (single-machine) ALS implementation and then read the distributed implementation in MLlib. The MLlib version uses InLink and OutLink to implement functions like "get all products which are related to this user", and that is how it ultimately distributes the model.
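To make that lookup concrete: on a single machine, "get all products which are related to this user" is just a row selection on the sparse ratings matrix, and the reverse lookup is a column selection. Below is a minimal sketch in plain Python. It is not MLlib's code; the names (build_index, select_row, select_col, update_user_factor), the toy ratings, and the regularization value are all hypothetical, and the factor update is simplified to rank 1 so that the least-squares solve reduces to a scalar division.

```python
from collections import defaultdict

# Hypothetical toy data: (user, product, rating) triples of a sparse ratings matrix.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 1, 4.0), (2, 0, 1.0), (2, 2, 2.0)]


def build_index(triples):
    """Index the sparse ratings matrix by row (user) and by column (product)."""
    by_user = defaultdict(dict)
    by_product = defaultdict(dict)
    for user, product, rating in triples:
        by_user[user][product] = rating
        by_product[product][user] = rating
    return by_user, by_product


def select_row(by_user, user):
    """Row selection: all products rated by this user, mapped to their ratings."""
    return dict(by_user.get(user, {}))


def select_col(by_product, product):
    """Column selection: all users who rated this product, mapped to their ratings."""
    return dict(by_product.get(product, {}))


def update_user_factor(row, product_factors, lam):
    """Rank-1 regularized least-squares update for one user.

    Minimizes sum_p (r_p - x * y_p)^2 + lam * x^2 over the scalar x,
    where p ranges over the products in this user's row.
    """
    numerator = sum(r * product_factors[p] for p, r in row.items())
    denominator = lam + sum(product_factors[p] ** 2 for p in row)
    return numerator / denominator


by_user, by_product = build_index(ratings)

# Assumed starting point: every product factor initialized to 1.0.
product_factors = {0: 1.0, 1: 1.0, 2: 1.0}

# One half-step of ALS: fix the product factors, update every user factor.
user_factors = {
    u: update_user_factor(select_row(by_user, u), product_factors, lam=0.1)
    for u in by_user
}
```

The other half-step is symmetric: fix the user factors and update each product factor from select_col. In this form the algorithm only ever asks for a row or a column, which is the part InLink and OutLink handle in the distributed version.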
If we had a distributed matrix library, the complex InLink and OutLink logic could be expressed fairly easily with matrix select-row and select-column operators. With the current InLink and OutLink based implementation, the distributed code is quite different from, and more complex than, the centralized one.

My question: could we move this complexity (InLink and OutLink) down into a lower-level distributed matrix manipulation layer, leaving the upper-layer ALS algorithm "similar" to a centralized one? To be more specific, if we can make a DoubleMatrix an RDD and optimize the distributed operations on it, the ALS algorithm becomes easier to implement.

Does this make sense?

Best regards,
Wei

---------------------------------
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan