MLLib: implementing ALS with distributed matrix

Wei Tan
Sun, 03 Aug 2014 20:41:26 -0700

Hi,

  I wrote my own centralized ALS implementation and then read the 
distributed implementation in MLlib. MLlib uses InLink and OutLink 
structures to answer queries such as "get all products related to this 
user", and that is ultimately how it distributes the model.


  Because of this InLink and OutLink based design, the distributed code 
is quite different from, and more complex than, the centralized one. If we 
had a distributed matrix library, the same logic could be expressed 
relatively easily with select-row or select-column operators on a matrix.
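
  To illustrate what I mean by select-row and select-column, here is a 
minimal single-machine sketch. The class SparseRatings and its methods 
select_row and select_col are names I made up for illustration; they are 
not MLlib APIs, and nothing here is distributed.

```python
from collections import defaultdict


class SparseRatings:
    """Toy sparse ratings matrix (hypothetical, not an MLlib class).

    Rows are users and columns are products. The matrix is indexed both
    ways so that either selection is a single lookup.
    """

    def __init__(self, triples):
        self.by_row = defaultdict(dict)  # user -> {product: rating}
        self.by_col = defaultdict(dict)  # product -> {user: rating}
        for user, product, rating in triples:
            self.by_row[user][product] = rating
            self.by_col[product][user] = rating

    def select_row(self, user):
        """Return all products rated by this user, with their ratings."""
        return dict(self.by_row.get(user, {}))

    def select_col(self, product):
        """Return all users who rated this product, with their ratings."""
        return dict(self.by_col.get(product, {}))


# (user, product, rating) triples
ratings = SparseRatings([(0, 10, 5.0), (0, 11, 3.0), (1, 10, 4.0)])
products_of_user_0 = ratings.select_row(0)
users_of_product_10 = ratings.select_col(10)
```

  With such operators, "get all products which are related to this user" 
becomes a plain row selection; the open question is how to make the same 
call efficient when the rows live on different machines.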

  So my question is: could we move this complexity (InLink and OutLink) 
down into a distributed matrix manipulation layer, leaving the upper-layer 
ALS algorithm "similar" to a centralized one? To be more specific, if we 
could make a DoubleMatrix an RDD and optimize its distributed manipulation, 
the ALS algorithm would be easier to implement.
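
  As a rough sketch of how "similar to a centralized one" the upper layer 
could look, below is one ALS half-step for a single user, written against 
nothing more than that user's row of the ratings matrix. The function names 
solve and update_user are hypothetical, the regularization is a plain 
lambda times identity (an assumption; it may differ from what MLlib does), 
and the whole thing runs on one machine.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting. a is a list of rows, b is a list."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def update_user(rated, product_factors, rank, lam):
    """One ALS half-step for one user (hypothetical helper).

    rated: {product: rating}, the user's row of the ratings matrix.
    product_factors: {product: list of `rank` floats}, held fixed.
    Solves (sum_p y_p y_p^T + lam * I) x = sum_p r_p y_p for x.
    """
    a = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
    b = [0.0] * rank
    for product, rating in rated.items():
        y = product_factors[product]
        for i in range(rank):
            b[i] += rating * y[i]
            for j in range(rank):
                a[i][j] += y[i] * y[j]
    return solve(a, b)


# Two products with orthogonal unit factors, so the answer is easy to check.
product_factors = {10: [1.0, 0.0], 11: [0.0, 1.0]}
user_row = {10: 5.0, 11: 3.0}
x = update_user(user_row, product_factors, rank=2, lam=1.0)
```

  The only input that depends on where the data lives is the user's row 
and the factors of the products in it. If a matrix layer could deliver 
those, the update itself would not need to know about distribution.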

  Does this make sense?

  Best regards,
Wei

---------------------------------
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
