To be precise, the optimization is not `get all products that are
related to this user` but `get all products that are related to users
inside this block`. So a product factor won't be sent to the same
block more than once. We considered using GraphX to implement ALS,
which is much easier to understand. Right now, GraphX doesn't support
the optimization we use in ALS. But we are definitely looking for
simplification of MLlib's ALS code. If you have some good ideas,
please create a JIRA and we can move our discussion there.

Best,
Xiangrui

On Sun, Aug 3, 2014 at 8:39 PM, Wei Tan <w...@us.ibm.com> wrote:
> Hi,
>
>   I wrote my centralized ALS implementation, and read the distributed
> implementation in MLlib. It uses InLink and OutLink to implement functions
> like "get all products which are related to this user", and ultimately
> achieves model distribution.
>
>   If we have a distributed matrix lib, the complex InLink and OutLink logic
> can be relatively easily achieved with matrix select-row or select-column
> operators. With this InLink and OutLink based implementation, the
> distributed code is quite different and more complex than the centralized
> one.
>
>   I have a question, could we move this complexity (InLink and OutLink) to a
> lower distributed matrix manipulation layer, leaving the upper layer ALS
> algorithm "similar" to a centralized one? To be more specific, if we can
> make a DoubleMatrix a RDD, optimize the distributed manipulation of it, we
> can make ALS algorithm easier to implement.
>
>   Does it make any sense?
>
>   Best regards,
> Wei
>
> ---------------------------------
> Wei Tan, PhD
> Research Staff Member
> IBM T. J. Watson Research Center
> http://researcher.ibm.com/person/us-wtan

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Reply via email to