Hi Wei,

The Sparkler code was not available for benchmarking, so I picked up Jellyfish, which uses SGD. If you look at the paper, the ideas are very similar to the Sparkler paper, but Jellyfish runs on shared memory in C while Sparkler was built on top of Spark. Jellyfish also uses some interesting regularization, like L1-ball projection, and in fact the paper reported the best RMSE on the Netflix dataset using these ideas.
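In case you have not read the Jellyfish paper, the per-rating math is just a plain SGD step followed by projecting the updated factor rows back onto an L1 ball. Here is a minimal Scala sketch of that idea; it is my own illustration with hypothetical names (projectL1, sgdStep), not Jellyfish's actual C code, and the projection is the standard sort-based one from Duchi et al. (2008):

object L1ProjectedSgdSketch {

  // Project v onto the L1 ball {x : ||x||_1 <= z}, assuming z > 0
  // (sort-based method of Duchi et al. 2008).
  def projectL1(v: Array[Double], z: Double): Array[Double] = {
    val absSum = v.map(math.abs).sum
    if (absSum <= z) {
      v.clone()
    } else {
      val u = v.map(math.abs).sortWith(_ > _)   // magnitudes, descending
      val cum = u.scanLeft(0.0)(_ + _).tail     // cumulative sums of u
      val rho = (0 until u.length).filter(j => u(j) - (cum(j) - z) / (j + 1) > 0).max
      val theta = (cum(rho) - z) / (rho + 1)
      v.map(x => math.signum(x) * math.max(math.abs(x) - theta, 0.0))
    }
  }

  // One SGD step on a single rating, then project both factor rows onto the L1 ball.
  def sgdStep(userF: Array[Double], itemF: Array[Double], rating: Double,
              step: Double, l1Radius: Double): (Array[Double], Array[Double]) = {
    val err = rating - userF.zip(itemF).map { case (a, b) => a * b }.sum
    val newUser = Array.tabulate(userF.length)(k => userF(k) + step * err * itemF(k))
    val newItem = Array.tabulate(itemF.length)(k => itemF(k) + step * err * userF(k))
    (projectL1(newUser, l1Radius), projectL1(newItem, l1Radius))
  }
}

As I recall, Jellyfish's real contribution is how it partitions the ratings so that many such updates can run in parallel on shared memory without conflicts; the sketch above only shows the per-rating update.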
Jellyfish was a good runtime baseline to compare against. Of course, the issue with shared memory is scalability.

Sparkler was written before the ALS version we have right now in MLlib existed. The blocking idea used in Sparkler is not very different from what we have in ALS right now, but Sparkler used a faster checkpointing method (they called it a Carousel Map) which is equally applicable to MLlib ALS. Right now we checkpoint to HDFS; an efficient hot distributed cache (like memcached, Redis, or Cassandra) would improve the runtime further but would bring additional dependencies. At the moment MLlib assumes HDFS and Tachyon as the checkpoint solutions.

In my experiments I found distributed Spark ALS on 16 cores (8 nodes, each running 2 cores) to be around 2x slower than Jellyfish SGD, also running on 16 cores in shared memory. The biggest machine we had was 16 cores. Note that we factored about 20M users and 3M products with ~400M ratings, so 2x was a good number for me. I also found Spark ALS to be 6-7x faster than Oryx (the scalable version of Mahout ALS).

In terms of RMSE, SGD produced results very similar to ALS in my experiments with the Netflix dataset. On new datasets I still run Jellyfish to check whether SGD finds a better local minimum than ALS; if it does on some dataset, that is a motivation to implement DSGD/Sparkler. I have cleaned up some Netflix-specific optimizations from Jellyfish, and I also implemented a non-negative variant (a projected-gradient algorithm similar to the one used in NNLS.scala), but with SGD, implementing the projection and getting good convergence turned out to be difficult. If you want, I can open source the code.

Spark ALS (and Oryx/Mahout's Taste) represents the problem as least squares. For explicit feedback no broadcast is needed, and for implicit feedback a global Gram matrix is computed with a rank x rank broadcast; if the ranks are low, that is not a big issue. Representing the problem as least squares has other benefits as well, since we can map least squares to a quadratic program and use the following ideas: https://issues.apache.org/jira/browse/SPARK-2426 (a rough sketch of the per-user least-squares solve is at the very end of this mail).

We are doing a more thorough comparison of our Spark ALS variants against Jellyfish on shared memory; I will point you to the results when they are available. I also feel it might be worthwhile to represent the factorization problem on GraphX, but that is a different direction altogether from the blocked matrix factorization ideas.

Thanks.
Deb

On Sun, Aug 17, 2014 at 12:14 PM, Wei Tan <w...@us.ibm.com> wrote:
> Hi Xiangrui, yes I was not clear in my previous posting. You did the
> optimization at the block level (which is brilliant!) so that blocks are
> joined first to avoid redundant data transfer.
>
> I have two follow-up questions:
>
> 1. when you do rdd_a.join(rdd_b), at which site will this join be done?
> Say, if sizeOf(rdd_a) >> sizeOf(rdd_b), does Spark move rdd_b to rdd_a (in a
> per-block manner) and do the join there?
> 2. for matrix factorization, there exist some distributed SGD algorithms
> such as:
> http://people.cs.umass.edu/~boduo/publications/2013EDBT-sparkler.pdf
> I plan to do some performance comparison soon. Any idea on which method
> is better?
>
> Thanks!
>
> Wei
>
> ---------------------------------
> Wei Tan, PhD
> Research Staff Member
> IBM T. J. Watson Research Center
> http://researcher.ibm.com/person/us-wtan
>
>
> From: Xiangrui Meng <men...@gmail.com>
> To: Wei Tan/Watson/IBM@IBMUS
> Cc: "user@spark.apache.org" <user@spark.apache.org>
> Date: 08/04/2014 12:51 AM
> Subject: Re: MLLib: implementing ALS with distributed matrix
> ------------------------------
>
> To be precise, the optimization is not `get all products that are
> related to this user` but `get all products that are related to users
> inside this block`. So a product factor won't be sent to the same
> block more than once. We considered using GraphX to implement ALS,
> which is much easier to understand. Right now, GraphX doesn't support
> the optimization we use in ALS. But we are definitely looking for
> simplification of MLlib's ALS code. If you have some good ideas,
> please create a JIRA and we can move our discussion there.
>
> Best,
> Xiangrui
>
> On Sun, Aug 3, 2014 at 8:39 PM, Wei Tan <w...@us.ibm.com> wrote:
> > Hi,
> >
> > I wrote my centralized ALS implementation and read the distributed
> > implementation in MLlib. It uses InLink and OutLink to implement functions
> > like "get all products which are related to this user", and ultimately
> > achieves model distribution.
> >
> > If we had a distributed matrix library, the complex InLink and OutLink
> > logic could be achieved relatively easily with matrix select-row or
> > select-column operators. With the InLink- and OutLink-based implementation,
> > the distributed code is quite different from, and more complex than, the
> > centralized one.
> >
> > I have a question: could we move this complexity (InLink and OutLink) to a
> > lower distributed matrix manipulation layer, leaving the upper-layer ALS
> > algorithm "similar" to a centralized one? To be more specific, if we can
> > make a DoubleMatrix an RDD and optimize its distributed manipulation, we
> > can make the ALS algorithm easier to implement.
> >
> > Does it make any sense?
> >
> > Best regards,
> > Wei
> >
> > ---------------------------------
> > Wei Tan, PhD
> > Research Staff Member
> > IBM T. J. Watson Research Center
> > http://researcher.ibm.com/person/us-wtan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
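PS (for anyone following the least-squares discussion above): here is a rough, self-contained Scala sketch of the per-user subproblem, in my own notation with hypothetical names (normalEquations, projectedGradient); it is not the MLlib code. Once the factors of the items a user rated have been gathered (which is what the InLink/OutLink blocking achieves), ALS builds a rank x rank Gram system and solves it; a non-negative variant can be obtained with a simple projected-gradient loop in the spirit of, but much simpler than, NNLS.scala.

object AlsSubproblemSketch {

  // Build the normal equations A = Y^T Y + lambda * I and b = Y^T r for one user,
  // where the rows of Y are the factors of the items this user rated.
  def normalEquations(itemFactors: Array[Array[Double]], ratings: Array[Double],
                      lambda: Double): (Array[Array[Double]], Array[Double]) = {
    val rank = itemFactors(0).length
    val ata = Array.fill(rank, rank)(0.0)
    val atb = Array.fill(rank)(0.0)
    for ((y, r) <- itemFactors.zip(ratings)) {
      for (i <- 0 until rank; j <- 0 until rank) ata(i)(j) += y(i) * y(j)
      for (i <- 0 until rank) atb(i) += y(i) * r
    }
    for (i <- 0 until rank) ata(i)(i) += lambda   // plain L2 regularization
    (ata, atb)
  }

  // Solve min_{x >= 0} 0.5 x^T A x - b^T x by plain projected gradient:
  // x <- max(0, x - alpha * (A x - b)), with a fixed step size alpha.
  // This is a sketch only; NNLS.scala uses a more careful method.
  def projectedGradient(ata: Array[Array[Double]], atb: Array[Double],
                        alpha: Double, iters: Int): Array[Double] = {
    val rank = atb.length
    var x = Array.fill(rank)(0.0)
    for (_ <- 0 until iters) {
      val grad = Array.tabulate(rank)(i =>
        (0 until rank).map(j => ata(i)(j) * x(j)).sum - atb(i))
      x = Array.tabulate(rank)(i => math.max(0.0, x(i) - alpha * grad(i)))
    }
    x
  }
}

Because the whole subproblem lives in a rank x rank system, only factor vectors have to move between blocks, which is why low ranks keep the joins and broadcasts cheap.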