Re: possible bug in Spark's ALS implementation...

2014-04-02 Thread Sean Owen
>> returned from updateFeatures() I was able to avoid a raft of duplicate computations. Is there a reason not to do this? Thanks.

Re: possible bug in Spark's ALS implementation...

2014-04-02 Thread Nick Pentreath
ialized. >> > I also found that the product and user RDDs were being rebuilt many times over in my tests, even for tiny data sets. By persisting the RDD returned from updateFeatures() I was able

Re: possible bug in Spark's ALS implementation...

2014-04-02 Thread Debasish Das
> > Is there a reason not to do this? > > > > Thanks. > > > > > > > > -- > > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s > > -ALS-implement

Re: possible bug in Spark's ALS implementation...

2014-04-02 Thread Michael Allman
ailing list archive at Nabble.com. > If you reply to this email, your message will be added to the discussion below: > http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s

Re: possible bug in Spark's ALS implementation...

2014-04-01 Thread Nick Pentreath
Hi Michael, would you mind setting out exactly what differences you found between the Spark and Oryx implementations? It would be good to be clear on them, and also to see whether there are further tricks or enhancements from the Oryx one that can be ported (such as the lambda * numRatings adjustment). N

Re: possible bug in Spark's ALS implementation...

2014-03-18 Thread Nick Pentreath
Great work, Xiangrui, thanks for the enhancement! On Wed, Mar 19, 2014 at 12:08 AM, Xiangrui Meng wrote: > Glad to hear the speed-up. Wish we can improve the implementation > further in the future. -Xiangrui > On Tue, Mar 18, 2014 at 1:55 PM, Michael Allman wrote: >>

Re: possible bug in Spark's ALS implementation...

2014-03-18 Thread Xiangrui Meng
Glad to hear about the speed-up. I hope we can improve the implementation further in the future. -Xiangrui On Tue, Mar 18, 2014 at 1:55 PM, Michael Allman wrote: > I just ran a runtime performance comparison between 0.9.0-incubating and your > als branch. I saw a 1.5x improvement in performance. > > > >

Re: possible bug in Spark's ALS implementation...

2014-03-18 Thread Michael Allman
I just ran a runtime performance comparison between 0.9.0-incubating and your als branch. I saw a 1.5x improvement in performance. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tp2567p2823.html Sent from the Apach

Re: possible bug in Spark's ALS implementation...

2014-03-18 Thread Xiangrui Meng
Sorry, the link was wrong. Should be https://github.com/apache/spark/pull/131 -Xiangrui On Tue, Mar 18, 2014 at 10:20 AM, Michael Allman wrote: > Hi Xiangrui, > > I don't see how https://github.com/apache/spark/pull/161 relates to ALS. Can > you explain? > > Also, thanks for addressing the issue

Re: possible bug in Spark's ALS implementation...

2014-03-18 Thread Michael Allman
Hi Xiangrui, I don't see how https://github.com/apache/spark/pull/161 relates to ALS. Can you explain? Also, thanks for addressing the issue with factor matrix persistence in PR 165. I was probably not going to get to that for a while. I will try to test your changes today for speed improvements

Re: possible bug in Spark's ALS implementation...

2014-03-17 Thread Xiangrui Meng
Hi Michael, I made a couple of changes to implicit ALS. One gives faster construction of YtY (https://github.com/apache/spark/pull/161), which was merged into master. The other caches intermediate matrix factors properly (https://github.com/apache/spark/pull/165). They should give you the same result a

Re: possible bug in Spark's ALS implementation...

2014-03-17 Thread Michael Allman
I've created https://spark-project.atlassian.net/browse/SPARK-1263 to address the issue of the factor matrix recomputation. I'm planning to submit a related pull request shortly. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-imp

Re: possible bug in Spark's ALS implementation...

2014-03-17 Thread Michael Allman
You are correct, in the long run it doesn't matter which matrix you begin the iterative process with. I was thinking in terms of doing a side-by-side comparison to Oryx. I've posted a bug report as SPARK-1262. I described the problem I found and the mitigation strategy I've used. I think that this

Re: possible bug in Spark's ALS implementation...

2014-03-17 Thread Xiangrui Meng
The factor matrix Y is used twice in the implicit ALS computation: once to compute the global Y^T Y, and once to compute the local Y_i^T C_i Y_i. -Xiangrui On Sun, Mar 16, 2014 at 1:18 PM, Matei Zaharia wrote: > On Mar 14, 2014, at 5:52 PM, Michael Allman wrote: > > I also found that the product and user
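A minimal Python sketch of the two uses of Y described above, for a single user update. It assumes the confidence weighting c_ui = 1 + alpha * r_ui from the implicit-feedback paper linked in the first message of this thread; the helper names (gram, user_matrix_shared, user_matrix_direct) are hypothetical and are not taken from Spark's ALS code. The shared version should agree with the direct sum up to floating-point rounding.

```python
# Illustrative sketch only: gram, user_matrix_shared and user_matrix_direct are
# hypothetical helper names, not functions from Spark's ALS.

def gram(Y):
    """Global Y^T Y over all item factors (rows of Y); computed once per sweep."""
    k = len(Y[0])
    return [[sum(y[a] * y[b] for y in Y) for b in range(k)] for a in range(k)]


def user_matrix_shared(YtY, Y, ratings, alpha, lam):
    """Y^T C_u Y + lam*I, built from the shared YtY plus a local correction.

    ratings maps item index -> r_ui for the items this user interacted with.
    Assumes confidence c_ui = 1 + alpha * r_ui, so c_ui - 1 = alpha * r_ui.
    """
    k = len(YtY)
    A = [row[:] for row in YtY]          # first use of Y: the global Gram matrix
    for i, r in ratings.items():         # second use of Y: rows for observed items
        w = alpha * r
        for a in range(k):
            for b in range(k):
                A[a][b] += w * Y[i][a] * Y[i][b]
    for a in range(k):
        A[a][a] += lam
    return A


def user_matrix_direct(Y, ratings, alpha, lam):
    """Reference version: sum of c_ui * y_i y_i^T over every item, no sharing."""
    k = len(Y[0])
    A = [[0.0] * k for _ in range(k)]
    for i, y in enumerate(Y):
        c = 1.0 + alpha * ratings.get(i, 0.0)
        for a in range(k):
            for b in range(k):
                A[a][b] += c * y[a] * y[b]
    for a in range(k):
        A[a][a] += lam
    return A
```

Because Y is read in both steps of every update, an unpersisted Y would be rebuilt for each of them, which is consistent with the recomputation reported elsewhere in this thread.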

Re: possible bug in Spark's ALS implementation...

2014-03-16 Thread Matei Zaharia
On Mar 14, 2014, at 5:52 PM, Michael Allman wrote: > I also found that the product and user RDDs were being rebuilt many times > over in my tests, even for tiny data sets. By persisting the RDD returned > from updateFeatures() I was able to avoid a raft of duplicate computations. > Is there a rea
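The effect of persisting a lazily evaluated dataset can be shown without Spark. The sketch below is a toy analogy in plain Python, not Spark's RDD API: LazyDataset, update_features and count_computations are hypothetical names, and the only behavior modeled is "recompute on every action unless persisted".

```python
# Toy analogy for lazy recomputation versus persistence. This is NOT Spark code;
# every name here is made up for illustration.

class LazyDataset:
    """Holds a recipe and re-runs it on every collect() unless persisted."""

    def __init__(self, compute):
        self._compute = compute
        self._persisted = False
        self._cached = None

    def persist(self):
        self._persisted = True
        return self

    def collect(self):
        if not self._persisted:
            return self._compute()           # recomputed on each use
        if self._cached is None:
            self._cached = self._compute()   # computed once, then reused
        return self._cached


def count_computations(persist, uses):
    """Return how many times the recipe runs when the dataset is used `uses` times."""
    calls = {"n": 0}

    def update_features():                   # stand-in for an expensive factor update
        calls["n"] += 1
        return [x * x for x in range(5)]

    data = LazyDataset(update_features)
    if persist:
        data.persist()
    for _ in range(uses):
        data.collect()
    return calls["n"]
```

With three downstream uses, the unpersisted dataset runs its recipe three times and the persisted one runs it once, which is the kind of duplicate work described in the message above.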

Re: possible bug in Spark's ALS implementation...

2014-03-14 Thread Xiangrui Meng
Hi Michael, Thanks for looking into the details! Computing X first and computing Y first can deliver different results, because the initial objective values could differ by a lot. But the algorithm should converge after a few iterations. It is hard to tell which should go first. After all, the def

Re: possible bug in Spark's ALS implementation...

2014-03-14 Thread Michael Allman
I've been thoroughly investigating this issue over the past couple of days and have discovered quite a bit. For one thing, there is definitely (at least) one issue/bug in the Spark implementation that leads to incorrect results for models generated with rank > 1 or a large number of iterations. I w

Re: possible bug in Spark's ALS implementation...

2014-03-12 Thread Sean Owen
Ah, thank you, I had actually forgotten about this and this is indeed probably a difference. This is from the other paper I cited: http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf It's the "WR" in "ALS-WR" -- weighted regularization. I supp
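For reference, the weighted-lambda regularization of ALS-WR can be written as below for explicit ratings. The notation is an assumption made here rather than something taken from this thread: x_u and y_i are the user and item factors, K is the set of observed (u, i) pairs, n_u is the number of ratings by user u, and n_i is the number of ratings of item i.

```latex
\min_{X,Y} \sum_{(u,i)\in\mathcal{K}} \left( r_{ui} - x_u^{\top} y_i \right)^2
  + \lambda \left( \sum_u n_u \lVert x_u \rVert^2 + \sum_i n_i \lVert y_i \rVert^2 \right)
```

Under this form, the ridge term in each user's least-squares solve is lambda * n_u rather than a flat lambda, so the same lambda value regularizes heavy raters more strongly than an unweighted scheme would.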

Re: possible bug in Spark's ALS implementation...

2014-03-12 Thread Michael Allman
Hi Sean, Digging deeper I've found another difference between Oryx's implementation and Spark's. Why do you adjust lambda here? https://github.com/cloudera/oryx/blob/master/als-common/src/main/java/com/cloudera/oryx/als/common/factorizer/als/AlternatingLeastSquares.java#L491 Cheers, Michael

Re: possible bug in Spark's ALS implementation...

2014-03-12 Thread Michael Allman
Thank you everyone for your feedback. It's been very helpful, and though I still haven't found the cause of the difference between Spark and Oryx, I feel I'm making progress. Xiangrui asked me to create a ticket for this issue. The reason I didn't do this originally is because it's not clear to me

Re: possible bug in Spark's ALS implementation...

2014-03-12 Thread Sean Owen
On Wed, Mar 12, 2014 at 7:36 AM, Nick Pentreath wrote: > @Sean, would it be a good idea to look at changing the regularization in > Spark's ALS to alpha * lambda? What is the thinking behind this? If I > recall, the Mahout version added something like (# ratings * lambda) as > regularization in ea

Re: possible bug in Spark's ALS implementation...

2014-03-12 Thread Sebastian Schelter
The Mahout implementation is just a straightforward port of the paper. No changes have been made. On 03/12/2014 08:36 AM, Nick Pentreath wrote: It would be helpful to know what parameter inputs you are using. If the regularization schemes are different (by a factor of alpha, which can often b

Re: possible bug in Spark's ALS implementation...

2014-03-12 Thread Nick Pentreath
It would be helpful to know what parameter inputs you are using. If the regularization schemes are different (by a factor of alpha, which can often be quite high) this will mean that the same parameter settings could give very different results. A higher lambda would be required with Spark's versi

Re: possible bug in Spark's ALS implementation...

2014-03-11 Thread Xiangrui Meng
Line 376 should be correct as it is computing \sum_i (c_i - 1) x_i x_i^T = \sum_i (alpha * r_i) x_i x_i^T. Are you computing some metrics to tell which recommendation is better? -Xiangrui On Tue, Mar 11, 2014 at 6:38 PM, Xiangrui Meng wrote: > Hi Michael, > > I can help check the current impleme

Re: possible bug in Spark's ALS implementation...

2014-03-11 Thread Sean Owen
On Tue, Mar 11, 2014 at 10:18 PM, Michael Allman wrote: > I'm seeing counterintuitive, sometimes nonsensical recommendations. For > comparison, I've run the training data through Oryx's in-VM implementation > of implicit ALS with the same parameters. Oryx uses the same algorithm. > (Source in this

Re: possible bug in Spark's ALS implementation...

2014-03-11 Thread Xiangrui Meng
Hi Michael, I can help check the current implementation. Would you please go to https://spark-project.atlassian.net/browse/SPARK and create a ticket about this issue with component "MLlib"? Thanks! Best, Xiangrui On Tue, Mar 11, 2014 at 3:18 PM, Michael Allman wrote: > Hi, > > I'm implementing

possible bug in Spark's ALS implementation...

2014-03-11 Thread Michael Allman
Hi, I'm implementing a recommender based on the algorithm described in http://www2.research.att.com/~yifanhu/PUB/cf.pdf. This algorithm forms the basis for Spark's ALS implementation for data sets with implicit features. The data set I'm working with is proprietary and I cannot share it, howe