Hi Michael,
Would you mind setting out exactly what differences you did find between
the Spark and Oryx implementations? It would be good to be clear on them, and
also to see whether there are further tricks/enhancements from the Oryx one
that can be ported (such as the lambda * numRatings adjustment).
Nick
Great work Xiangrui, thanks for the enhancement!
Sent from Mailbox for iPhone
On Wed, Mar 19, 2014 at 12:08 AM, Xiangrui Meng wrote:
Glad to hear about the speed-up. I hope we can improve the implementation
further in the future. -Xiangrui
On Tue, Mar 18, 2014 at 1:55 PM, Michael Allman wrote:
I just ran a runtime performance comparison between 0.9.0-incubating and your
als branch. I saw a 1.5x improvement in performance.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tp2567p2823.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Sorry, the link was wrong. Should be
https://github.com/apache/spark/pull/131 -Xiangrui
On Tue, Mar 18, 2014 at 10:20 AM, Michael Allman wrote:
Hi Xiangrui,
I don't see how https://github.com/apache/spark/pull/161 relates to ALS. Can
you explain?
Also, thanks for addressing the issue with factor matrix persistence in PR
165. I was probably not going to get to that for a while.
I will try to test your changes today for speed improvements.
Hi Michael,
I made couple changes to implicit ALS. One gives faster construction
of YtY (https://github.com/apache/spark/pull/161), which was merged
into master. The other caches intermediate matrix factors properly
(https://github.com/apache/spark/pull/165). They should give you the
same result.
I've created https://spark-project.atlassian.net/browse/SPARK-1263 to address
the issue of the factor matrix recomputation. I'm planning to submit a
related pull request shortly.
You are correct; in the long run it doesn't matter which matrix you begin the
iterative process with. I was thinking in terms of doing a side-by-side
comparison to Oryx.
I've posted a bug report as SPARK-1262. I described the problem I found and
the mitigation strategy I've used. I think that this
The factor matrix Y is used twice in the implicit ALS computation: once to
compute the global Y^T Y, and once to compute the local Y_i^T C_i Y_i.
-Xiangrui
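To make the two uses concrete, here is a toy single-user solve in NumPy (illustrative only, not the MLlib code; the rank, alpha, lambda, and data are invented). The global Gramian Y^T Y is computed once and shared by every user's solve, while each user adds only a sparse correction over the items they actually interacted with:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, rank, alpha, lam = 50, 5, 40.0, 0.1
Y = rng.normal(size=(n_items, rank))                # item factor matrix
r_u = rng.poisson(0.3, size=n_items).astype(float)  # one user's raw counts

YtY = Y.T @ Y                 # global Y^T Y, computed once for all users

c = 1.0 + alpha * r_u         # confidence c_i = 1 + alpha * r_i
p = (r_u > 0).astype(float)   # binary preference p_i

# Y^T C^u Y = Y^T Y + Y^T (C^u - I) Y; only rated items contribute to the
# second term, because c_i - 1 = 0 whenever r_i = 0.
nz = r_u > 0
A = YtY + (Y[nz].T * (c[nz] - 1.0)) @ Y[nz] + lam * np.eye(rank)
b = (Y[nz].T * c[nz]) @ p[nz]
x_u = np.linalg.solve(A, b)   # this user's factor vector

# Sanity check against the naive dense formulation with the full C^u.
C = np.diag(c)
assert np.allclose(A, Y.T @ C @ Y + lam * np.eye(rank))
assert np.allclose(b, Y.T @ C @ p)
```

The point of the split is that the dense Y^T C^u Y costs O(n * k^2) per user, while the corrected form costs only O(nnz_u * k^2) on top of the one shared YtY.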
On Sun, Mar 16, 2014 at 1:18 PM, Matei Zaharia wrote:
On Mar 14, 2014, at 5:52 PM, Michael Allman wrote:
> I also found that the product and user RDDs were being rebuilt many times
> over in my tests, even for tiny data sets. By persisting the RDD returned
> from updateFeatures() I was able to avoid a raft of duplicate computations.
> Is there a reason not to do this?
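The duplicate work Michael describes comes from Spark's lazy evaluation: without persist(), every downstream action replays the RDD's lineage from scratch. A toy Python model of that behavior (not the Spark API; the LazyValue class is invented here purely for illustration):

```python
class LazyValue:
    """Toy stand-in for an RDD: its lineage is replayed on every access
    unless the value has been persisted."""

    def __init__(self, compute):
        self._compute = compute
        self._cached = None
        self._persisted = False
        self.compute_count = 0  # how many times the "lineage" actually ran

    def persist(self):
        self._persisted = True
        return self

    def get(self):
        if self._persisted and self._cached is not None:
            return self._cached
        self.compute_count += 1
        result = self._compute()
        if self._persisted:
            self._cached = result
        return result

# Unpersisted: three downstream uses trigger three full recomputations.
unpersisted = LazyValue(lambda: [i * i for i in range(5)])
for _ in range(3):
    unpersisted.get()
assert unpersisted.compute_count == 3

# Persisted: the expensive computation runs exactly once.
persisted = LazyValue(lambda: [i * i for i in range(5)]).persist()
for _ in range(3):
    persisted.get()
assert persisted.compute_count == 1
```

In an iterative algorithm like ALS, where each iteration's factors feed the next, the unpersisted cost compounds across iterations, which is why persisting the returned RDD pays off even on tiny data sets.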
Hi Michael,
Thanks for looking into the details! Computing X first and computing Y
first can deliver different results, because the initial objective
values could differ by a lot. But the algorithm should converge after
a few iterations. It is hard to tell which should go first. After all,
the def
I've been thoroughly investigating this issue over the past couple of days
and have discovered quite a bit. For one thing, there is definitely (at
least) one issue/bug in the Spark implementation that leads to incorrect
results for models generated with rank > 1 or a large number of iterations.
I w
Ah, thank you, I had actually forgotten about this and this is indeed
probably a difference. This is from the other paper I cited:
http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf
It's the "WR" in "ALS-WR" -- weighted regularization. I supp
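A minimal NumPy sketch of that weighted regularization (hypothetical data; not the Oryx or Spark code): ALS-WR replaces the fixed ridge term lambda * I in a user's normal equations with lambda * n_u * I, where n_u is the user's rating count, so the penalty scales with how much data backs each factor vector.

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, rank, lam = 20, 3, 0.05
Y = rng.normal(size=(n_items, rank))  # item factors
r = rng.normal(size=n_items)          # this user's ratings
rated = np.zeros(n_items, dtype=bool)
rated[:8] = True                      # suppose the user rated 8 items
n_u = int(rated.sum())

Yr = Y[rated]
# Plain ridge term, lambda * I:
x_plain = np.linalg.solve(Yr.T @ Yr + lam * np.eye(rank), Yr.T @ r[rated])
# ALS-WR scales the penalty by the user's rating count, lambda * n_u * I,
# so heavy raters are not under-regularized relative to light raters.
x_wr = np.linalg.solve(Yr.T @ Yr + lam * n_u * np.eye(rank), Yr.T @ r[rated])

# The heavier penalty shrinks the solution (ridge norm is non-increasing
# in the regularization strength).
assert np.linalg.norm(x_wr) <= np.linalg.norm(x_plain)
```

The practical consequence, as noted elsewhere in this thread, is that the same numeric lambda means different things under the two schemes, so parameter settings are not directly comparable between implementations.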
Hi Sean,
Digging deeper I've found another difference between Oryx's implementation
and Spark's. Why do you adjust lambda here?
https://github.com/cloudera/oryx/blob/master/als-common/src/main/java/com/cloudera/oryx/als/common/factorizer/als/AlternatingLeastSquares.java#L491
Cheers,
Michael
Thank you everyone for your feedback. It's been very helpful, and though I
still haven't found the cause of the difference between Spark and Oryx, I
feel I'm making progress.
Xiangrui asked me to create a ticket for this issue. The reason I didn't do
this originally is because it's not clear to me
On Wed, Mar 12, 2014 at 7:36 AM, Nick Pentreath wrote:
> @Sean, would it be a good idea to look at changing the regularization in
> Spark's ALS to alpha * lambda? What is the thinking behind this? If I
> recall, the Mahout version added something like (# ratings * lambda) as
> regularization in ea
The Mahout implementation is just a straightforward port of the paper.
No changes have been made.
On 03/12/2014 08:36 AM, Nick Pentreath wrote:
It would be helpful to know what parameter inputs you are using.
If the regularization schemes are different (by a factor of alpha, which
can often be quite high) this will mean that the same parameter settings
could give very different results. A higher lambda would be required with
Spark's version.
Line 376 should be correct, as it is computing \sum_i (c_i - 1) x_i x_i^T
= \sum_i (alpha * r_i) x_i x_i^T. Are you computing some metrics to tell
which recommendation is better? -Xiangrui
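That identity is easy to verify numerically (toy data, not the Spark code; c_i = 1 + alpha * r_i as in the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n, rank, alpha = 30, 4, 40.0
X = rng.normal(size=(n, rank))
r = rng.poisson(0.5, size=n).astype(float)
c = 1.0 + alpha * r             # confidence, c_i = 1 + alpha * r_i

# \sum_i (c_i - 1) x_i x_i^T ...
lhs = (X.T * (c - 1.0)) @ X
# ... equals \sum_i (alpha * r_i) x_i x_i^T; unrated items (r_i = 0)
# drop out of the sum entirely, which is what makes the term sparse.
rhs = (X.T * (alpha * r)) @ X
assert np.allclose(lhs, rhs)
```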
On Tue, Mar 11, 2014 at 6:38 PM, Xiangrui Meng wrote:
On Tue, Mar 11, 2014 at 10:18 PM, Michael Allman wrote:
> I'm seeing counterintuitive, sometimes nonsensical recommendations. For
> comparison, I've run the training data through Oryx's in-VM implementation
> of implicit ALS with the same parameters. Oryx uses the same algorithm.
> (Source in this
Hi Michael,
I can help check the current implementation. Would you please go to
https://spark-project.atlassian.net/browse/SPARK and create a ticket
about this issue with component "MLlib"? Thanks!
Best,
Xiangrui
On Tue, Mar 11, 2014 at 3:18 PM, Michael Allman wrote:
Hi,
I'm implementing a recommender based on the algorithm described in
http://www2.research.att.com/~yifanhu/PUB/cf.pdf. This algorithm forms the
basis for Spark's ALS implementation for data sets with implicit features.
The data set I'm working with is proprietary and I cannot share it,
howe
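For reference, the objective minimized in that paper (Hu, Koren & Volinsky, "Collaborative Filtering for Implicit Feedback Datasets") is

```latex
\min_{X,Y} \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i\bigr)^2
  + \lambda \Bigl( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \Bigr),
\qquad c_{ui} = 1 + \alpha\, r_{ui}, \qquad
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & \text{otherwise,} \end{cases}
```

where r_{ui} is the raw observation count, c_{ui} the confidence, and p_{ui} the binarized preference; alpha and lambda are the tuning parameters discussed throughout this thread.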