Re: StackOverflow Error when run ALS with 100 iterations

2014-04-15 Thread Xiaoli Li
> f ALS becomes very deep, both DAGScheduler and Java serializer may
> overflow because they are implemented in a recursive way. You may resort to
> checkpointing as a workaround.
>
>
> On Wed, Apr 16, 2014 at 5:29 AM, Xiaoli Li wrote:
>
>> Hi,
>>
>> I am testing A
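The failure mode described in this reply can be illustrated without a cluster. The sketch below is a toy model in plain Python, not Spark code: `Node`, `lineage_depth`, and `iterate` are hypothetical names invented for the illustration. It assumes only what the reply states, namely that a long chain of dependencies is walked recursively, and that checkpointing cuts the chain so the walk stays shallow.

```python
import sys


class Node:
    """Toy stand-in for one dataset in a lineage chain (hypothetical, not a Spark class)."""

    def __init__(self, parent=None):
        self.parent = parent

    def checkpoint(self):
        # Pretend the data was written to stable storage, so the link
        # back to the parent is no longer needed and can be dropped.
        self.parent = None


def lineage_depth(node):
    """Walk the lineage recursively, as a recursive scheduler or serializer would."""
    if node.parent is None:
        return 1
    return 1 + lineage_depth(node.parent)


def iterate(steps, checkpoint_every=None):
    """Build a chain with one new node per iteration, optionally checkpointing."""
    node = Node()
    for i in range(1, steps + 1):
        node = Node(parent=node)
        if checkpoint_every and i % checkpoint_every == 0:
            node.checkpoint()
    return node


# Enough iterations to exceed the interpreter's recursion limit.
steps = sys.getrecursionlimit() * 2

try:
    lineage_depth(iterate(steps))
    overflowed = False
except RecursionError:
    overflowed = True

# With a checkpoint every 10 iterations the chain never grows past 10 links.
bounded_depth = lineage_depth(iterate(steps, checkpoint_every=10))

print("overflow without checkpointing:", overflowed)
print("depth with checkpointing:", bounded_depth)
```

In Spark itself the corresponding calls are `SparkContext.setCheckpointDir` and `RDD.checkpoint`. How often to checkpoint inside an iterative job is a tuning choice that this sketch does not address.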

StackOverflow Error when run ALS with 100 iterations

2014-04-15 Thread Xiaoli Li
Hi, I am testing ALS using 7 nodes. Each node has 4 cores and 8 GB of memory. The ALS program cannot run even with a very small training set (about 91 lines) because of a StackOverflow error when I set the number of iterations to 100. I think the problem may be caused by the updateFeatures method which up

Re: Huge matrix

2014-04-14 Thread Xiaoli Li
Hi Guillaume, Thanks for your explanation. It helps me a lot. I will try it. Xiaoli

Re: Huge matrix

2014-04-12 Thread Xiaoli Li
> .
>
> For a million users, you should be able to distribute the things needed to
> make a recommendation (either the centroids or the attributes matrix), and
> just break up the work based on the users you want to generate
> recommendations for. I hope this helps.
>
> Tom

Re: Huge matrix

2014-04-12 Thread Xiaoli Li
Hi Guillaume, This sounds like a good idea to me. I am a newbie here. Could you explain further how you would determine which clusters to keep? Is it based on the distance between each element and each cluster center? Would you keep several clusters for each element when searching for nearest neighbours? Thank
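One possible reading of the question above, sketched in plain Python: assign every element to its few nearest cluster centers by Euclidean distance, then look for neighbours only among elements that share at least one cluster. This is an illustration under those assumptions, not necessarily the scheme Guillaume had in mind. The function names, the `keep` parameter, and the sample data are hypothetical.

```python
import math
from collections import defaultdict


def assign_clusters(points, centroids, keep=2):
    """Map each point id to the indices of its `keep` nearest centroids."""
    assignments = {}
    for pid, p in points.items():
        ranked = sorted(range(len(centroids)),
                        key=lambda c: math.dist(p, centroids[c]))
        assignments[pid] = ranked[:keep]
    return assignments


def candidates(assignments):
    """For each point, collect the other points that share at least one cluster."""
    members = defaultdict(set)
    for pid, clusters in assignments.items():
        for c in clusters:
            members[c].add(pid)
    return {
        pid: set().union(*(members[c] for c in clusters)) - {pid}
        for pid, clusters in assignments.items()
    }


centroids = [(0, 0), (10, 0), (20, 0), (30, 0)]
points = {"u1": (1, 0), "u2": (6, 0), "u3": (29, 0), "u4": (24, 0)}

# Keeping one cluster per element separates u1 and u2, although they are close.
single = candidates(assign_clusters(points, centroids, keep=1))

# Keeping two clusters per element recovers the pairs near a cluster boundary.
double = candidates(assign_clusters(points, centroids, keep=2))

print(single)
print(double)
```

Keeping several clusters per element enlarges the candidate sets, so the exact similarity computation that follows costs more. In exchange, fewer close pairs are missed at cluster boundaries.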

Re: Huge matrix

2014-04-12 Thread Xiaoli Li
> The PR is at https://github.com/apache/spark/pull/336 though it will need
> refactoring given the recent changes to matrix interface in MLlib. You may
> implement the sampling scheme for your own app since it's much code.
>
> Best,
> Reza
>
>
> On Fri, Apr 11, 2014 at 9:17 PM

Re: Huge matrix

2014-04-11 Thread Xiaoli Li
> t that you'll be able to take this approach with the 1T pairs
> though, so it might be worth looking at the literature for recommender
> systems to see what else is out there.
>
>
> On Fri, Apr 11, 2014 at 9:54 PM, Xiaoli Li wrote:
>
>> Hi all,
>>
>> I am impl

Huge matrix

2014-04-11 Thread Xiaoli Li
Hi all, I am implementing an algorithm using Spark. I have one million users. I need to compute the similarity between each pair of users based on some of the users' attributes. For each user, I need to get the top k most similar users. What is the best way to implement this? Thanks.
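To make the problem statement concrete, here is a minimal brute-force baseline in plain Python. It assumes that each user is a numeric attribute vector and that cosine similarity is the measure. Neither is specified in the message, and the function names and sample data are hypothetical. The baseline compares every pair of users, so it is only practical for small inputs. At one million users that is on the order of a trillion pairs, which is why the replies in this thread turn to clustering and sampling.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_similar(users, k):
    """Return, for every user id, the ids of its k most similar other users."""
    result = {}
    for uid, vec in users.items():
        scored = ((cosine(vec, other), oid)
                  for oid, other in users.items() if oid != uid)
        # nlargest keeps only k items in memory instead of sorting all scores.
        result[uid] = [oid for _, oid in heapq.nlargest(k, scored)]
    return result


users = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.1, 0.9],
}
neighbours = top_k_similar(users, k=1)
print(neighbours)
```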