> If ALS runs for many iterations, its lineage becomes very deep, and both
> DAGScheduler and the Java serializer may overflow the stack because they are
> implemented in a recursive way. You may resort to checkpointing as a
> workaround.
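A minimal sketch of the workaround. This is not ALS itself, just a stand-in iterative loop showing the pattern: periodic checkpointing truncates the RDD lineage so that neither the DAGScheduler nor the serializer has to recurse through hundreds of iterations' worth of dependencies. The checkpoint directory and the 10-iteration interval are placeholder choices; it needs a running Spark cluster or local master to execute.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("als-checkpoint-sketch"))
    sc.setCheckpointDir("/tmp/als-checkpoints") // placeholder path

    // Stand-in for the iteratively updated factor RDD in ALS.
    var factors = sc.parallelize(1 to 1000).map(i => (i, i.toDouble))
    for (iter <- 1 to 100) {
      factors = factors.mapValues(_ * 0.99) // stand-in for one ALS update
      if (iter % 10 == 0) {
        factors.checkpoint() // cut the lineage every 10 iterations
        factors.count()      // action to force the checkpoint to materialize
      }
    }
    sc.stop()
  }
}
```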
>
>
> On Wed, Apr 16, 2014 at 5:29 AM, Xiaoli Li wrote:
Hi,
I am testing ALS using 7 nodes. Each node has 4 cores and 8G memory. The ALS
program cannot run even with a very small training set (about 91
lines) due to a StackOverflowError when I set the number of iterations to
100. I think the problem may be caused by the updateFeatures method, which
up
Hi Guillaume,
Thanks for your explanation. It helps me a lot. I will try it.
Xiaoli
>
> For a million users, you should be able to distribute the things needed to
> make a recommendation (either the centroids or the attributes matrix), and
> just break up the work based on the users you want to generate
> recommendations for. I hope this helps.
>
> Tom
Hi Guillaume,
This sounds like a good idea to me. I am a newbie here. Could you further
explain how you would determine which clusters to keep? By the
distance between each element and each cluster center?
Will you keep several clusters for each element when searching for nearest
neighbours? Thanks.
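One way to make the cluster-pruning idea concrete. This is a toy sketch with made-up names, using plain Scala collections rather than RDDs: assign each user to its `m` nearest centroids, then compare the query only against users that share one of its clusters, instead of against all one million users.

```scala
object ClusterPruning {
  type Vec = Array[Double]

  def dist(a: Vec, b: Vec): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Indices of the `m` centroids closest to `v`.
  def nearestCentroids(v: Vec, centroids: Array[Vec], m: Int): Seq[Int] =
    centroids.zipWithIndex
      .sortBy { case (c, _) => dist(v, c) }
      .take(m)
      .map(_._2)
      .toSeq

  // Approximate top-k neighbours of `query`: only users assigned to one of
  // the query's `m` nearest clusters are ever compared against it.
  def topK(query: Vec, users: Map[String, Vec], centroids: Array[Vec],
           m: Int, k: Int): Seq[String] = {
    val candidateClusters = nearestCentroids(query, centroids, m).toSet
    users.toSeq
      .filter { case (_, v) =>
        nearestCentroids(v, centroids, m).exists(candidateClusters)
      }
      .sortBy { case (_, v) => dist(query, v) }
      .take(k)
      .map(_._1)
  }
}
```

Keeping several clusters per element (`m` > 1) trades extra comparisons for a lower chance of missing a true neighbour that sits just across a cluster boundary.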
The PR is at https://github.com/apache/spark/pull/336 though it will need
> refactoring given the recent changes to matrix interface in MLlib. You may
> implement the sampling scheme for your own app since it's not much code.
>
> Best,
> Reza
>
>
> On Fri, Apr 11, 2014 at 9:17 PM
> I doubt that you'll be able to take this approach with the 1T pairs
> though, so it might be worth looking at the literature for recommender
> systems to see what else is out there.
>
>
> On Fri, Apr 11, 2014 at 9:54 PM, Xiaoli Li wrote:
Hi all,
I am implementing an algorithm using Spark. I have one million users. I
need to compute the similarity between each pair of users using some user
attributes. For each user, I need to get the top k most similar users. What
is the best way to implement this?
Thanks.
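For reference, the brute-force version of the computation being asked about (cosine similarity over attribute vectors, top k per user) looks like this in plain Scala. Names are illustrative. It is O(n²) in the number of users, which is exactly why the sampling and clustering approximations discussed above matter at the million-user (~0.5 trillion pair) scale.

```scala
object TopKSimilar {
  type Vec = Array[Double]

  // Cosine similarity; 0.0 for a zero-length vector to avoid division by zero.
  def cosine(a: Vec, b: Vec): Double = {
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val na  = math.sqrt(a.map(x => x * x).sum)
    val nb  = math.sqrt(b.map(x => x * x).sum)
    if (na == 0 || nb == 0) 0.0 else dot / (na * nb)
  }

  // For every user, the k other users with the highest cosine similarity.
  def topK(users: Map[String, Vec], k: Int): Map[String, Seq[String]] =
    users.map { case (u, v) =>
      val neighbours = users.toSeq
        .filter(_._1 != u)
        .sortBy { case (_, w) => -cosine(v, w) } // descending similarity
        .take(k)
        .map(_._1)
      u -> neighbours
    }
}
```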