[…] LSH with 30,000 items.

—
Sent from Mailbox <https://www.dropbox.com/mailbox>


On Thu, Jun 18, 2015 at 6:01 AM, Ganelin, Ilya <ilya.gane...@capitalone.com> wrote:

> I actually talk about this exact thing in a blog post here:
> http://blog.cloudera.com/blog/2015/05/working-with-apache-spark-or-how-i-learned-to-stop-worrying-and-love-the-shuffle/.
> Keep in mind, you're actually doing a ton of math. Even with proper
> caching and use of broadcast variables this will take a while depending
> on the size of your cluster. To get real results you may want to look
> into locality sensitive hashing to limit your search space, and
> definitely look into spinning up multiple threads to process your
> product features in parallel to increase resource utilization on the
> cluster.
>
> Thank you,
> Ilya Ganelin
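
For concreteness, here is a minimal sketch of the broadcast-and-score approach suggested above, assuming a trained MatrixFactorizationModel named model (the name and the helper function are illustrative, not from the thread). The ~30,000 product factors are small enough to collect and broadcast once, and each partition of user factors is then scored against them locally:

// Minimal sketch (assumed names): broadcast the small product-factor table and
// score each user's factor vector against it, keeping the top N products per user.
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
import org.apache.spark.rdd.RDD

def topNForAllUsers(model: MatrixFactorizationModel, n: Int): RDD[(Int, Array[(Int, Double)])] = {
  val sc = model.userFeatures.sparkContext
  // ~30,000 products x rank doubles fits in memory, so collect and broadcast once.
  val productsBc = sc.broadcast(model.productFeatures.collect())

  model.userFeatures.mapPartitions { users =>
    val products = productsBc.value
    users.map { case (userId, uf) =>
      val scored = products.map { case (productId, pf) =>
        var dot = 0.0
        var i = 0
        while (i < uf.length) { dot += uf(i) * pf(i); i += 1 }
        (productId, dot)
      }
      (userId, scored.sortBy(-_._2).take(n))
    }
  }
}

If you are on Spark 1.4 or later, model.recommendProductsForUsers(n) does essentially this for you in a single call.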
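
On the locality sensitive hashing suggestion: one simple scheme for factor vectors is sign-random-projection hashing. The sketch below is illustrative only (the bit count, the bucketing, and all names are assumptions, not something from the thread); products are bucketed by signature once, and at scoring time only the products whose signature matches the user's are scored exactly. Note that this approximates cosine similarity, so treat it as a coarse candidate filter rather than an exact top-N.

// Illustrative sign-random-projection LSH over the ~30,000 product factor vectors.
import scala.util.Random

// `bits` random hyperplanes in rank-dimensional space.
def randomHyperplanes(rank: Int, bits: Int, seed: Long): Array[Array[Double]] = {
  val rnd = new Random(seed)
  Array.fill(bits)(Array.fill(rank)(rnd.nextGaussian()))
}

// One bit per hyperplane: which side of the plane the vector falls on.
def signature(v: Array[Double], planes: Array[Array[Double]]): Int =
  planes.zipWithIndex.foldLeft(0) { case (sig, (plane, i)) =>
    val dot = plane.zip(v).map { case (a, b) => a * b }.sum
    if (dot > 0) sig | (1 << i) else sig
  }

// Bucket products by signature; at scoring time, hash the user's factor vector
// with the same planes and score only the products in the matching bucket.
def buckets(products: Array[(Int, Array[Double])],
            planes: Array[Array[Double]]): Map[Int, Array[(Int, Array[Double])]] =
  products.groupBy { case (_, pf) => signature(pf, planes) }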
>
> -----Original Message-----
> From: afarahat [ayman.fara...@yahoo.com]
> Sent: Wednesday, June 17, 2015 11:16 PM Eastern Standard Time
> To: user@spark.apache.org
> Subject: Matrix Multiplication and mllib.recommendation
>
> Hello;
> I am trying to get predictions after running the ALS model.
> The model works fine. In the prediction/recommendation, I have about
> 30,000 products and 90 million users.
> When I try the predict all, it fails.
> I have been trying to formulate the problem as a matrix multiplication
> where I fi […]
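
For the matrix-multiplication formulation in the original question above, one way it could be expressed with the distributed matrix types in mllib.linalg is sketched here, assuming a trained MatrixFactorizationModel named model (an assumption for illustration, not code from the thread). Be aware that the full 90M x 30K score matrix is enormous and the multiply shuffles heavily, which is typically why a predict-all at this scale fails; the broadcast top-N approach sketched earlier is usually cheaper.

// Hedged sketch: scores = U * P^T as distributed block matrices.
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, IndexedRow, IndexedRowMatrix}
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

def scoreMatrix(model: MatrixFactorizationModel): BlockMatrix = {
  // User factors as a 90M x rank block matrix.
  val userMat = new IndexedRowMatrix(
    model.userFeatures.map { case (id, f) => IndexedRow(id, Vectors.dense(f)) }
  ).toBlockMatrix()

  // Product factors transposed: rank x 30K.
  val productMatT = new IndexedRowMatrix(
    model.productFeatures.map { case (id, f) => IndexedRow(id, Vectors.dense(f)) }
  ).toBlockMatrix().transpose

  // Entry (i, j) of the result is user i's predicted rating for product j.
  // At 90M x 30K this is huge; restrict to a user subset or take top-N instead.
  userMat.multiply(productMatT)
}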