+1
On Mar 21, 2016 09:52, "Hiroyuki Yamada" <mogwa...@gmail.com> wrote:

> Could anyone give me some advice, recommendations, or usual ways to do
> this?
>
> I am trying to get all (probably top 100) product recommendations for each
> user from a model (MatrixFactorizationModel),
> but I haven't yet figured out how to do it efficiently.
>
> So far,
> calling the predict method (predictAll in pyspark) with the whole
> user-product matrix uses too much memory and cannot complete,
> and
> calling predict for each user (or for batches of users, say 100 users or
> so) takes too much time to get all the recommendations; a sketch of this
> batched variant is below.
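>
> A minimal sketch of that batched variant in PySpark (assuming a trained
> model and plain Python lists user_ids / product_ids; these names are
> placeholders, not from the original code):
>
>     # score users in batches of 100 against every product, keep top 100 each
>     for i in range(0, len(user_ids), 100):
>         batch = user_ids[i:i + 100]
>         pairs = sc.parallelize([(u, p) for u in batch for p in product_ids])
>         preds = model.predictAll(pairs)  # RDD of Rating(user, product, rating)
>         top = (preds.map(lambda r: (r.user, (r.product, r.rating)))
>                     .groupByKey()
>                     .mapValues(lambda rs: sorted(rs, key=lambda x: -x[1])[:100]))
>         top.collect()  # one Spark job per batch; this is where the time goes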
>
> I am using Spark 1.4.1 and running a 5-node cluster with 8 GB of RAM each.
> I am only using a small data set so far: about 50,000 users and 5,000
> products with only about 100,000 ratings.
>
> Thanks.
>
>
> On Sat, Mar 19, 2016 at 7:58 PM, Hiroyuki Yamada <mogwa...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I'm testing collaborative filtering with MLlib.
>> Building a model with ALS.trainImplicit (or train) seems scalable as far
>> as I have tested,
>> but I'm wondering how I can get all the recommendation results
>> efficiently.
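>>
>> For context, training in PySpark looks roughly like this (a minimal
>> sketch; ratings is assumed to be an RDD of Rating records, and the
>> rank/iterations values are arbitrary placeholders):
>>
>>     from pyspark.mllib.recommendation import ALS, Rating
>>     # ratings: an RDD of Rating(user, product, rating), built elsewhere
>>     model = ALS.trainImplicit(ratings, rank=10, iterations=10)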
>>
>> The predictAll method can produce all the results,
>> but it needs the whole user-product matrix as its input.
>> So if there are 1 million users and 1 million products, the number of
>> elements is too large (1 million x 1 million),
>> and the memory needed to hold them is several TB even when each element
>> is only 4 bytes,
>> which is not a realistic amount of memory even now.
>>
>> # (1,000,000 * 1,000,000) * 4 bytes = 4 * 10^12 bytes, i.e. about 4 TB
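>>
>> That is, the naive predictAll input would be built along these lines (a
>> sketch only; num_users and num_products stand in for the real id sets):
>>
>>     # cartesian of all user ids with all product ids: 10^12 pairs at 1M x 1M
>>     users = sc.parallelize(range(num_users))
>>     products = sc.parallelize(range(num_products))
>>     pairs = users.cartesian(products)
>>     # predictions = model.predictAll(pairs)  # the pairs alone are ~4 TB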
>>
>> We can, of course, call the predict method per user (sketched below),
>> but as far as I have tried, it is far too slow to get results for 1
>> million users.
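>>
>> In PySpark terms the per-user variant is roughly (sketch; same
>> placeholder names as above):
>>
>>     # one Spark job per user: fine for a few users, hopeless for 1 million
>>     for u in user_ids:
>>         pairs = sc.parallelize([(u, p) for p in product_ids])
>>         scores = model.predictAll(pairs).collect()
>>         top100 = sorted(scores, key=lambda r: -r.rating)[:100]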
>>
>> Am I missing something?
>> Are there any better ways to get all the recommendation results in a
>> scalable and efficient way?
>>
>> Best regards,
>> Hiro
>>
>>
>>
>
