+1

On Mar 21, 2016 09:52, "Hiroyuki Yamada" <mogwa...@gmail.com> wrote:
> Could anyone give me some advice, recommendations, or usual ways to do
> this?
>
> I am trying to get all (probably top 100) product recommendations for each
> user from a model (MatrixFactorizationModel),
> but I haven't figured out yet how to do it efficiently.
>
> So far,
> calling the predict (predictAll in pyspark) method with the whole
> user-product matrix uses too much memory and couldn't complete due to a
> lack of memory,
> and
> calling predict for each user (or for batches of users, like 100 users or
> so) takes too much time to get all the recommendations.
>
> I am using spark 1.4.1 and running a 5-node cluster with 8GB RAM each.
> I only use a small-sized data set so far: about 50000 users and 5000
> products with only about 100000 ratings.
>
> Thanks.
>
>
> On Sat, Mar 19, 2016 at 7:58 PM, Hiroyuki Yamada <mogwa...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I'm testing Collaborative Filtering with MLlib.
>> Building a model with ALS.trainImplicit (or train) seems scalable as far
>> as I have tested,
>> but I'm wondering how I can get all the recommendation results
>> efficiently.
>>
>> The predictAll method can get all the results,
>> but it needs the whole user-product matrix in memory as input.
>> So if there are 1 million users and 1 million products, the number of
>> elements is too large (1 million x 1 million),
>> and the amount of memory needed to hold them is a few TB even when each
>> element is only 4B,
>> which is not a realistic amount of memory even now.
>>
>> # (1000000*1000000)*4/1000/1000/1000/1000 => roughly 4TB
>>
>> We can, of course, use the predict method per user,
>> but, as far as I have tried, it is very slow to get 1 million users'
>> results.
>>
>> Am I missing something?
>> Are there any other, better ways to get all the recommendation results
>> in a scalable and efficient way?
>>
>> Best regards,
>> Hiro
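The usual way around both problems is to never materialize the user-product
matrix at all: the product-factor matrix is small (here 5000 products x rank),
so it can be broadcast to every executor, and each user's factor vector can
then be scored locally, keeping only the top 100 products per user. Below is a
minimal PySpark sketch of that idea; `model`, `sc`, and the cutoff `k` are
assumed to come from the surrounding session, and this is an illustration of
the approach rather than code from the thread.

    import numpy as np

    k = 100  # number of recommendations to keep per user

    # Collect the (small) product-factor matrix to the driver and
    # broadcast it to all executors.
    product_ids, product_factors = zip(*model.productFeatures().collect())
    bc_pids = sc.broadcast(np.array(product_ids))
    bc_pf = sc.broadcast(np.array(product_factors))  # (num_products, rank)

    def top_k(user_and_factors):
        user, factors = user_and_factors
        # Score every product for this user with one matrix-vector product.
        scores = bc_pf.value.dot(np.array(factors))
        best = np.argsort(scores)[::-1][:k]  # indices of the k best scores
        return (user,
                [(int(bc_pids.value[i]), float(scores[i])) for i in best])

    # One pass over the user factors; nothing of size users x products
    # is ever built.
    recommendations = model.userFeatures().map(top_k)

For what it's worth, later MLlib releases added
MatrixFactorizationModel.recommendProductsForUsers(num), which does this in a
single call; if I remember right, the Scala method appeared around 1.4.0 but
the Python wrapper only landed in a later release, so the manual sketch above
is the fallback for 1.4.1 pyspark.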