Re: implicit ALS dataSet

2014-06-23 Thread redocpot
Hi, The real-world dataset is a bit larger, so I tested on the MovieLens data set and found the same results:

alpha  lambda  rank  top1  top5  EPR_in   EPR_out
40     0.001   50    297   559   0.05855  ...
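For context, a sweep like the one tabulated above can be run with MLlib's implicit ALS. This is a minimal sketch, not the poster's actual script: the hyperparameter grids are illustrative, and evaluateEPR stands in for an expected-percentile-rank evaluator that is not part of MLlib.

import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.rdd.RDD

def gridSearch(train: RDD[Rating], iterations: Int): Unit = {
  for {
    alpha  <- Seq(1.0, 40.0)          // illustrative grid values
    lambda <- Seq(0.001, 0.01, 0.1)
    rank   <- Seq(10, 50)
  } {
    // MLlib 1.x entry point for the implicit-feedback formulation
    val model = ALS.trainImplicit(train, rank, iterations, lambda, alpha)
    // val epr = evaluateEPR(model, test)   // hypothetical EPR evaluator
    println(s"trained: alpha=$alpha lambda=$lambda rank=$rank")
  }
}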

Re: implicit ALS dataSet

2014-06-19 Thread Sean Owen
On Thu, Jun 19, 2014 at 3:44 PM, redocpot wrote:
> As the paper said, low ratings will get a low confidence weight, so if I
> understand correctly, these dominant one-timers will be more *unlikely* to
> be recommended compared to other items whose nbPurchase is bigger.
Correct, yes.
> In f...
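For reference, the confidence weighting under discussion is the one defined in Hu, Koren & Volinsky, "Collaborative Filtering for Implicit Feedback Datasets": the raw count r_ui is never the regression target, it only scales the confidence placed on a binary preference p_ui.

% Confidence and preference as defined in the paper:
c_{ui} = 1 + \alpha \, r_{ui}, \qquad
p_{ui} = \begin{cases} 1 & \text{if } r_{ui} > 0 \\ 0 & \text{otherwise} \end{cases}
% The objective weights each squared error by that confidence:
\min_{X,Y} \; \sum_{u,i} c_{ui} \, \big(p_{ui} - x_u^{\top} y_i\big)^2
  + \lambda \Big( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \Big)

So a one-timer (r_ui = 1) carries confidence 1 + alpha, only slightly above the baseline confidence of 1 given to unobserved pairs, which is why such items end up less likely to be recommended.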

Re: implicit ALS dataSet

2014-06-19 Thread redocpot
One thing that needs to be mentioned: in fact, the schema is (userId, itemId, nbPurchase), where nbPurchase plays the role of the rating. I found that there are many one-timers, i.e. pairs whose nbPurchase = 1. These pairs make up about 85% of all positive observations. As the paper said, low ratings will get a low confidence weight, so if I understand correctly, these dominant one-timers will be more *unlikely* to be recommended compared to other items whose nbPurchase is bigger.
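Counting the one-timers is straightforward; a minimal sketch, assuming the (userId, itemId, nbPurchase) triples are already parsed into an RDD:

import org.apache.spark.rdd.RDD

// Fraction of positive observations that are one-timers (nbPurchase == 1)
def oneTimerRatio(purchases: RDD[(Int, Int, Int)]): Double = {
  val total     = purchases.count()
  val oneTimers = purchases.filter { case (_, _, nb) => nb == 1 }.count()
  oneTimers.toDouble / total   // reported as ~0.85 on this dataset
}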

Re: implicit ALS dataSet

2014-06-19 Thread Sean Owen
On Thu, Jun 19, 2014 at 3:03 PM, redocpot wrote:
> We did some sanity checks. For example, each user has his own item list,
> sorted by preference, and we just pick the top 10 items for each user. As a
> result, we found that there were only 169 different items among the
> (1060080 x 10) i...
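The diversity check described in the quote can be sketched as follows. This is a hedged sketch, not the poster's code: recommendProducts is assumed to be available on the trained MatrixFactorizationModel (it appeared in MLlib around the Spark 1.1 timeframe), and looping over users on the driver is only sensible for a sample, not all 1060080 users.

import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

// How many distinct items ever show up in users' top-n lists?
def distinctTopN(model: MatrixFactorizationModel, userIds: Seq[Int], n: Int = 10): Int =
  userIds
    .flatMap(u => model.recommendProducts(u, n))   // Array[Rating] per user
    .map(_.product)
    .distinct
    .size   // the thread reports only 169 distinct items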

Re: implicit ALS dataSet

2014-06-19 Thread redocpot
Hi, Recently I launched an implicit ALS test on a real-world data set. Initially, we had two data sets: one is the purchase record covering the past 3 years (the training set), and the other covers the 6 months immediately following those 3 years (the test set). It's a database with 1060080 users and 23880 items.
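A minimal sketch of that setup, assuming the purchase logs are available as "userId,itemId,nbPurchase" CSV files; the file names and hyperparameters here are placeholders, not the poster's actual values:

import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.{ALS, Rating}

def run(sc: SparkContext): Unit = {
  def load(path: String) =
    sc.textFile(path).map { line =>
      val Array(u, i, nb) = line.split(',')
      Rating(u.toInt, i.toInt, nb.toDouble)   // nbPurchase plays the role of the rating
    }
  val train = load("purchases_3y.csv")        // hypothetical 3-year training file
  val test  = load("purchases_next_6m.csv")   // hypothetical 6-month test file
  val model = ALS.trainImplicit(train, 50, 10, 0.01, 40.0)
  // ... evaluate model on `test`
}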

Re: implicit ALS dataSet

2014-06-05 Thread Sean Owen
On Thu, Jun 5, 2014 at 10:38 PM, redocpot wrote:
> can be simplified by taking advantage of its algebraic structure, so
> negative observations are not needed. This is what I thought the first time
> I read the paper.
Correct, a big part of the reason it is efficient is the sparsity of...
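To make the algebraic shortcut concrete, here is a hedged per-user solve in Scala with Breeze (a sketch of the technique, not MLlib's actual implementation). The trick is to rewrite Yt Cu Y as Yt Y + Yt (Cu - I) Y: Yt Y is computed once and shared by all users, and the correction term touches only the items user u actually interacted with.

import breeze.linalg.{DenseMatrix, DenseVector}

def solveUser(
    Y: DenseMatrix[Double],        // k x numItems item-factor matrix
    YtY: DenseMatrix[Double],      // k x k gram matrix Y * Y.t, precomputed once
    observed: Seq[(Int, Double)],  // (item index, confidence c_ui) for user u's items
    lambda: Double): DenseVector[Double] = {
  val k = Y.rows
  val A = YtY + DenseMatrix.eye[Double](k) * lambda
  val b = DenseVector.zeros[Double](k)
  for ((i, c) <- observed) {
    val yi = Y(::, i)
    A += (yi * yi.t) * (c - 1.0)   // Yt (Cu - I) Y: nonzero only on observed items
    b += yi * c                    // Yt Cu p(u): p(u) is 1 only on observed items
  }
  A \ b                            // x_u = (Yt Cu Y + lambda I)^-1 Yt Cu p(u)
}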

Re: implicit ALS dataSet

2014-06-05 Thread redocpot
Thank you for your quick reply. As far as I know, the update does not require negative observations, because the update rule Xu = (Yt Cu Y + λI)^-1 Yt Cu p(u) can be simplified by taking advantage of its algebraic structure, so negative observations are not needed. This is what I thought the first time I read the paper.
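The simplification being referred to, written out (a restatement of the identity from the paper, using the same symbols as the update rule above):

% With C^u diagonal and c_{ui} = 1 for every unobserved item:
Y^{\top} C^u Y \;=\; Y^{\top} Y \;+\; Y^{\top} (C^u - I)\, Y
% Y^T Y is computed once per sweep and shared by all users, while
% (C^u - I) is nonzero only on the n_u items user u interacted with;
% likewise Y^T C^u p(u) sums only over those observed items, so the
% zero ("negative") observations never need to be materialized.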

Re: implicit ALS dataSet

2014-06-05 Thread Sean Owen
The paper definitely does not suggest that you should include every user-item pair in the input. The input is by nature extremely sparse, so literally filling in all the 0s would create an overwhelmingly large input. No, there is no need to do it, and it would be terrible for performance.
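To put a number on "overwhelmingly large", using the dataset sizes quoted earlier in this thread:

1060080 users x 23880 items ≈ 2.53 x 10^10 user-item pairs

i.e. tens of billions of mostly-zero entries if the 0s were materialized, versus only the observed purchase triples in the sparse representation.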