Hi,
The real-world data set is a bit larger, so I tested on the MovieLens
data set and found the same results:
alpha   lambda   rank   top1   top5   EPR_in    EPR_out
40      0.001    50     297    559    0.05855
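Here EPR means the expected percentile ranking (the rank-bar measure from
the paper), computed on the training (in) and test (out) sets; lower is
better, and a random ranker would score around 0.5. A minimal sketch of the
metric, with illustrative names rather than our actual code:

    // Expected percentile ranking: sum(r_ui * rank_ui) / sum(r_ui), where
    // rank_ui is the percentile position (0.0 = top, 1.0 = bottom) of item i
    // in user u's ranked list of items, best first.
    def expectedPercentileRank(
        purchases: Seq[(Int, Int, Double)],                  // (userId, itemId, nbPurchase)
        rankedItems: Map[Int, IndexedSeq[Int]]): Double = {  // userId -> ranked items
      val weighted = for {
        (u, i, r) <- purchases
        ranking   <- rankedItems.get(u).toSeq
      } yield {
        val pos    = ranking.indexOf(i)
        // items missing from the list get the worst possible rank
        val rankUi = if (pos < 0) 1.0 else pos.toDouble / math.max(ranking.size - 1, 1)
        (r * rankUi, r)
      }
      weighted.map(_._1).sum / weighted.map(_._2).sum
    }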
On Thu, Jun 19, 2014 at 3:44 PM, redocpot wrote:
> As the paper says, low ratings will get a low confidence weight, so if I
> understand correctly, these dominant one-timers will be *less likely* to
> be recommended compared to other items whose nbPurchase is bigger.
Correct, yes.
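To make that concrete: the paper's confidence weight is c_ui = 1 + alpha * r_ui,
so with alpha = 40 (the value used in these tests):

    nbPurchase = 1   ->  c = 1 + 40 * 1  = 41
    nbPurchase = 10  ->  c = 1 + 40 * 10 = 401

so a one-timer's squared error carries roughly 10x less weight than a
ten-time purchase, though still far more than an unobserved pair (c = 1).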
> In fact, the schema is (userId, itemId, nbPurchase), where nbPurchase is
> equivalent to ratings.
One thing that needs to be mentioned is that, in fact, the schema is (userId,
itemId, nbPurchase), where nbPurchase is equivalent to ratings. I found that
there are many one-timers, i.e. pairs whose nbPurchase = 1. These pairs
account for about 85% of all positive observations.
As the paper says, low ratings will get a low confidence weight, so if I
understand correctly, these dominant one-timers will be *less likely* to be
recommended compared to other items whose nbPurchase is bigger.
On Thu, Jun 19, 2014 at 3:03 PM, redocpot wrote:
> We did some sanity checks. For example, each user has his own item list,
> sorted by preference, and we just pick the top 10 items for each user. As a
> result, we found that there were only 169 different items among the
> (1060080 x 10) items recommended.
Hi,
Recently, I ran an implicit ALS test on a real-world data set.
Initially, we had two data sets: one contains the purchase records from the
past 3 years (training set), and the other covers the 6 months immediately
after those 3 years (test set).
It's a database with 1060080 users and 23880 items.
On Thu, Jun 5, 2014 at 10:38 PM, redocpot wrote:
> can be simplified by taking advantage of its algebraic structure, so
> negative observations are not needed. This is what I thought the first time
> I read the paper.
Correct, a big part of the reason this is efficient is because of the
sparsity of the input.
Thank you for your quick reply.
As far as I know, the update does not require negative observations, because
the update rule

    X_u = (Y^T C_u Y + λI)^-1 Y^T C_u P(u)

can be simplified by taking advantage of its algebraic structure, so
negative observations are not needed. This is what I thought the first time
I read the paper.
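The simplification here is Y^T C_u Y = Y^T Y + Y^T (C_u - I) Y: Y^T Y is
computed once per sweep, and (C_u - I) is nonzero only at the items user u
actually interacted with. A rough per-user sketch with Breeze (my own naming
and layout, not MLlib's actual implementation):

    import breeze.linalg.{DenseMatrix, DenseVector}

    // Solve X_u = (Y^T Y + Y^T (C_u - I) Y + lambda I)^-1 Y^T C_u P(u),
    // touching only the items this user has observed.
    def updateUser(
        Y: DenseMatrix[Double],       // item factors, nbItems x rank
        YtY: DenseMatrix[Double],     // Y^T Y, precomputed once per sweep
        observed: Seq[(Int, Double)], // (itemId, nbPurchase) for this user
        alpha: Double,
        lambda: Double): DenseVector[Double] = {
      val rank = Y.cols
      val A = YtY + DenseMatrix.eye[Double](rank) * lambda
      val b = DenseVector.zeros[Double](rank)
      for ((i, r) <- observed) {
        val yi  = Y(i, ::).t             // factor vector of item i
        val cui = 1.0 + alpha * r        // confidence weight
        A += (yi * yi.t) * (cui - 1.0)   // the sparse Y^T (C_u - I) Y part
        b += yi * cui                    // Y^T C_u P(u); P(u,i) = 1 when observed
      }
      A \ b                              // solve the rank x rank system
    }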
The paper definitely does not suggest that you should include every
user-item pair in the input. The input is by nature extremely sparse,
so literally filling in all the 0s would create an overwhelmingly large
input. No, there is no need to do it, and it would be terrible for
performance.
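In MLlib terms: you pass only the observed (userId, itemId, nbPurchase)
triples and the zeros stay implicit. A minimal sketch (a spark-shell session,
so sc is the SparkContext; the file name and parsing are hypothetical, and
the parameters mirror the alpha/lambda/rank from this thread plus an assumed
iteration count):

    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    // Each input line: "userId,itemId,nbPurchase"; zeros are never materialized.
    val ratings = sc.textFile("purchases.csv").map { line =>
      val Array(user, item, n) = line.split(',')
      Rating(user.toInt, item.toInt, n.toDouble)
    }

    // rank = 50, iterations = 10, lambda = 0.001, alpha = 40
    val model = ALS.trainImplicit(ratings, 50, 10, 0.001, 40.0)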