Hi,
The real-world dataset is rather large, so I tested on the MovieLens
dataset and found the same results:
alpha  lambda  rank  top1  top5  EPR_in   EPR_out
40     0.001   50    297   559   0.05855  0.17299
40     0.01    50    295   559   0.05854  0.17298
40     0.1     50    296   560   0.05846  0.17287
40     1       50    309   564   0.05819  0.17227
40     25      50    287   537   0.05699  0.14855
40     50      50    267   496   0.05795  0.13389
40     100     50    247   444   0.06504  0.11920
40     200     50    145   306   0.09558  0.11388
40     300     50    77    178   0.11340  0.12264
To be clear, there are 1650 items in this MovieLens dataset. Top1 and top5
in the table are the number of distinct items that appear in the top 1 and
top 5 positions of the users' preference lists produced by ALS. Top1, top5
and EPR_in are computed on the training set; only EPR_out is computed on the
test set. For top1 and top5, all items are taken into account, whether or
not the user has already purchased them.
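A minimal pure-Python sketch of how the top1/top5 counts and the EPR columns could be computed from a user-by-item score matrix. It assumes EPR means expected percentile rank (0.0 = top of a user's list, 1.0 = bottom, weighted by the implicit count r_ui) and that the per-user scores come from the trained ALS factors. The function names are hypothetical and not part of Spark's API.

```python
from typing import Dict, List


def ranked_items(scores: List[float]) -> List[int]:
    """Return item indices ordered from most to least preferred (ties keep index order)."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def distinct_top_k(score_matrix: List[List[float]], k: int) -> int:
    """Count distinct items that appear in at least one user's top-k list.

    Every item is eligible, whether or not the user already interacted with it.
    """
    seen = set()
    for scores in score_matrix:
        seen.update(ranked_items(scores)[:k])
    return len(seen)


def expected_percentile_rank(
    score_matrix: List[List[float]], observed: Dict[int, Dict[int, float]]
) -> float:
    """Weighted mean percentile rank of observed items (0.0 = top, 1.0 = bottom).

    observed maps user -> {item: r_ui}; pass training data for EPR_in and
    held-out data for EPR_out. Lower is better; a random ordering gives ~0.5.
    """
    num, den = 0.0, 0.0
    for user, items in observed.items():
        order = ranked_items(score_matrix[user])
        last = max(len(order) - 1, 1)  # avoid division by zero with a single item
        position = {item: p for p, item in enumerate(order)}
        for item, r in items.items():
            num += r * position[item] / last
            den += r
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    scores = [[0.9, 0.1, 0.5], [0.2, 0.8, 0.3]]  # toy scores: 2 users, 3 items
    print(distinct_top_k(scores, 1))  # 2
    print(distinct_top_k(scores, 2))  # 3
    print(expected_percentile_rank(scores, {0: {0: 1.0, 1: 1.0}, 1: {0: 2.0}}))  # 0.75
```

On real data the score matrix would be 1650 columns wide, so in Spark one would compute the rankings per user in a distributed job rather than in a local list.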
The table shows that a small lambda (< 1) always leads to overfitting
(EPR_out is roughly three times EPR_in), while a large lambda such as 300
removes the overfitting, but then the number of distinct items in the top 1
and top 5 of the preference lists is very small (the recommendations are not
personalized).
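The trade-off can be made explicit with a little arithmetic on the numbers in the table: the gap EPR_out - EPR_in is used here as a rough overfitting measure, and top1/1650 and top5/1650 give the share of the catalog that ever reaches the top of a list. A small sketch; the names RUNS and summarize are illustrative only.

```python
from typing import List, Tuple

N_ITEMS = 1650  # items in the MovieLens set used here

# (lambda, top1, top5, EPR_in, EPR_out); alpha = 40 and rank = 50 in every run
RUNS: List[Tuple[float, int, int, float, float]] = [
    (0.001, 297, 559, 0.05855, 0.17299),
    (0.01, 295, 559, 0.05854, 0.17298),
    (0.1, 296, 560, 0.05846, 0.17287),
    (1, 309, 564, 0.05819, 0.17227),
    (25, 287, 537, 0.05699, 0.14855),
    (50, 267, 496, 0.05795, 0.13389),
    (100, 247, 444, 0.06504, 0.11920),
    (200, 145, 306, 0.09558, 0.11388),
    (300, 77, 178, 0.11340, 0.12264),
]


def summarize(runs, n_items):
    """Return (lambda, EPR gap, top-1 coverage, top-5 coverage) for each run."""
    return [
        (lam, epr_out - epr_in, top1 / n_items, top5 / n_items)
        for lam, top1, top5, epr_in, epr_out in runs
    ]


if __name__ == "__main__":
    for lam, gap, cov1, cov5 in summarize(RUNS, N_ITEMS):
        print(f"lambda={lam:<6} gap={gap:.5f} top1={cov1:.1%} top5={cov5:.1%}")
```

The gap stays near 0.114 for lambda <= 1 and drops to about 0.009 at lambda = 300, while top-1 coverage falls from about 18% of the catalog to under 5%.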
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/implicit-ALS-dataSet-tp7067p8115.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.