Hi,

The real-world dataset is somewhat larger, so I tested on the MovieLens
data set instead and found the same results:


        alpha  lambda  rank  top1  top5  EPR_in   EPR_out
        -----  ------  ----  ----  ----  -------  -------
        40     0.001   50    297   559   0.05855  0.17299
        40     0.01    50    295   559   0.05854  0.17298
        40     0.1     50    296   560   0.05846  0.17287
        40     1       50    309   564   0.05819  0.17227
        40     25      50    287   537   0.05699  0.14855
        40     50      50    267   496   0.05795  0.13389
        40     100     50    247   444   0.06504  0.11920
        40     200     50    145   306   0.09558  0.11388
        40     300     50    77    178   0.11340  0.12264



To be clear, there are 1650 items in this MovieLens data set. Top1 and top5
in the table are the number of distinct items that appear in the top 1 and
top 5 positions of the per-user preference lists after ALS has run. Top1,
top5, and EPR_in are computed on the training set; only EPR_out is computed
on the test set. For top1 and top5, all items are taken into account,
whether or not the user has purchased them.
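
To make the top1/top5 columns concrete, here is a minimal sketch of how such
a count could be computed. It is not the code I ran; the function name
distinct_top_k and the toy score matrix are made up for illustration, and the
scores stand in for whatever predicted preferences the ALS model produces.

```python
def distinct_top_k(scores, k):
    """Count the distinct items that appear in any user's top-k list.

    scores[u][i] is the predicted preference of user u for item i
    (hypothetical input; all items are ranked, purchased or not).
    """
    seen = set()
    for user_scores in scores:
        # Rank item indices by predicted preference, highest first.
        ranked = sorted(range(len(user_scores)),
                        key=lambda i: user_scores[i],
                        reverse=True)
        seen.update(ranked[:k])
    return len(seen)


# Toy example: 3 users, 4 items.
scores = [
    [0.90, 0.10, 0.50, 0.30],
    [0.80, 0.20, 0.70, 0.10],
    [0.20, 0.95, 0.60, 0.40],
]
top1 = distinct_top_k(scores, 1)
top2 = distinct_top_k(scores, 2)
print(top1, top2)
```

A count close to the number of items (1650 here) would mean the top of the
lists differs a lot from user to user; a small count means most users are
shown the same few items.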

The table shows that a small lambda (< 1) always leads to overfitting, while
a large lambda such as 300 removes the overfitting, but then the number of
distinct items in the top 1 and top 5 of the preference lists becomes very
small (the recommendations are not personalized).
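
For readers who want to reproduce the EPR columns, here is a minimal sketch
under the assumption that EPR is the expected percentile rank commonly used
to evaluate implicit-feedback ALS (0 is best, around 0.5 is random). The
function name, inputs, and toy data are hypothetical, and the exact
percentile convention used for the table may differ.

```python
def expected_percentile_rank(scores, observed):
    """Weighted average percentile rank of the observed items.

    scores[u][i]   : predicted preference of user u for item i
    observed[u][i] : implicit feedback weight (0 if no interaction)
    Percentile 0.0 means top of the user's list, 1.0 means bottom.
    """
    num = 0.0
    den = 0.0
    for user_scores, user_obs in zip(scores, observed):
        n = len(user_scores)
        ranked = sorted(range(n), key=lambda i: user_scores[i], reverse=True)
        for pos, item in enumerate(ranked):
            pct = pos / (n - 1) if n > 1 else 0.0
            num += user_obs[item] * pct
            den += user_obs[item]
    return num / den if den > 0 else float("nan")


# Toy example: 2 users, 3 items, one observed interaction each.
scores = [[0.9, 0.1, 0.5],
          [0.2, 0.8, 0.4]]
observed = [[1, 0, 0],
            [0, 0, 1]]
epr = expected_percentile_rank(scores, observed)
print(epr)
```

Passing the training interactions as observed would give EPR_in, and passing
the held-out interactions would give EPR_out; a large gap between the two is
what I am calling overfitting above.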





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/implicit-ALS-dataSet-tp7067p8115.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.