Re: implicit ALS dataSet

redocpot Thu, 19 Jun 2014 07:45:36 -0700

One thing worth mentioning is that the schema is (userId, itemId,
nbPurchase), where nbPurchase plays the role of the rating. I found that
there are many one-timers, i.e. pairs whose nbPurchase = 1. These pairs
make up about 85% of all positive observations.
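
For reference, the share of one-timers can be measured along these lines. This is only a sketch on plain Scala collections; the sample triples and the names purchases, oneTimers and oneTimerShare are made up for illustration and are not our real data.

```scala
// Hypothetical sample of (userId, itemId, nbPurchase) triples; not real data.
val purchases: Seq[(Int, Int, Double)] = Seq(
  (1, 10, 1.0), (1, 11, 1.0), (2, 10, 3.0), (2, 12, 1.0), (3, 13, 1.0)
)

// One-timers are the pairs whose nbPurchase is exactly 1.
val oneTimers: Int = purchases.count { case (_, _, n) => n == 1.0 }

// Fraction of all positive observations that are one-timers.
val oneTimerShare: Double = oneTimers.toDouble / purchases.size
```

On an RDD[Rating] such as ratings_train, the same count should be obtainable with ratings_train.filter(_.rating == 1.0).count().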


As the paper says, low ratings get a low confidence weight, so if I
understand correctly, these dominant one-timers will be *less likely* to
be recommended compared to other items whose nbPurchase is larger.
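
To make the weighting concrete, here is a minimal sketch. It assumes "the paper" is the Hu, Koren and Volinsky implicit-feedback paper and that the linear confidence c = 1 + alpha * r is the one in use; the function name confidence is hypothetical.

```scala
// Linear confidence (assumed form): c = 1 + alpha * r.
def confidence(nbPurchase: Double, alpha: Double): Double =
  1.0 + alpha * nbPurchase

val alpha = 1.0

// A pair that was never purchased (r = 0) keeps the baseline confidence.
val unobservedConfidence = confidence(0.0, alpha)

// A one-timer (nbPurchase = 1).
val oneTimerConfidence = confidence(1.0, alpha)

// The largest nbPurchase mentioned in this thread.
val maxConfidence = confidence(1396.0, alpha)
```

Under this assumption a one-timer is still a positive preference. It just carries much less weight in the loss than a heavily purchased pair.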

Lambda is also a potential problem. In our case lambda is set to 300,
a value chosen according to the test set. Here are the test
results:

lambda = 65
EPR_in  = 0.06518592593142056
EPR_out = 0.14789338884259276

lambda = 100
EPR_in  = 0.06619274171311466
EPR_out = 0.13494609978226865

lambda = 300
EPR_in  = 0.08814703345418627
EPR_out = 0.09522125434156471

where EPR_in is computed on the training set and EPR_out on the test set. It
seems 300 is the right lambda: EPR_out is lowest there and the gap between
EPR_in and EPR_out is smallest, i.e. there is less overfitting.
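
For readers unfamiliar with the metric, here is a rough sketch of how an EPR value can be computed. It assumes EPR is the expected percentile rank from the implicit-feedback paper, sum(r * rank) / sum(r), with rank = 0 for the top recommendation and 1 for the last. The helper names and the input shapes are hypothetical, not the code used for the numbers above.

```scala
// Percentile rank of each item in one user's list: 0.0 = top, 1.0 = bottom.
def percentileRanks(scores: Map[Int, Double]): Map[Int, Double] = {
  val ordered = scores.toSeq.sortBy { case (_, s) => -s }.map(_._1)
  val denom = math.max(ordered.size - 1, 1).toDouble
  ordered.zipWithIndex.map { case (item, idx) => item -> idx / denom }.toMap
}

// Expected percentile rank, weighted by the observed nbPurchase.
def expectedPercentileRank(
    predicted: Map[Int, Map[Int, Double]], // userId -> (itemId -> predicted score)
    observed: Seq[(Int, Int, Double)]      // (userId, itemId, nbPurchase)
): Double = {
  val ranksByUser = predicted.map { case (u, s) => u -> percentileRanks(s) }
  // Keep only observations for which a predicted rank exists.
  val weighted = observed.collect {
    case (u, i, r) if ranksByUser.get(u).exists(_.contains(i)) =>
      (r * ranksByUser(u)(i), r)
  }
  weighted.map(_._1).sum / weighted.map(_._2).sum
}

// Hypothetical toy input: one user, three scored items, two held-out purchases.
val toyPredicted = Map(1 -> Map(10 -> 0.9, 11 -> 0.5, 12 -> 0.1))
val toyObserved = Seq((1, 10, 2.0), (1, 12, 1.0))
val toyEpr = expectedPercentileRank(toyPredicted, toyObserved)
```

With this definition lower is better, and a random ordering would be expected to score around 0.5.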

The other parameters are shown in the following code:

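import org.apache.spark.mllib.recommendation.ALS

val model = new ALS()
      .setImplicitPrefs(implicitPrefs = true)  // implicit-feedback variant
      .setAlpha(1)       // scales nbPurchase into a confidence weight
      .setLambda(300)    // regularization, chosen on the test set
      .setRank(50)       // number of latent factors
      .setIterations(40)
      .setBlocks(8)
      .setSeed(42)
      .run(ratings_train)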

We set alpha to 1, since the max nbPurchase is 1396. I am not sure whether
alpha is already too big.

 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/implicit-ALS-dataSet-tp7067p7916.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
