I am using Spark 1.2.1. Thank you Krishna, I am getting almost the same results as you, so it must be an error in the tutorial. Xiangrui, I made some additional tests with lambda set to 0.1 and I am getting a much better RMSE:

RMSE (validation) = 0.868981 for the model trained with rank = 8, lambda = 0.1, and numIter = 10.
RMSE (validation) = 0.869628 for the model trained with rank = 8, lambda = 0.1, and numIter = 20.
RMSE (validation) = 1.361321 for the model trained with rank = 8, lambda = 1.0, and numIter = 10.
RMSE (validation) = 1.361321 for the model trained with rank = 8, lambda = 1.0, and numIter = 20.
RMSE (validation) = 3.755870 for the model trained with rank = 8, lambda = 10.0, and numIter = 10.
RMSE (validation) = 3.755870 for the model trained with rank = 8, lambda = 10.0, and numIter = 20.
RMSE (validation) = 0.866605 for the model trained with rank = 12, lambda = 0.1, and numIter = 10.
RMSE (validation) = 0.867498 for the model trained with rank = 12, lambda = 0.1, and numIter = 20.
RMSE (validation) = 1.361321 for the model trained with rank = 12, lambda = 1.0, and numIter = 10.
RMSE (validation) = 1.361321 for the model trained with rank = 12, lambda = 1.0, and numIter = 20.
RMSE (validation) = 3.755870 for the model trained with rank = 12, lambda = 10.0, and numIter = 10.
RMSE (validation) = 3.755870 for the model trained with rank = 12, lambda = 10.0, and numIter = 20.

The best model was trained with rank = 12, lambda = 0.1, and numIter = 10, and its RMSE on the test set is 0.865407.

On Tue, Feb 24, 2015 at 7:23 AM, Xiangrui Meng <men...@gmail.com> wrote:
> Try to set lambda to 0.1. -Xiangrui
>
> On Mon, Feb 23, 2015 at 3:06 PM, Krishna Sankar <ksanka...@gmail.com> wrote:
> > The RMSE varies a little bit between the versions.
> > Partitioned the training, validation, and test sets like so:
> >
> > training = ratings_rdd_01.filter(lambda x: (x[3] % 10) < 6)
> > validation = ratings_rdd_01.filter(lambda x: (x[3] % 10) >= 6 and (x[3] % 10) < 8)
> > test = ratings_rdd_01.filter(lambda x: (x[3] % 10) >= 8)
> >
> > Validation MSE:
> >
> > # 1.3.0 Mean Squared Error = 0.871456869392
> > # 1.2.1 Mean Squared Error = 0.877305629074
> >
> > Itertools results:
> >
> > 1.3.0 - RMSE = 1.354839 (rank = 8 and lambda = 1.0, and numIter = 20)
> > 1.1.1 - RMSE = 1.335831 (rank = 8 and lambda = 1.0, and numIter = 10)
> >
> > Cheers
> > <k/>
> >
> > On Mon, Feb 23, 2015 at 12:37 PM, Xiangrui Meng <men...@gmail.com> wrote:
> >>
> >> Which Spark version did you use? Btw, there are three datasets from
> >> MovieLens. The tutorial used the medium one (1 million). -Xiangrui
> >>
> >> On Mon, Feb 23, 2015 at 8:36 AM, poiuytrez <guilla...@databerries.com> wrote:
> >> > What do you mean?
> >> >
> >> > --
> >> > View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Movie-Recommendation-tutorial-tp21769p21771.html
> >> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> >> > For additional commands, e-mail: user-h...@spark.apache.org
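
For anyone who wants to check the numbers without a cluster, here is a rough plain-Python sketch of the three pieces discussed in this thread: the 60/20/20 split on the last digit of x[3], the RMSE formula, and picking the best (rank, lambda, numIter) combination by validation RMSE. It does not call Spark at all. The helper names (split_bucket, rmse, reported) are made up for illustration, and I am assuming x[3] is the rating timestamp as in the MovieLens format. The RMSE values are simply copied from the run reported above.

```python
import math
from itertools import product


def split_bucket(timestamp):
    """Assign a rating to a set using the last digit of its timestamp.

    Mirrors the filters quoted in this thread: digits 0-5 go to training,
    6-7 to validation, and 8-9 to test.
    """
    digit = timestamp % 10
    if digit < 6:
        return "training"
    if digit < 8:
        return "validation"
    return "test"


def rmse(pairs):
    """Root mean squared error over (predicted, actual) pairs."""
    pairs = list(pairs)
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


# Validation RMSEs copied from the results reported in this thread,
# keyed by (rank, lambda, numIter). Nothing here is recomputed.
reported = {
    (8, 0.1, 10): 0.868981,
    (8, 0.1, 20): 0.869628,
    (8, 1.0, 10): 1.361321,
    (8, 1.0, 20): 1.361321,
    (8, 10.0, 10): 3.755870,
    (8, 10.0, 20): 3.755870,
    (12, 0.1, 10): 0.866605,
    (12, 0.1, 20): 0.867498,
    (12, 1.0, 10): 1.361321,
    (12, 1.0, 20): 1.361321,
    (12, 10.0, 10): 3.755870,
    (12, 10.0, 20): 3.755870,
}

ranks = [8, 12]
lambdas = [0.1, 1.0, 10.0]
num_iters = [10, 20]

# Walk the same grid with itertools.product and keep the combination
# with the lowest validation RMSE.
best = min(product(ranks, lambdas, num_iters), key=lambda k: reported[k])
print(best, reported[best])  # (12, 0.1, 10) 0.866605
```

In a real run, the lookup in `reported` would be replaced by training a model for each combination and scoring it on the validation set. The selection step stays the same.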