Let's narrow the context from matrix factorization to recommendation via ALS. Treating it as a multi-class classification problem adds extra complexity: ALS only outputs a single value for each prediction, which is hard to convert to a probability distribution over the 5 rating levels. Treating it as a binary classification problem or a ranking problem does make sense. RankingMetrics is in master. Feel free to add prec@k and ndcg@k to examples.MovielensALS. ROC should be good to add as well.

-Xiangrui
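For reference, here is a minimal sketch of what prec@k and ndcg@k compute for a single user, assuming binary relevance (an item is either relevant or not). It is plain Python for illustration only, not the RankingMetrics code in master; the function names `precision_at_k` and `ndcg_at_k` and the sample item ids are made up for this example.

```python
import math


def precision_at_k(predicted, relevant, k):
    """Fraction of the top-k predicted items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant)
    hits = sum(1 for item in predicted[:k] if item in relevant)
    # Divide by k even when fewer than k items were predicted,
    # so missing predictions count as misses.
    return hits / k


def ndcg_at_k(predicted, relevant, k):
    """NDCG at k with binary gains and a log2 position discount."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant)
    if not relevant:
        return 0.0
    # DCG: each relevant item at 0-based position i contributes 1 / log2(i + 2).
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, item in enumerate(predicted[:k])
        if item in relevant
    )
    # Ideal DCG: all relevant items (up to k) ranked at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg


# Hypothetical data: items ranked by predicted score for one user,
# and the items that user actually liked in the test set.
predicted = [10, 20, 30, 40, 50]
relevant = {10, 30, 60}

p_at_5 = precision_at_k(predicted, relevant, 5)
n_at_3 = ndcg_at_k(predicted, relevant, 3)
```

To report a single number for the whole test set, these per-user values would typically be averaged over users.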
On Wed, Oct 29, 2014 at 11:23 AM, Debasish Das <debasish.da...@gmail.com> wrote:
> Hi,
>
> In the current factorization flow, we cross validate on the test dataset
> using the RMSE number but there are some other measures which are worth
> looking into.
>
> If we consider the problem as a regression problem and the ratings 1-5 are
> considered as 5 classes, it is possible to generate a confusion matrix
> using MultiClassMetrics.scala
>
> If the ratings are only 0/1 (like from the spotify demo from spark summit)
> then it is possible to use Binary Classification Metrices to come up with
> the ROC curve...
>
> For topK user/products we should also look into prec@k and pdcg@k as the
> metric..
>
> Does it make sense to add the multiclass metric and prec@k, pdcg@k in
> examples.MovielensALS along with RMSE ?
>
> Thanks.
> Deb