Re: matrix factorization cross validation

2014-11-03 Thread Debasish Das
I added drivers for precisionAt(k: Int) for the MovieLens test cases, although I am a bit confused by the precisionAt(k: Int) code in RankingMetrics.scala. While cross-validating, I am really not sure how to set K... if (labSet.nonEmpty) { val n = math.min(pred.length, k) ... } If I
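
For readers following the precisionAt(k) discussion, here is a minimal sketch of the idea in plain Python. It is not the Spark code. The function names are hypothetical, and dividing the hit count by k (rather than by the number of predictions actually inspected) is an assumption about the convention, made to match the quoted `math.min(pred.length, k)` fragment.

```python
def precision_at_k(predicted, relevant, k):
    """Fraction of the top-k predictions that appear in the ground truth.

    Only the first min(len(predicted), k) predictions are inspected, but the
    hit count is divided by k (assumed convention), so a user with fewer than
    k predictions is penalised.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0  # no ground truth for this user
    n = min(len(predicted), k)
    hits = sum(1 for item in predicted[:n] if item in relevant_set)
    return hits / k


def mean_precision_at_k(pairs, k):
    """Average precision_at_k over (predicted, relevant) pairs, one per user."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(precision_at_k(p, r, k) for p, r in pairs) / len(pairs)
```

With a single global K, users with few test items can never reach a precision of 1.0, which is one reason the choice of K is awkward when cross-validating.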

Re: matrix factorization cross validation

2014-10-31 Thread Sean Owen
No, excepting approximate methods like LSH to figure out the relatively small set of candidates for the users in the partition, and broadcast or join those. On Fri, Oct 31, 2014 at 5:45 AM, Nick Pentreath wrote: > Sean, re my point earlier do you know a more efficient way to compute top k > for e

Re: matrix factorization cross validation

2014-10-30 Thread Nick Pentreath
Sean, re my point earlier, do you know a more efficient way to compute top k for each user, other than to broadcast the item factors? (I guess one could perhaps use the new asymmetric LSH paper to assist.) On Thu, Oct 30, 2014 at 11:24 PM, Sean Owen wrote: > MAP is effectiv

Re: matrix factorization cross validation

2014-10-30 Thread Sean Owen
MAP is effectively an average over all k from 1 to min(# recommendations, # items rated). Getting the first recommendations right is more important than the last. On Thu, Oct 30, 2014 at 10:21 PM, Debasish Das wrote: > Does it make sense to have a user specific K or K is considered same over > all use
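
A small sketch of average precision may make the point about early recommendations concrete. It is plain Python with hypothetical function names, not the RankingMetrics implementation. Normalising by the number of relevant items is an assumption; some variants divide by min(# recommendations, # relevant items) instead. The predicted list is assumed to contain no duplicates.

```python
def average_precision(predicted, relevant):
    """Average of precision@rank taken at every rank where a hit occurs.

    Normalised by the number of relevant items (assumed convention).
    """
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(predicted, start=1):
        if item in relevant_set:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant_set)


def mean_average_precision(pairs):
    """Mean of average_precision over (predicted, relevant) pairs, one per user."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(average_precision(p, r) for p, r in pairs) / len(pairs)
```

A hit at rank 1 contributes 1/1 while the same hit at rank 3 contributes only 1/3, so no per-user K has to be chosen to reward getting the first recommendations right.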

Re: matrix factorization cross validation

2014-10-30 Thread Debasish Das
Does it make sense to have a user-specific K, or is K considered the same over all users? Intuitively, users who watch more movies should get a higher K than the others... On Thu, Oct 30, 2014 at 2:15 PM, Sean Owen wrote: > The pretty standard metric for recommenders is mean average precision,

Re: matrix factorization cross validation

2014-10-30 Thread Sean Owen
The pretty standard metric for recommenders is mean average precision, and RankingMetrics will already do that as-is. I don't know that a confusion matrix for this binary classification does much. On Thu, Oct 30, 2014 at 9:41 PM, Debasish Das wrote: > I am working on it...I will open up a JIRA o

Re: matrix factorization cross validation

2014-10-30 Thread Debasish Das
I am working on it. I will open a JIRA once I see some results. The idea is to build the train/test split per user: for each user, we take 80% of the data for training and 20% for testing. Now we pick a K (each user should have a different K based on the movies he watched so som
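
The per-user 80/20 split described here could be sketched as follows. This is single-machine plain Python, not Spark code. The function name, the (user, item, rating) tuple layout, and the fixed seed are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_per_user(ratings, test_fraction=0.2, seed=42):
    """Split (user, item, rating) tuples so each user keeps about
    test_fraction of their own ratings in the test set."""
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for user, item, rating in ratings:
        by_user[user].append((user, item, rating))

    train, test = [], []
    for user in sorted(by_user):  # sorted for a reproducible shuffle order
        rows = by_user[user]
        rng.shuffle(rows)
        n_test = int(len(rows) * test_fraction)  # rounds down
        test.extend(rows[:n_test])
        train.extend(rows[n_test:])
    return train, test
```

Because the test count rounds down, a user with fewer than five ratings ends up with no test items under a 20% fraction, so such users would have to be skipped or handled separately when computing ranking metrics.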

Re: matrix factorization cross validation

2014-10-30 Thread Debasish Das
I thought topK would save us. For each user we have a 1 x rank vector, and our movie factors are an RDD. We pick the topK movie factors based on vector norm; with K = 50, we will have 50 vectors * num_executors in an RDD. With the user's 1 x rank vector we do a distributed dot product using the RowMatrix APIs... May be we can'

Re: matrix factorization cross validation

2014-10-30 Thread Nick Pentreath
Looking at https://github.com/apache/spark/blob/814a9cd7fabebf2a06f7e2e5d46b6a2b28b917c2/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala#L82 For each user in the test set, you generate an Array of top K predicted item ids (Int or String probably), and an Array of ground tru

Re: matrix factorization cross validation

2014-10-29 Thread Debasish Das
Is there an example of how to use RankingMetrics? Let's take the user, document example: we get user x topic and document x topic matrices as the model. Now for each user, we can generate the topK documents by sorting on (1 x topic)dot(topic x document) and picking the topK... Is it possible to
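
The top-K step described in this message, scoring every document against one user's factor vector and keeping the best K, can be sketched on a single machine as below. This is plain Python rather than Spark. The function name and the dictionary layout of the document factors are assumptions.

```python
import heapq


def top_k_items(user_factors, item_factors, k):
    """Return the ids of the k items with the largest dot product against
    the user's factor vector.

    user_factors: list of floats, the 1 x topic row for one user.
    item_factors: dict mapping item id to its factor list (assumed layout).
    """
    def score(item_id):
        return sum(u * v for u, v in zip(user_factors, item_factors[item_id]))

    # nlargest avoids fully sorting all items when k is small.
    return heapq.nlargest(k, item_factors, key=score)
```

The returned ids, paired with that user's held-out items, form the kind of (predicted, ground truth) pair a ranking metric would consume.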

Re: matrix factorization cross validation

2014-10-29 Thread Debasish Das
That makes sense for the binary and ranking problems, but linear regression for multi-class, for example, also optimizes RMSE, and we still measure the prediction efficiency using some measure on the confusion matrix. Shouldn't the same idea hold for ALS as well? On Wed, Oct 29, 2014 at 12:14 PM, Xian

Re: matrix factorization cross validation

2014-10-29 Thread Xiangrui Meng
Let's narrow the context from matrix factorization to recommendation via ALS. It adds extra complexity if we treat it as a multi-class classification problem. ALS only outputs a single value for each prediction, which is hard to convert to a probability distribution over the 5 rating levels. Treating

matrix factorization cross validation

2014-10-29 Thread Debasish Das
Hi, In the current factorization flow, we cross validate on the test dataset using the RMSE number but there are some other measures which are worth looking into. If we consider the problem as a regression problem and the ratings 1-5 are considered as 5 classes, it is possible to generate a confu