I added a driver for precisionAt(k: Int) for the MovieLens test cases,
although I am a bit confused by the precisionAt(k: Int) code in
RankingMetrics.scala...
While cross-validating, I am really not sure how to set K...
if (labSet.nonEmpty) { val n = math.min(pred.length, k) ... }
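For reference, here is a minimal sketch (illustrative names, not the actual RankingMetrics internals) of what that snippet computes for a single user: count the hits among the first min(pred.length, k) predictions and divide by k, so the choice of k directly caps the achievable score.

    // Sketch: precision@k for one user.
    // pred   = ranked recommended item ids
    // labels = ground-truth relevant item ids from the held-out test data
    def precisionAtK(pred: Array[Int], labels: Array[Int], k: Int): Double = {
      require(k > 0, "k must be positive")
      val labSet = labels.toSet
      if (labSet.isEmpty) 0.0
      else {
        val n = math.min(pred.length, k)
        val hits = pred.take(n).count(labSet.contains)
        hits.toDouble / k // note: divided by k, not by n
      }
    }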
No, except perhaps using approximate methods like LSH to figure out a
relatively small set of candidates for the users in the partition, and then
broadcasting or joining those.
On Fri, Oct 31, 2014 at 5:45 AM, Nick Pentreath wrote:
Sean, re my point earlier do you know a more efficient way to compute top k for
each user, other than to broadcast the item factors?
(I guess one could perhaps use the new asymmetric LSH paper to assist.)
—
Sent from Mailbox
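For context, a sketch of the brute-force baseline referred to above (broadcast all item factors and score every item per user); the names and the RDD[(Int, Array[Double])] layout are assumptions on my part, roughly matching what ALS produces:

    import org.apache.spark.SparkContext._ // pair-RDD implicits on pre-1.3 Spark
    import org.apache.spark.rdd.RDD

    // Sketch: top-k items per user by broadcasting every item factor.
    // userFeatures / itemFeatures: (id, factor vector) pairs as produced by ALS.
    def topKPerUser(userFeatures: RDD[(Int, Array[Double])],
                    itemFeatures: RDD[(Int, Array[Double])],
                    k: Int): RDD[(Int, Array[Int])] = {
      val items = userFeatures.sparkContext.broadcast(itemFeatures.collect())
      userFeatures.mapValues { uf =>
        items.value
          .map { case (itemId, f) =>
            (itemId, uf.zip(f).map { case (a, b) => a * b }.sum) // dot product
          }
          .sortBy(-_._2)
          .take(k)
          .map(_._1)
      }
    }

This is O(#users * #items) work and ships all item factors to every executor, which is exactly the cost Nick is asking how to avoid.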
On Thu, Oct 30, 2014 at 11:24 PM, Sean Owen wrote:
MAP is effectively an average over all k from 1 to min(#
recommendations, # items rated). Getting the first recommendations right is
more important than getting the last ones right.
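To make that concrete, here is a sketch of average precision for a single user (illustrative code, not the MLlib implementation; normalization conventions vary, and here it is the number of relevant items). MAP is the mean of this quantity over users, and the 1/rank weighting is why early hits matter more:

    // Sketch: average precision (AP) for one user's ranked recommendations.
    // Each relevant item found at 0-based rank i contributes precision@(i+1).
    def averagePrecision(pred: Array[Int], labels: Array[Int]): Double = {
      val labSet = labels.toSet
      if (labSet.isEmpty) 0.0
      else {
        var hits = 0
        var precSum = 0.0
        pred.zipWithIndex.foreach { case (item, i) =>
          if (labSet.contains(item)) {
            hits += 1
            precSum += hits.toDouble / (i + 1) // precision at this cut-off
          }
        }
        precSum / labSet.size // normalize by the number of relevant items
      }
    }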
On Thu, Oct 30, 2014 at 10:21 PM, Debasish Das wrote:
Does it make sense to have a user-specific K, or is K considered the same over
all users?
Intuitively, users who watch more movies should get a higher K than
the others...
On Thu, Oct 30, 2014 at 2:15 PM, Sean Owen wrote:
The pretty standard metric for recommenders is mean average precision,
and RankingMetrics will already do that as-is. I don't know that a
confusion matrix for this binary classification does much.
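A minimal usage sketch of that (RankingMetrics lives in org.apache.spark.mllib.evaluation, per the link later in the thread; predictionAndLabels is assumed to already hold, per user, the ranked predicted item ids and the held-out relevant item ids):

    import org.apache.spark.mllib.evaluation.RankingMetrics
    import org.apache.spark.rdd.RDD

    // One pair per user: (ranked predicted item ids, ground-truth item ids).
    def evaluate(predictionAndLabels: RDD[(Array[Int], Array[Int])]): Unit = {
      val metrics = new RankingMetrics(predictionAndLabels)
      println(s"MAP  = ${metrics.meanAveragePrecision}")
      println(s"P@10 = ${metrics.precisionAt(10)}")
    }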
On Thu, Oct 30, 2014 at 9:41 PM, Debasish Das wrote:
I am working on it... I will open up a JIRA once I see some results.
The idea is to come up with a train/test split based on users... basically, for
each user, we come up with 80% train data and 20% test data...
Now we pick a K (each user should have a different K based on the movies
he watched).
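A rough sketch of that per-user 80/20 split (variable names are hypothetical; this uses a simple per-user shuffle seeded by user id, nothing time-aware):

    import org.apache.spark.mllib.recommendation.Rating
    import org.apache.spark.rdd.RDD
    import scala.util.Random

    // Sketch: split each user's ratings into roughly 80% train / 20% test.
    // Seeding by user id keeps the split stable if the RDD is recomputed.
    def splitPerUser(ratings: RDD[Rating],
                     trainFraction: Double = 0.8): (RDD[Rating], RDD[Rating]) = {
      val tagged = ratings.groupBy(_.user).flatMap { case (user, rs) =>
        val shuffled = new Random(user).shuffle(rs.toSeq)
        val cut = math.max(1, (shuffled.size * trainFraction).toInt)
        shuffled.zipWithIndex.map { case (r, i) => (r, i < cut) }
      }
      (tagged.filter(_._2).map(_._1), tagged.filter(t => !t._2).map(_._1))
    }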
I thought topK would save us... for each user we have a 1 x rank vector... our
movie factors are an RDD... we pick the topK movie factors based on vector
norm... with K = 50, we will have 50 vectors * num_executors in an RDD... with
the user's 1 x rank vector we do a distributed dot product using the RowMatrix
APIs...
Maybe we can't
Looking at
https://github.com/apache/spark/blob/814a9cd7fabebf2a06f7e2e5d46b6a2b28b917c2/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala#L82
For each user in the test set, you generate an Array of the top K predicted
item ids (Int or String, probably), and an Array of ground truth item ids.
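Concretely, with made-up variable names (assuming per-user top-K predictions and per-user held-out item ids, both keyed by user id), the pairs can be built with a join:

    import org.apache.spark.SparkContext._ // pair-RDD implicits on pre-1.3 Spark
    import org.apache.spark.rdd.RDD

    // predictedTopK: (userId, ranked Array of predicted item ids)
    // heldOut:       (userId, Array of item ids the user rated in the test split)
    def toPredictionAndLabels(
        predictedTopK: RDD[(Int, Array[Int])],
        heldOut: RDD[(Int, Array[Int])]): RDD[(Array[Int], Array[Int])] =
      predictedTopK.join(heldOut).values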
Is there an example of how to use RankingMetrics?
Let's take the user/document example... we get user x topic and document x
topic matrices as the model...
Now for each user, we can generate the topK documents by doing a sort on (1 x
topic) dot (topic x document) and picking the topK...
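As a small local illustration of that per-user step (hypothetical names; the distributed version would broadcast or join the document factors, as sketched earlier in the thread):

    // Rank documents for one user: dot the user's topic vector against each
    // document's topic vector and keep the K highest-scoring document ids.
    def topKDocs(userTopics: Array[Double],
                 docTopics: Array[(String, Array[Double])],
                 k: Int): Array[String] =
      docTopics
        .map { case (docId, t) =>
          (docId, userTopics.zip(t).map { case (a, b) => a * b }.sum)
        }
        .sortBy(-_._2)
        .take(k)
        .map(_._1)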
That makes sense for the binary and ranking problems, but, for example, linear
regression for multi-class problems also optimizes RMSE, yet we still measure
prediction quality using some measure on the confusion matrix... Shouldn't the
same idea hold for ALS as well?
On Wed, Oct 29, 2014 at 12:14 PM, Xiangrui Meng wrote:
Let's narrow the context from matrix factorization to recommendation
via ALS. It adds extra complexity if we treat it as a multi-class
classification problem. ALS only outputs a single value for each
prediction, which is hard to convert to a probability distribution over
the 5 rating levels.