Ashutosh,
A vector would be a good idea; vectors are used very frequently.
Test data is usually stored in the spark/data/mllib folder.
On Oct 30, 2014 10:31 PM, "Ashutosh [via Apache Spark Developers List]" <
ml-node+s1001551n9034...@n3.nabble.com> wrote:
Sean, re my point earlier do you know a more efficient way to compute top k for
each user, other than to broadcast the item factors?
(I guess one can use the new asymmetric lsh paper perhaps to assist)
—
Sent from Mailbox
On Thu, Oct 30, 2014 at 11:24 PM, Sean Owen wrote:
Already done. Here is the link:
https://issues.apache.org/jira/browse/SPARK-4038
From: slcclimber [via Apache Spark Developers List]
Sent: Friday, October 31, 2014 10:09 AM
To: Ashutosh Trivedi (MT2013030)
Subject: Re: [MLlib] Contributing Algorithm for Outlier
Okay. I'll try it and post it soon with a test case. After that I think we can
go ahead with the PR.
From: slcclimber [via Apache Spark Developers List]
Sent: Friday, October 31, 2014 10:03 AM
To: Ashutosh Trivedi (MT2013030)
Subject: Re: [MLlib] Contributing Alg
Hi Anant,
sorry for my late reply. Thank you for taking time and reviewing it.
I have a few comments on the first issue.
You are correct on the string (CSV) part, but we cannot take input of the type
you mentioned. We calculate frequency in our function; otherwise the user has to
do all this computation. I r
MAP is effectively an average over all k from 1 to min(#
recommendations, # items rated). Getting the first recommendations right is
more important than the last.
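For concreteness, the averaging described above can be sketched in plain
Python, independent of Spark's RankingMetrics (function names here are
illustrative, not part of any Spark API):

```python
def average_precision(predicted, actual):
    """Average precision for one user: mean of precision@i at each hit,
    normalized by min(# recommendations, # items rated)."""
    actual = set(actual)
    if not actual:
        return 0.0
    hits, score = 0, 0.0
    for i, item in enumerate(predicted):
        if item in actual:
            hits += 1
            score += hits / (i + 1)  # precision at cutoff i+1
    return score / min(len(actual), len(predicted))

def mean_average_precision(per_user):
    """per_user: list of (predicted, actual) pairs, one pair per user."""
    return sum(average_precision(p, a) for p, a in per_user) / len(per_user)
```

Note how a hit at rank 1 contributes a full 1.0 while a hit at rank 3 only
contributes 1/3 of its precision term, which is what makes early
recommendations matter more than late ones.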
On Thu, Oct 30, 2014 at 10:21 PM, Debasish Das wrote:
Does it make sense to have a user-specific K, or is K considered the same over
all users?
Intuitively, the users who watch more movies should get a higher K than
the others...
On Thu, Oct 30, 2014 at 2:15 PM, Sean Owen wrote:
The pretty standard metric for recommenders is mean average precision,
and RankingMetrics will already do that as-is. I don't know that a
confusion matrix for this binary classification does much.
On Thu, Oct 30, 2014 at 9:41 PM, Debasish Das wrote:
Hi,
I've been exploring the metrics exposed by Spark and I'm wondering whether
there's a way to register job-specific metrics that could be exposed
through the existing metrics system.
Would there be an example somewhere?
BTW, documentation about how the metrics work could be improved. I found
I am working on it...I will open up a JIRA once I see some results..
The idea is to come up with a train/test split based on users...basically for
each user, we come up with 80% train data and 20% test data...
Now we pick up a K (each user should have a different K based on the movies
he watched so som
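A per-user 80/20 split like the one described above can be sketched in plain
Python (Spark aside; the function name and tuple layout are illustrative
assumptions, not an existing API):

```python
import random

def per_user_split(ratings, train_frac=0.8, seed=42):
    """Split (user, item, rating) triples so each user individually
    contributes ~train_frac of their ratings to the train set."""
    by_user = {}
    for row in ratings:
        by_user.setdefault(row[0], []).append(row)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for user, rows in by_user.items():
        rng.shuffle(rows)
        cut = int(len(rows) * train_frac)
        train.extend(rows[:cut])
        test.extend(rows[cut:])
    return train, test
```

Splitting per user, rather than globally, guarantees every user appears in
both sets, so each user's test ratings can be scored against a model trained
on that same user's remaining history.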
Some of our tests actually require spinning up a small multi-process
spark cluster. These use the normal deployment codepath for Spark
which is that we rely on the spark "assembly jar" to be present. That
jar is generated when you run "mvn package" via a special sub project
called assembly in our b
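In shell terms, the two-stage flow described above looks roughly like the
following (a hedged sketch of the build ordering only; exact flags and module
names vary by Spark version):

```shell
# Stage 1: build everything, including the assembly jar, without running tests.
mvn -DskipTests package

# Stage 2: run the test suites, which can now launch the assembly jar
# produced by the assembly sub-project.
mvn test
```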
Multiline support (much shinier than :paste), smart completion and things that
an IDE makes easy or better (without any hassle). In particular, fast switching
between REPL and editor while staying in the same screen makes me even more
productive.
Nabeel
> On Oct 30, 2014, at 9:39 AM, Stephen
I thought topK will save us...for each user we have 1xrank...now our movie
factor is a RDD...we pick topK movie factors based on vector norm...with K
= 50, we will have 50 vectors * num_executors in a RDD...with the user
1xrank we do a distributed dot product using RowMatrix APIs...
May be we can'
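Setting the distributed machinery aside, the per-user ranking step (dot
products of a 1 x rank user factor against candidate item factors, keeping the
top K) can be sketched in plain Python (names are illustrative):

```python
import heapq

def top_k_items(user_factor, item_factors, k):
    """Rank items for one user by dot product of the user's factor vector
    with each item's factor vector; return the K highest-scoring item ids.
    user_factor: list of floats (1 x rank); item_factors: dict id -> vector."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    # heapq.nlargest avoids fully sorting all items when K is small
    return heapq.nlargest(k, item_factors,
                          key=lambda i: dot(user_factor, item_factors[i]))
```

In the distributed setting this inner loop runs per partition of the item
factors, and only each partition's top K candidates need to be merged.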
Hi Nabeel,
In what ways is the IJ version of the Scala REPL enhanced? thx!
2014-10-30 3:41 GMT-07:00 :
> IntelliJ idea scala plugin comes with an enhanced REPL. It's a pretty
> decent option too.
>
> Nabeel
>
> > On Oct 28, 2014, at 5:34 AM, Cheng Lian wrote:
> >
> > My two cents for Mac Vim/Emac
You are right that this is a bit weird compared to the Maven lifecycle
semantics. Maven wants assembly to come after tests but here tests want to
launch the final assembly as part of some tests. Yes you would not normally
have to do this in 2 stages.
On Oct 30, 2014 12:28 PM, "Niklas Wilcke" <1wil.
Can you please briefly explain why packaging is necessary? I thought
packaging would only build the jar and place it in the target folder.
How does that affect the tests? If tests depend on the assembly, a "mvn
install" would be more sensible to me.
Probably I misunderstand the maven build life-cycl
IntelliJ idea scala plugin comes with an enhanced REPL. It's a pretty decent
option too.
Nabeel
> On Oct 28, 2014, at 5:34 AM, Cheng Lian wrote:
>
> My two cents for Mac Vim/Emacs users. Fixed a Scala ctags Mac compatibility
> bug months ago, and you may want to use the most recent version he
Looking at
https://github.com/apache/spark/blob/814a9cd7fabebf2a06f7e2e5d46b6a2b28b917c2/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala#L82
For each user in test set, you generate an Array of top K predicted item
ids (Int or String probably), and an Array of ground tru