I'm trying to apply Spark to an NLP problem I'm working on. I have nearly
4 million tweets, and I have converted them into word vectors. The vectors are
quite sparse, because each message contains only dozens of words while the
vocabulary has tens of thousands of words.
These vectors should be loaded each t
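For context, the sparse representation described above can be sketched in plain Python: each message maps to a small `{word_id: count}` dict rather than a dense vector the size of the vocabulary. The vocabulary and message here are made up for illustration; in Spark itself the same idea is typically expressed with MLlib's sparse vectors.

```python
# Represent each message as a sparse {word_id: count} dict instead of a
# dense vector of vocabulary size. Only words actually present are stored.
from collections import Counter

vocabulary = {"spark": 0, "nlp": 1, "tweet": 2, "vector": 3}  # toy vocabulary

def to_sparse_vector(message):
    """Map a message to {word_id: count}, keeping only in-vocabulary words."""
    words = message.lower().split()
    return dict(Counter(vocabulary[w] for w in words if w in vocabulary))

vec = to_sparse_vector("Spark NLP tweet tweet")
# Only 3 of the 4 vocabulary slots are used; the rest are implicitly zero.
```

With tens of thousands of vocabulary entries but only dozens of words per tweet, this keeps per-message storage proportional to message length rather than vocabulary size.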
Well, maybe a Linux configuration problem...
I have a cluster that is about to be exposed to the public, and I want everyone
who uses my cluster to own a user account (without sudo permissions, etc.) (e.g.
'guest') and to be able to submit tasks to Spark, which runs on Mesos under a
different, pri
I'm trying to solve a Word-Count-like problem; the difference is that I need
the count of a specific word within a specific timespan in a social message
stream.
My data is in the format (time, message), and I transformed it (flatMap etc.)
into a series of (time, word_id) pairs; the time is rep
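The counting step described above amounts to grouping by a (time_bucket, word_id) key and summing. A minimal plain-Python sketch of that logic, with a hypothetical one-hour bucket size and made-up data; the final dict-merge loop is exactly the per-key merge that Spark's reduceByKey performs across the cluster:

```python
from collections import defaultdict

BUCKET_SECONDS = 3600  # hypothetical timespan granularity: one hour

def bucket(ts):
    """Truncate a Unix timestamp to the start of its timespan bucket."""
    return ts - ts % BUCKET_SECONDS

# (time, word_id) pairs, as produced by the flatMap step described above
pairs = [(7200, 1), (7260, 1), (7200, 2), (10800, 1)]

counts = defaultdict(int)
for ts, word_id in pairs:
    # Same merge reduceByKey would apply to ((bucket, word_id), 1) pairs
    counts[(bucket(ts), word_id)] += 1
```

Looking up the count of a specific word in a specific timespan is then a single key lookup, e.g. `counts[(7200, 1)]`.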
From: yuzhih...@gmail.com
To: lovejay-lovemu...@outlook.com
CC: user@spark.apache.org
Can you show us the function you passed to reduceByKey()?
What release of Spark are you using?
Cheers
On Sat, Apr 18, 2015 at 8:17 AM, SecondDatke wrote:
I'm trying to solve a Word-Count like problem,
> CC: user@spark.apache.org
>
> Do these datetime objects implement the notion of equality you'd
> expect? (This may be a dumb question; I'm thinking of the equivalent
> of equals() / hashCode() from the Java world.)
>
> On Sat, Apr 18, 2015 at 4:17 PM, Secon
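To the equality question above: Python's datetime does implement value-based __eq__ and __hash__ (the analogue of equals()/hashCode() in the Java world), so two distinct objects denoting the same instant collapse to one key in a dict, and likewise in reduceByKey:

```python
from datetime import datetime

a = datetime(2015, 4, 18, 8, 17)
b = datetime(2015, 4, 18, 8, 17)  # separate object, same instant

assert a == b              # value equality, not identity
assert a is not b
assert hash(a) == hash(b)  # equal values hash alike, so they merge as one key

counts = {}
for t in (a, b):
    counts[t] = counts.get(t, 0) + 1
# counts ends up with a single entry mapped to 2
```

One caveat worth checking in a real pipeline: a naive and an aware datetime (or two different tzinfo objects) can break this equality, so keys should be normalized to one timezone convention first.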