Spark should already ship the header from the driver to the workers, which is why it works in spark-shell. In the Scala IDE the code sits inside an app class, so you need to check whether that app class is serializable.
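A minimal sketch of the usual workaround, assuming the header is a field of a non-serializable app object (the object and file names here are illustrative, not from this thread): copy the field into a local val before using it in a closure, so the task captures only the string rather than the whole enclosing object.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical app object; only for illustration.
object WordApp {
  val header = "user,comment" // field referenced inside an RDD closure

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("WordApp").setMaster("local[*]"))
    val lines = sc.textFile("comments.txt")

    // Copy the field into a local val so the closure captures just the
    // string, not the (possibly non-serializable) enclosing app object.
    val localHeader = header
    val data = lines.filter(_ != localHeader)

    data.collect().foreach(println)
    sc.stop()
  }
}
```

Alternatively, making the app class extend `Serializable` also works, but it then ships the whole object with every task.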
On Tue, Sep 22, 2015 at 9:13 AM Alexis Gillain <
alexis.gill...@googlemail.com> wrote:
> Can you please explain...
>
> Thanks
> Sri
> On Mon, Sep 21, 2015 at 6:13 AM, Huy Banh wrote:
Hi,
If your input format is user -> comment, then you could:
val comments = sc.parallelize(List(("u1", "one two one"), ("u2", "three four three")))
val wordCounts = comments.
  flatMap { case (user, comment) =>
    for (word <- comment.split(" ")) yield ((user, word), 1) }.
  reduceByKey(_ + _)
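The same per-(user, word) aggregation can be checked without a cluster; this is a plain-Scala equivalent of the RDD pipeline above, using the same sample data:

```scala
// Plain-Scala version of the flatMap + reduceByKey pipeline (no Spark needed),
// handy for verifying the counts the snippet produces.
val comments = List(("u1", "one two one"), ("u2", "three four three"))
val wordCounts = comments
  .flatMap { case (user, comment) =>
    comment.split(" ").map(word => ((user, word), 1)) }
  .groupBy(_._1)                                      // group by (user, word)
  .map { case (key, pairs) => (key, pairs.map(_._2).sum) } // sum the ones
```

With this input, `("u1", "one")` maps to 2 and `("u2", "four")` to 1, matching what `reduceByKey(_ + _)` would yield on the RDD.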
Hi Thuy,
You can check RDD.lookup(). It requires that the RDD is partitioned by key and, of course, cached in memory. Or you may consider a distributed cache such as Ehcache or AWS ElastiCache.
I think external storage is an option too. NoSQL databases in particular can handle updates at high speed.
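A minimal sketch of the `lookup()` pattern described above; the keys and values are illustrative. With a partitioner in place, `lookup` only scans the single partition that owns the key instead of the whole RDD.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setAppName("LookupDemo").setMaster("local[*]"))

val pairs = sc.parallelize(Seq(("u1", 10), ("u2", 20), ("u1", 30)))
  .partitionBy(new HashPartitioner(4)) // lets lookup() scan only one partition
  .cache()                             // keep it in memory for repeated lookups

val values = pairs.lookup("u1") // Seq of all values for the key: 10 and 30
sc.stop()
```

Without a partitioner, `lookup()` still works but falls back to a full scan, which defeats the point of using it as a fast key-value probe.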