Unsubscribe

2019-01-18 Thread Huy Banh

Re: Troubleshooting "Task not serializable" in Spark/Scala environments

2015-09-22 Thread Huy Banh
Spark should already send the header from the driver to the workers, which is why it works in spark-shell. In the Scala IDE, though, the code sits inside an app class, so you need to check whether that app class is serializable. On Tue, Sep 22, 2015 at 9:13 AM Alexis Gillain <alexis.gill...@googlemail.com> wrote:
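A minimal sketch of that check, assuming plain Scala with no Spark on the classpath. Spark serializes task closures with Java serialization, so a closure that reads a field of its enclosing class carries the whole instance along with it. `PlainApp`, `SerializableApp`, and `SerializationCheck.isSerializable` are hypothetical names used only for illustration.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical app class that does NOT extend Serializable.
// Reading `header` inside the function literal captures `this` (the whole app).
class PlainApp(val header: String) {
  def prefixer: String => String = line => header + line
}

// Same code, but the enclosing class is Serializable, so capturing `this` is harmless.
class SerializableApp(val header: String) extends Serializable {
  def prefixer: String => String = line => header + line
}

object SerializationCheck {
  // Attempts plain Java serialization, the mechanism Spark uses for closures.
  def isSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    println(isSerializable(new PlainApp("h:").prefixer))        // false
    println(isSerializable(new SerializableApp("h:").prefixer)) // true
  }
}
```

Making the app class `Serializable` is one fix. Keeping the closure from referencing the app instance at all is another.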

Re: word count (group by users) in spark

2015-09-21 Thread Huy Banh
…you please explain... Thanks, Sri. On Mon, Sep 21, 2015 at 6:13 AM, Huy Banh wrote: Hi, If your input format is user -> comment, then you could: val comments = sc.parallelize(List(("u1", "on

Re: word count (group by users) in spark

2015-09-20 Thread Huy Banh
Hi, if your input format is user -> comment, then you could: val comments = sc.parallelize(List(("u1", "one two one"), ("u2", "three four three"))) val wordCounts = comments.flatMap({ case (user, comment) => for (word <- comment.split(" ")) yield ((user, word), 1) }).reduceByKey(_
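The reply is cut off in the archive. As an illustration only, here is a sketch of the same per-user word count using plain Scala collections instead of an RDD so that it runs without Spark. In Spark, the summing step would be done on the pair RDD with `reduceByKey`. Here it is modeled with `groupBy` plus `sum`. `UserWordCount` and `countByUser` are hypothetical names.

```scala
object UserWordCount {
  // Emits ((user, word), 1) for every word, then sums the 1s per (user, word) key.
  def countByUser(comments: Seq[(String, String)]): Map[(String, String), Int] =
    comments
      .flatMap { case (user, comment) =>
        comment.split(" ").toSeq.map(word => ((user, word), 1))
      }
      // Local stand-in for summing with RDD.reduceByKey: group pairs by key, add the counts.
      .groupBy { case (key, _) => key }
      .map { case (key, pairs) => key -> pairs.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    val comments = List(("u1", "one two one"), ("u2", "three four three"))
    // Sorted only to make the printed output stable.
    countByUser(comments).toSeq.sorted.foreach(println)
  }
}
```

Keying on the `(user, word)` pair is what keeps counts separate per user. Keying on `word` alone would merge all users together.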

Re: Using Spark for portfolio manager app

2015-09-19 Thread Huy Banh
Hi Thuy, you can check RDD.lookup(). It is efficient when the RDD has a known partitioner (only the partition the key maps to is searched) and, of course, when the RDD is cached in memory. Alternatively, you could consider a distributed cache such as Ehcache or AWS ElastiCache. External storage is also an option, especially NoSQL databases, which can handle updates at high speed, at c
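To show why a known partitioner matters for `lookup`, here is a plain-Scala model, not Spark code. Keys are assigned to partitions the way Spark's `HashPartitioner` does it (non-negative `hashCode` modulo the partition count), so a lookup only has to scan one partition. `PartitionedStore`, `LookupDemo`, and the ticker data are made up for illustration, and the model assumes non-null keys.

```scala
// Hypothetical in-memory model of a hash-partitioned key-value dataset (assumes non-null keys).
final class PartitionedStore[K, V](data: Seq[(K, V)], numPartitions: Int) {
  require(numPartitions > 0, "numPartitions must be positive")

  // Same rule as HashPartitioner: hashCode modulo numPartitions, shifted to be non-negative.
  def partitionFor(key: K): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Each partition holds only the pairs whose key maps to it, in input order.
  private val partitions: Vector[Vector[(K, V)]] = {
    val grouped = data.groupBy { case (k, _) => partitionFor(k) }
    Vector.tabulate(numPartitions)(i => grouped.getOrElse(i, Seq.empty).toVector)
  }

  // Analogous to lookup with a known partitioner: scan only the key's partition.
  def lookup(key: K): Seq[V] =
    partitions(partitionFor(key)).collect { case (k, v) if k == key => v }
}

object LookupDemo {
  def main(args: Array[String]): Unit = {
    // Made-up holdings, for illustration only.
    val store = new PartitionedStore(Seq("AAPL" -> 10, "MSFT" -> 5, "AAPL" -> 3), numPartitions = 4)
    println(store.lookup("AAPL")) // Vector(10, 3)
  }
}
```

Without a known partitioner, the equivalent lookup would have to check every partition. That is the cost the partitioning avoids.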

Kr

2015-09-09 Thread Huy Banh
Ọqo