Spark should already ship the header from the driver to the workers, which is why it works in spark-shell. In the Scala IDE the code sits inside an app class, so you need to check whether that app class is serializable.
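A minimal sketch of the usual workaround, assuming the header is a field of a non-serializable app object (the object and file names here are illustrative, not from this thread): copy the field into a local val before using it in a closure, so the task captures only the string rather than the whole enclosing object.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical app object; only for illustration.
object WordApp {
  val header = "user,comment" // field referenced inside an RDD closure

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("WordApp").setMaster("local[*]"))
    val lines = sc.textFile("comments.txt")

    // Copy the field into a local val so the closure captures just the
    // string, not the (possibly non-serializable) enclosing app object.
    val localHeader = header
    val data = lines.filter(_ != localHeader)

    data.collect().foreach(println)
    sc.stop()
  }
}
```

Alternatively, making the app class extend `Serializable` also works, but it then ships the whole object with every task.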
On Tue, Sep 22, 2015 at 9:13 AM Alexis Gillain <
alexis.gill...@googlemail.com> wrote:
> Can you please explain...
>
> Thanks
> Sri
> On Mon, Sep 21, 2015 at 6:13 AM, Huy Banh wrote:
Hi,
If your input format is user -> comment, then you could:
val comments = sc.parallelize(List(("u1", "one two one"), ("u2", "three four three")))
val wordCounts = comments.
  flatMap { case (user, comment) =>
    for (word <- comment.split(" ")) yield ((user, word), 1) }.
  reduceByKey(_ + _)
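The same per-(user, word) aggregation can be checked without a cluster; this is a plain-Scala equivalent of the RDD pipeline above, using the same sample data:

```scala
// Plain-Scala version of the flatMap + reduceByKey pipeline (no Spark needed),
// handy for verifying the counts the snippet produces.
val comments = List(("u1", "one two one"), ("u2", "three four three"))
val wordCounts = comments
  .flatMap { case (user, comment) =>
    comment.split(" ").map(word => ((user, word), 1)) }
  .groupBy(_._1)                                      // group by (user, word)
  .map { case (key, pairs) => (key, pairs.map(_._2).sum) } // sum the ones
```

With this input, `("u1", "one")` maps to 2 and `("u2", "four")` to 1, matching what `reduceByKey(_ + _)` would yield on the RDD.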
Hi Thuy,
You can check RDD.lookup(). It requires that the RDD is partitioned by key and, of course, cached in memory. Or you may consider a distributed cache such as Ehcache or AWS ElastiCache.
I think external storage is an option too. NoSQL databases in particular can handle updates at high speed.
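A minimal sketch of the `lookup()` pattern described above; the keys and values are illustrative. With a partitioner in place, `lookup` only scans the single partition that owns the key instead of the whole RDD.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setAppName("LookupDemo").setMaster("local[*]"))

val pairs = sc.parallelize(Seq(("u1", 10), ("u2", 20), ("u1", 30)))
  .partitionBy(new HashPartitioner(4)) // lets lookup() scan only one partition
  .cache()                             // keep it in memory for repeated lookups

val values = pairs.lookup("u1") // Seq of all values for the key: 10 and 30
sc.stop()
```

Without a partitioner, `lookup()` still works but falls back to a full scan, which defeats the point of using it as a fast key-value probe.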