Re: Map one RDD into two RDD

2015-05-08 Thread ayan guha
earn the Parallel Programming Model of an OO Framework like Spark – in any OO Framework lots of Behavior is hidden / encapsulated by the Framework and the client code gets invoked at specific points in the Flow of Control / Data based on callback functions

Re: Map one RDD into two RDD

2015-05-08 Thread anshu shukla
That’s why stuff like RDD.filter(), RDD.filter() may look “sequential” to you but it is not *From:* Bill Q [mailto:bill.q@gmail.com] *Sent:* Thursday, May 7, 2015 6:27 PM *To:* Evo Eftimov

Re: Map one RDD into two RDD

2015-05-07 Thread anshu shukla
in the Flow of Control / Data based on callback functions That’s why stuff like RDD.filter(), RDD.filter() may look “sequential” to you but it is not *From:* Bill Q [mailto:bill.q@gmail.com] *Sent:* Thursday, May 7, 2015 6:27 PM *T

RE: Map one RDD into two RDD

2015-05-07 Thread Evo Eftimov
: Bill Q [mailto:bill.q@gmail.com] Sent: Thursday, May 7, 2015 6:27 PM To: Evo Eftimov Cc: user@spark.apache.org Subject: Re: Map one RDD into two RDD The multi-threading code in Scala is quite simple and you can google it pretty easily. We used the Future framework. You can use Akka also
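
The Future-based approach described in this message might look roughly like the sketch below. It is not the poster's actual code, which does not appear in the thread. The object name `ConcurrentMappings`, the helper `countEvensAndOdds`, and the even/odd predicates are hypothetical stand-ins for the two real mappings. It assumes Spark is on the classpath and runs in local mode.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ConcurrentMappings {
  // Hypothetical helper: submits two independent Spark jobs over the same
  // source RDD from separate threads and waits for both to finish.
  def countEvensAndOdds(source: RDD[Int]): (Long, Long) = {
    source.cache() // ask Spark to keep the source in memory for reuse
    val evens = Future { source.filter(_ % 2 == 0).count() }
    val odds  = Future { source.filter(_ % 2 != 0).count() }
    val both  = for (e <- evens; o <- odds) yield (e, o)
    Await.result(both, 10.minutes)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("concurrent-mappings").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val (evens, odds) = countEvensAndOdds(sc.parallelize(1 to 1000))
      println(s"evens=$evens odds=$odds")
    } finally {
      sc.stop()
    }
  }
}
```

Both futures are created before either is awaited, so the two jobs are submitted concurrently. When both jobs start together, some partitions of the source may still be computed twice before the cache is populated.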

Re: Map one RDD into two RDD

2015-05-07 Thread Gerard Maas
n Parallel Pipelines / DAGs within the Spark Framework RDD1 = RDD.filter() RDD2 = RDD.filter() *From:* Bill Q [mailto:bill.q@gmail.com] *Sent:* Thursday, May 7, 2015 4:55 PM *To:* Evo Eftimov

Re: Map one RDD into two RDD

2015-05-07 Thread Bill Q
ipelines / DAGs within the Spark Framework RDD1 = RDD.filter() RDD2 = RDD.filter() *From:* Bill Q [mailto:bill.q@gmail.com] *Sent:* Thursday, May 7, 2015 4:55 PM *To:* Evo Eftimov *Cc:* user@spark.apache.org *Subject:*

RE: Map one RDD into two RDD

2015-05-07 Thread Evo Eftimov
: Bill Q [mailto:bill.q@gmail.com] Sent: Thursday, May 7, 2015 4:55 PM To: Evo Eftimov Cc: user@spark.apache.org Subject: Re: Map one RDD into two RDD Thanks for the replies. We decided to use concurrency in Scala to do the two mappings using the same source RDD in parallel. So far, it

Re: Map one RDD into two RDD

2015-05-07 Thread Gerard Maas
Hi Bill, Could you show a snippet of code to illustrate your choice? -Gerard. On Thu, May 7, 2015 at 5:55 PM, Bill Q wrote: Thanks for the replies. We decided to use concurrency in Scala to do the two mappings using the same source RDD in parallel. So far, it seems to be working. Any com

Re: Map one RDD into two RDD

2015-05-07 Thread Bill Q
Thanks for the replies. We decided to use concurrency in Scala to do the two mappings using the same source RDD in parallel. So far, it seems to be working. Any comments? On Wednesday, May 6, 2015, Evo Eftimov wrote: RDD1 = RDD.filter() RDD2 = RDD.filter() *From:* Bill Q [mailto:bi

RE: Map one RDD into two RDD

2015-05-06 Thread Evo Eftimov
RDD1 = RDD.filter() RDD2 = RDD.filter() From: Bill Q [mailto:bill.q@gmail.com] Sent: Tuesday, May 5, 2015 10:42 PM To: user@spark.apache.org Subject: Map one RDD into two RDD Hi all, I have a large RDD that I map a function to it. Based on the nature of each record in the input RDD,
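
The two-filter suggestion above can be sketched as follows, assuming the map function tags each record with its type. The object `SplitByType`, the `classify` function, and the choice of `Either` as the tag are hypothetical illustrations, not anything stated in the thread. `RDD.collect` with a partial function is used here as a combined filter and map.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SplitByType {
  // Hypothetical classifier: lines that parse as integers are one type of
  // output (Right); everything else is the other type (Left).
  def classify(line: String): Either[String, Int] =
    try Right(line.trim.toInt)
    catch { case _: NumberFormatException => Left(line) }

  // Map once, cache the tagged result, then derive one RDD per type.
  def split(input: RDD[String]): (RDD[String], RDD[Int]) = {
    val tagged = input.map(classify).cache()
    val rejected = tagged.collect { case Left(raw) => raw }
    val parsed   = tagged.collect { case Right(n) => n }
    (rejected, parsed)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("split-by-type").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val (rejected, parsed) = split(sc.parallelize(Seq("1", "two", "3")))
      // Each RDD could be written to its own location with saveAsTextFile;
      // counts are printed here to keep the sketch free of output paths.
      println(s"rejected=${rejected.count()} parsed=${parsed.count()}")
    } finally {
      sc.stop()
    }
  }
}
```

Caching the tagged RDD is meant to avoid running the expensive map function once per derived RDD. Whether that pays off depends on whether the tagged data fits in memory.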

Re: Map one RDD into two RDD

2015-05-05 Thread Ted Yu
Have you looked at RDD#randomSplit() (as an example)? Cheers On Tue, May 5, 2015 at 2:42 PM, Bill Q wrote: Hi all, I have a large RDD that I map a function to it. Based on the nature of each record in the input RDD, I will generate two types of data. I would like to save each type into it
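
For reference, a minimal use of `randomSplit` is sketched below. Note that it assigns records to splits at random by weight, not by record type, so it is a model for producing several RDDs from one source, not a direct answer to the question. The object name `RandomSplitExample`, the weights, and the seed are arbitrary choices for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RandomSplitExample {
  // Splits one RDD into two RDDs with roughly equal expected sizes.
  def halves(rdd: RDD[Int]): (RDD[Int], RDD[Int]) = {
    val Array(first, second) = rdd.randomSplit(Array(0.5, 0.5), seed = 42L)
    (first, second)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("random-split").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val (first, second) = halves(sc.parallelize(1 to 100))
      println(s"first=${first.count()} second=${second.count()}")
    } finally {
      sc.stop()
    }
  }
}
```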