Thanks very much for the detailed explanations. I suspected this came down
to architectural support for the notion of an RDD of RDDs, but my
understanding of Spark, and of distributed computing in general, is not
deep enough to let me work this out on my own, so this really helps!
I ended up going with List[RDD].
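For reference, a minimal sketch of driving a List[RDD] from the driver side
(data and names are illustrative; assumes an existing SparkContext sc):

    import org.apache.spark.rdd.RDD

    // The list lives on the driver; each element is an independent RDD.
    val groups: List[RDD[Int]] = List(sc.parallelize(1 to 3), sc.parallelize(4 to 6))

    // A plain driver-side loop; each action runs as its own Spark job.
    val counts: List[Long] = groups.map(_.count())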
Yes, true. That's why I said "if and when."
But hopefully I have given a correct explanation of why an RDD of RDDs is
not possible.
On 09-Jun-2015 10:22 pm, "Mark Hamstra" wrote:
That would constitute a major change in Spark's architecture. It's not
happening anytime soon.
On Tue, Jun 9, 2015 at 1:34 AM, kiran lonikar wrote:
Possibly in the future, if and when the Spark architecture allows workers
to launch Spark jobs (from within the functions passed to the
transformation or action APIs of RDD), it will be possible to have an RDD
of RDDs.
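As things stand, this is exactly the pattern that fails. A minimal Scala
sketch of the attempt (names are illustrative; it compiles, but is expected
to fail at runtime when the nested action runs inside a task):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("nested-rdd-demo").setMaster("local[2]"))
    val outer = sc.parallelize(1 to 3)
    val inner = sc.parallelize(10 to 12)

    // The closure is serialized and shipped to executors, where `inner`
    // has no live SparkContext; the nested action typically fails with a
    // SparkException or NullPointerException instead of running a job.
    outer.map(x => inner.count() + x).collect()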
On Tue, Jun 9, 2015 at 1:47 PM, kiran lonikar wrote:
A similar question was asked before:
http://apache-spark-user-list.1001560.n3.nabble.com/Rdd-of-Rdds-td17025.html
Here is one of the reasons why I think RDD[RDD[T]] is not possible:
- An RDD is only a handle to the actual data partitions. It has a
reference/pointer to the *SparkContext* object (*sc*) and a list of
partitions.
- The SparkContext is an object that exists only on the driver; it is not
available on the workers, so the inner RDDs' handles would be unusable
inside tasks.
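That driver-only nature is reflected in Spark's source: RDD keeps its
SparkContext in a @transient field, so the reference is dropped whenever
the handle is serialized into a task. A simplified sketch of the
declaration (the real one also takes a ClassTag and mixes in Logging):

    // Simplified from org.apache.spark.rdd.RDD: the @transient fields are
    // not serialized, so a deserialized RDD handle on a worker has no sc.
    abstract class RDD[T](
        @transient private var _sc: SparkContext,
        @transient private var deps: Seq[Dependency[_]]
      ) extends Serializable {
      // ...
    }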
On Wednesday, October 22, 2014 9:06 AM, Sean Owen wrote:
> No, there's no such thing as an RDD of RDDs in Spark.
> Here though, why not just operate on an RDD of Lists? or a List of RDDs?
> Usually one of these two is the right approach whenever you feel
> inclined to operate on an RDD of RDDs.
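For concreteness, a minimal Scala sketch of both alternatives (assuming an
existing SparkContext named sc):

    import org.apache.spark.rdd.RDD

    // An RDD of Lists: the nesting lives inside each record.
    val rddOfLists: RDD[List[Int]] = sc.parallelize(Seq(List(1, 2), List(3, 4, 5)))
    val flattened: RDD[Int] = rddOfLists.flatMap(identity)

    // A List of RDDs: the nesting lives on the driver.
    val listOfRdds: List[RDD[Int]] = List(sc.parallelize(1 to 3), sc.parallelize(4 to 6))
    val combined: RDD[Int] = listOfRdds.reduce(_ union _)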
Another approach could be to create an artificial key for each RDD and
convert them to PairRDDs. So your first RDD becomes a JavaPairRDD rdd1 with
values (1,"1"), (1,"2") and so on, and your second RDD becomes a
JavaPairRDD rdd2 with values (2,"a"), (2,"b"), (2,"c").
You can then union the two RDDs, groupByKey, countByKey, etc., and maybe
achieve what you want.
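A minimal Scala sketch of that keyed approach (assuming a SparkContext sc;
the keys 1 and 2 only mark which source RDD each value came from):

    import org.apache.spark.SparkContext._  // PairRDD operations (needed on older Spark)

    // Tag each value with an artificial key identifying its source RDD.
    val rdd1 = sc.parallelize(Seq("1", "2")).map(v => (1, v))
    val rdd2 = sc.parallelize(Seq("a", "b", "c")).map(v => (2, v))

    val unioned = rdd1.union(rdd2)
    val grouped = unioned.groupByKey()   // (1, ["1", "2"]), (2, ["a", "b", "c"])
    val counts  = unioned.countByKey()   // Map(1 -> 2, 2 -> 3)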
On Wed, Oct 22, 2014 at 3:58 PM, Tomer Benyamini wrote:
> Hello,
>