Re: RDD of RDDs

2015-06-10 Thread ping yan
Thanks very much for the detailed explanations. I suspected this came down to architectural support for the notion of an RDD of RDDs, but my understanding of Spark, and of distributed computing in general, is not deep enough for me to work that out myself, so this really helps! I ended up going with List[RDD]. The collection of

Re: RDD of RDDs

2015-06-09 Thread kiran lonikar
>> On Tue, Jun 9, 2015 at 1:47 PM, kiran lonikar wrote:
>>
>>> Similar question was asked before:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Rdd-of-Rdds-td17025.html
>>>
>>> Here is one of the reasons why I think RDD[RDD[T]] is not pos

Re: RDD of RDDs

2015-06-09 Thread Mark Hamstra
n or action APIs of
> RDD), it will be possible to have RDD of RDD.
>
> On Tue, Jun 9, 2015 at 1:47 PM, kiran lonikar wrote:
>
>> Similar question was asked before:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Rdd-of-Rdds-td17025.html
>>
>> Here

Re: RDD of RDDs

2015-06-09 Thread kiran lonikar
> http://apache-spark-user-list.1001560.n3.nabble.com/Rdd-of-Rdds-td17025.html
>
> Here is one of the reasons why I think RDD[RDD[T]] is not possible:
>
>    - RDD is only a handle to the actual data partitions. It has a
>    reference/pointer to the *SparkContext* object (*sc*) and a li

Re: Rdd of Rdds

2015-06-09 Thread lonikar
rk job. Hope it helps.

You need to consider List[RDD] or some other collection. Possibly in the future, if and when the Spark architecture allows workers to launch Spark jobs (from within the functions passed to the transformation or action APIs of RDD), it will be possible to have an RDD of RDDs.

Re: RDD of RDDs

2015-06-09 Thread kiran lonikar
Similar question was asked before:
http://apache-spark-user-list.1001560.n3.nabble.com/Rdd-of-Rdds-td17025.html

Here is one of the reasons why I think RDD[RDD[T]] is not possible:

   - RDD is only a handle to the actual data partitions. It has a reference/pointer to the *SparkContext* object

RDD of RDDs

2015-06-08 Thread ping yan
ap or DataFrame operations on them. (I already have the function coded, so I am reluctant to work with the ResultIterable object coming out of rdd.groupByKey() ... )

I've searched the mailing list and googled "RDD of RDDs", and it seems like it isn't a thing at all. A few c

Re: How to merge a RDD of RDDs into one uber RDD

2015-01-07 Thread Raghavendra Pandey
nifesting itself as a new one.
>>>
>>> Regards
>>> -Ravi
>>>
>>> --
>>> View this message in context: http://apache-spark-user-list.1001560.n3.nabbl

Re: How to merge a RDD of RDDs into one uber RDD

2015-01-07 Thread Sean Owen
PE but further down I am getting an indexOutOfBounds, so
>> trying to figure out if the original problem is manifesting itself as a new one.
>>
>> Regards
>> -Ravi
>>
>> --
>> View this message in context

Re: How to merge a RDD of RDDs into one uber RDD

2015-01-07 Thread Raghavendra Pandey
ger get the NPE but further down I am getting an indexOutOfBounds, so
> trying to figure out if the original problem is manifesting itself as a new one.
>
> Regards
> -Ravi
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n

Re: How to merge a RDD of RDDs into one uber RDD

2015-01-07 Thread rkgurram
rk-user-list.1001560.n3.nabble.com/How-to-merge-a-RDD-of-RDDs-into-one-uber-RDD-tp20986p21012.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: How to merge a RDD of RDDs into one uber RDD

2015-01-06 Thread k.tham
e an array of RDDs, which you can then fold over and merge.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-merge-a-RDD-of-RDDs-into-one-uber-RDD-tp20986p21007.html
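
A minimal sketch of the fold-and-merge idea from this reply, written in plain Python so that it runs without a cluster. ToyRDD is a hypothetical stand-in (not a Spark class) that only mimics union and collect, and merge_all is an illustrative name. In PySpark the same reduce should work on real RDDs through RDD.union, and SparkContext.union(rdds) merges a whole list in one call.

```python
from functools import reduce


class ToyRDD:
    """Hypothetical stand-in for an RDD; it only supports union and collect."""

    def __init__(self, items):
        self.items = list(items)

    def union(self, other):
        # Like RDD.union, this concatenates and keeps duplicates.
        return ToyRDD(self.items + other.items)

    def collect(self):
        return list(self.items)


def merge_all(rdds):
    """Fold a driver-side list of RDD-like objects into one via pairwise union."""
    if not rdds:
        raise ValueError("need at least one RDD to merge")
    return reduce(lambda left, right: left.union(right), rdds)


parts = [ToyRDD([1, 2]), ToyRDD([3]), ToyRDD([2, 4])]
merged = merge_all(parts)
print(merged.collect())
```

The key point is that the list of RDDs lives on the driver, so the fold itself is ordinary local code and only the resulting merged RDD is distributed. With many inputs, a single SparkContext.union call may be preferable to a long chain of pairwise unions.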

Re: Rdd of Rdds

2014-10-22 Thread Michael Malak
On Wednesday, October 22, 2014 9:06 AM, Sean Owen wrote:
> No, there's no such thing as an RDD of RDDs in Spark.
> Here though, why not just operate on an RDD of Lists? or a List of RDDs?
> Usually one of these two is the right approach whenever you feel
> inclined to operate

Re: Rdd of Rdds

2014-10-22 Thread Sonal Goyal
Nube Technologies <http://www.nubetech.co>
<http://in.linkedin.com/in/sonalgoyal>

On Wed, Oct 22, 2014 at 8:35 PM, Sean Owen wrote:
> No, there's no such thing as an RDD of RDDs in Spark.
> Here though, why not just operate on an RDD of Lists? or a List of RDDs?
> Usually one of t

Re: Rdd of Rdds

2014-10-22 Thread Sean Owen
No, there's no such thing as an RDD of RDDs in Spark.
Here though, why not just operate on an RDD of Lists? or a List of RDDs?
Usually one of these two is the right approach whenever you feel inclined to operate on an RDD of RDDs.

On Wed, Oct 22, 2014 at 3:58 PM, Tomer Benyamini wrote:
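
To make the two alternatives in this reply concrete, here is a small plain-Python sketch of the two data shapes. Ordinary lists stand in for RDDs (an assumption made so that the example runs without Spark), and both function names are hypothetical.

```python
from collections import defaultdict


def as_collection_of_lists(pairs):
    """'RDD of Lists' shape: one collection whose elements are (key, list) pairs."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def as_list_of_collections(pairs, keys):
    """'List of RDDs' shape: one separate collection per key, held in a plain list."""
    return [[value for key, value in pairs if key == wanted] for wanted in keys]


records = [("a", 1), ("b", 2), ("a", 3)]
print(as_collection_of_lists(records))
print(as_list_of_collections(records, ["a", "b"]))
```

In Spark terms, the first shape is roughly what groupByKey() produces, and the second corresponds to a driver-side list built with one filter per key. Which one fits depends on whether each group is small enough to be handled as a single element or needs to stay distributed.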

Rdd of Rdds

2014-10-22 Thread Tomer Benyamini
Hello,

I would like to parallelize my work on multiple RDDs I have. I wanted to know if Spark can support a "foreach" on an RDD of RDDs. Here's a Java example:

public static void main(String[] args) {
    SparkConf sparkConf = new SparkConf().setA