Hi all,
Thank you for your responses.
After taking a look at the implementation of rdd.collect(), I thought of
using the SparkContext.runJob(...) method:
for (int i = 0; i < dataFrame.rdd().partitions().length; i++) {
    dataFrame.sqlContext().sparkContext().runJob(dataFrame.rdd(),
        /* some function over each partition's iterator */ ...);
}
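For reference, here is a minimal sketch of that idea in Scala (the RDD, the
row count n, and the app name are made up for illustration; it assumes a
Spark version with the runJob(rdd, func, partitions) overload, i.e. 1.5+,
older releases also take an allowLocal flag):

import org.apache.spark.{SparkConf, SparkContext}

object TakePerPartition {
  def main(args: Array[String]): Unit = {
    val sc  = new SparkContext(new SparkConf().setAppName("take-per-partition").setMaster("local[*]"))
    val rdd = sc.parallelize(100 to 120, 4)

    val n = 5                      // hypothetical: rows wanted on the driver
    var taken = Array.empty[Int]
    var p = 0
    // Run one job per partition, pulling only the rows still needed,
    // so the driver never holds more than n elements at once.
    while (taken.length < n && p < rdd.partitions.length) {
      val remaining = n - taken.length
      val chunk = sc.runJob(rdd, (it: Iterator[Int]) => it.take(remaining).toArray, Seq(p))
      taken ++= chunk.flatten
      p += 1
    }
    println(taken.mkString(", "))  // 100, 101, 102, 103, 104
    sc.stop()
  }
}

This is essentially how RDD.take is implemented internally: visit partitions
one job at a time and stop once enough rows have been gathered.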
Hi,
Maybe you could use zipWithIndex and filter to skip the first elements. For
example, starting from:
scala> sc.parallelize(100 to 120, 4).zipWithIndex.collect
res12: Array[(Int, Long)] = Array((100,0), (101,1), (102,2), (103,3),
(104,4), (105,5), (106,6), (107,7), (108,8), (109,9), (110,10), (111,11),
(112,12), (113,13), (114,14), (115,15), (116,16), (117,17), (118,18),
(119,19), (120,20))
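A minimal sketch of the filter step this leads into (k, the number of
leading elements to skip, is a made-up parameter):

// Skip the first k elements by their global index, then drop the index again.
val k = 5
val skipped = sc.parallelize(100 to 120, 4)
  .zipWithIndex
  .filter { case (_, idx) => idx >= k }
  .map { case (value, _) => value }

skipped.collect()   // Array(105, 106, ..., 120)

Note that zipWithIndex itself triggers a Spark job to compute per-partition
counts before the indexed RDD can be used.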
I think rdd.toLocalIterator is what you want. But note that it keeps one
partition's data in memory on the driver at a time.
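A minimal sketch of that approach, with hypothetical paging parameters
offset and n:

// toLocalIterator streams the RDD back to the driver one partition at a time,
// so at most one partition's data is held in driver memory.
val offset = 1000   // hypothetical: rows to skip
val n      = 100    // hypothetical: rows to keep
val page: Array[Int] = rdd.toLocalIterator.drop(offset).take(n).toArray

Each partition still launches a separate job, so this can be slow for RDDs
with many partitions, but the driver's memory footprint stays bounded by the
largest partition.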
On Wed, Sep 2, 2015 at 10:05 AM, Niranda Perera wrote:
> Hi all,
>
> I have a large set of data which would not fit into memory. So, I want
> to take n rows from the RDD, given a particular