You can do something like this:


// pair each element with its positional index (a Long)
val indexedRDD = rdd.zipWithIndex

// keep the 100th through 199th elements (zero-based indices 99 to 198)
val filteredRDD = indexedRDD.filter { case (element, index) =>
  index >= 99 && index < 199
}

// returns an Array of (element, index) pairs; map(_._1) first if you
// only want the elements
val result = filteredRDD.take(100)
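
If you want the general case from your question (records m through n), the
same pattern wraps into a small helper. A minimal sketch; takeRange is an
illustrative name, not a Spark API:

import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

def takeRange[T: ClassTag](rdd: RDD[T], from: Long, until: Long): Array[T] =
  rdd.zipWithIndex
    .filter { case (_, index) => index >= from && index < until }
    .map { case (element, _) => element }
    .collect()

// e.g. records 100 to 1000 (zero-based start, end exclusive)
val records = takeRange(rdd, 100L, 1000L)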



Warning: the ordering of the elements in an RDD is not guaranteed, so which
records end up at indices 99 to 198 is only meaningful if the RDD has a
well-defined order.
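
If you need the range to be deterministic, sort first so that zipWithIndex
sees a well-defined order. A minimal sketch, assuming the element type has
an Ordering (otherwise sort by some key field instead of identity):

// sorting gives zipWithIndex a stable, well-defined order
val range = rdd.sortBy(identity)
  .zipWithIndex
  .filter { case (_, index) => index >= 99 && index < 199 }
  .map { case (element, _) => element }
  .collect()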

Mohammed
Author: Big Data Analytics with Spark
<http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>



-----Original Message-----
From: SRK [mailto:swethakasire...@gmail.com]
Sent: Tuesday, February 9, 2016 1:58 PM
To: user@spark.apache.org
Subject: How to collect/take arbitrary number of records in the driver?



Hi ,



How do I get a fixed range of records from an RDD in the driver? Suppose I
want records 100 to 1000 and then want to save them to some external
database. I know that I can do this from the workers, per partition, but I
want to avoid that for a few reasons. The idea is to collect the data to the
driver and save it from there, even if that is slow.



I am looking for something like take(100, 1000) or take(1000, 2000).



Thanks,

Swetha









