Re: Selecting first ten values in a RDD/partition

Gerard Maas Thu, 29 May 2014 13:26:40 -0700

DStream has a helper method to print the first 10 elements of each RDD. You
could take some inspiration from it, as the use case is practically the same
and the code will probably be very similar:  rdd.take(10)...


https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala#L591
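
For illustration, here is a minimal sketch of the rdd.take(10) idea on a plain
RDD. The object name, app name, local master setting and sample data are
assumptions made up for this example, not code from DStream.scala. Note that
take(10) returns the first ten elements of the RDD as a whole (to the driver),
not ten per partition.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical standalone example; names and data are illustrative only.
object FirstTen {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("first-ten").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Sample RDD of 100 integers spread over 4 partitions.
      val rdd = sc.parallelize(1 to 100, 4)

      // take(10) brings the first 10 elements back to the driver as an Array.
      val firstTen = rdd.take(10)
      firstTen.foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```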

-kr, Gerard.




On Thu, May 29, 2014 at 10:08 PM, Brian Gawalt <bgaw...@gmail.com> wrote:

> Try looking at the .mapPartitions( ) method implemented for RDD[T] objects.
> It will give you direct access to an iterator containing the member objects
> of each partition for doing the kind of within-partition hashtag counts
> you're describing.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Selecting-first-ten-values-in-a-RDD-partition-tp6517p6534.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
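
If you need the first (or top) ten per partition rather than per RDD, the
mapPartitions suggestion quoted above could look roughly like the sketch
below. It assumes the RDD holds hashtags as plain strings; the object name,
the topTen helper and the sample data are hypothetical and only there to make
the example self-contained.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.collection.mutable

// Hypothetical example of within-partition hashtag counts.
object TopTenPerPartition {

  // Counts the hashtags seen in one partition and keeps the ten most frequent.
  def topTen(tags: Iterator[String]): Iterator[(String, Int)] = {
    val counts = mutable.Map.empty[String, Int].withDefaultValue(0)
    tags.foreach(tag => counts(tag) += 1)
    counts.toSeq.sortBy(-_._2).take(10).iterator
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("top-ten-per-partition").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Sample data, split over 2 partitions.
      val tags = sc.parallelize(Seq("#spark", "#scala", "#spark", "#rdd"), 2)

      // mapPartitions hands topTen one iterator per partition.
      val perPartition = tags.mapPartitions(topTen)
      perPartition.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The result contains up to ten (hashtag, count) pairs for each partition, so
the same hashtag can appear more than once if it occurs in several partitions.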
