DStream has a helper method that prints the first 10 elements of each RDD. You
could take some inspiration from it, as the use case is practically the same
and the code will probably be very similar: rdd.take(10)...

https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala#L591
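
A rough sketch of that idea, not the actual DStream code: the helper below only
does the formatting, and the names renderFirstTen and stream are made up for
illustration. Fetching one extra element to decide whether to print a trailing
"..." is my recollection of how print() behaves, so treat that as an assumption.

```scala
// Hypothetical helper: formats at most ten elements of one batch.
// `fetched` is expected to hold up to eleven elements, e.g. rdd.take(11).
def renderFirstTen[T](fetched: Seq[T]): Seq[String] = {
  val shown = fetched.take(10).map(_.toString)
  // An eleventh element means the RDD held more than the ten we show.
  if (fetched.size > 10) shown :+ "..." else shown
}

// Assumed wiring, given some stream: DStream[T] and Spark Streaming on the classpath:
//   stream.foreachRDD { rdd => renderFirstTen(rdd.take(11).toSeq).foreach(println) }
```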

-kr, Gerard.
On Thu, May 29, 2014 at 10:08 PM, Brian Gawalt <bgaw...@gmail.com> wrote:

> Try looking at the .mapPartitions( ) method implemented for RDD[T] objects.
> It will give you direct access to an iterator containing the member objects
> of each partition for doing the kind of within-partition hashtag counts
> you're describing.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Selecting-first-ten-values-in-a-RDD-partition-tp6517p6534.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
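
For the mapPartitions route Brian describes, here is a minimal sketch, assuming
the RDD holds individual tokens as strings and that a hashtag is any token
starting with "#". The function name topTenHashtags and the tweets RDD in the
comment are hypothetical. The function has the Iterator-to-Iterator shape that
mapPartitions expects, so it can be tried on plain Scala iterators first.

```scala
import scala.collection.mutable

// Hypothetical per-partition function: counts hashtags seen in one partition
// and returns the ten most frequent, ties broken alphabetically.
def topTenHashtags(tokens: Iterator[String]): Iterator[(String, Int)] = {
  val counts = mutable.Map.empty[String, Int].withDefaultValue(0)
  tokens.filter(_.startsWith("#")).foreach(tag => counts(tag) += 1)
  counts.toSeq.sortBy { case (tag, n) => (-n, tag) }.take(10).iterator
}

// Assumed wiring, given tweets: RDD[String] and Spark on the classpath:
//   val perPartitionTop = tweets.flatMap(_.split("\\s+")).mapPartitions(topTenHashtags)
// If you only want the first ten values of each partition, with no counting:
//   val firstTen = tweets.mapPartitions(_.take(10))
```

Note that these counts are per partition. The same hashtag can show up in
several partitions, so a global top ten would still need a merge step
afterwards, for example with reduceByKey.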
