In rdd.mapPartitions(...), if I try to iterate through the items in the 
partition, the result comes out wrong. For example:

val rdd = sc.parallelize(1 to 1000, 3)
val count = rdd.mapPartitions(iter => {
  println(iter.length)
  iter
}).count()


The count is 0, which is incorrect; it should be 1000. If I comment out the 
line println(iter.length), the count becomes 1000 as expected.

Does this mean I cannot iterate through iter in mapPartitions? I want to get 
all the items in a partition and compose a single request to send to an 
external system. How can I achieve that if I am not allowed to iterate through 
the items in the partition?
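
For reference, a likely explanation is that a Scala Iterator can be traversed 
only once: iter.length walks to the end, so the iter returned afterwards is 
already exhausted and count() sees nothing. A minimal sketch of one way around 
this, assuming each partition fits in memory, is to copy the partition into a 
collection first. sendRequest and processPartition are hypothetical names used 
only for illustration; sendRequest stands in for the real call to the external 
system.

```scala
// Hypothetical placeholder for the call to the external system.
def sendRequest(items: Seq[Int]): Unit =
  println(s"sending ${items.length} items in one request")

// Copy the partition into a Vector so it can be read more than once.
// Assumption: a whole partition fits in executor memory.
def processPartition(iter: Iterator[Int]): Iterator[Int] = {
  val items = iter.toVector // consumes iter exactly once
  sendRequest(items)        // safe: items is a reusable collection
  items.iterator            // return a fresh iterator over the same data
}

// With Spark, assuming the same sc as in the example above:
//   val count = sc.parallelize(1 to 1000, 3)
//     .mapPartitions(processPartition)
//     .count()

// Stand-alone check that does not need Spark:
val result = processPartition(Iterator(1, 2, 3)).toList
assert(result == List(1, 2, 3))
```

If the items are not needed downstream and only the side effect matters, 
rdd.foreachPartition may be a better fit than mapPartitions, since it does not 
have to return an iterator.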

Ningjun
