> Why is that? Shouldn't all Partitions be Iterators? Clearly I'm missing
> something.
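
A rough sketch of the shape, in case it helps. These are simplified stand-ins and not Spark's real classes: MiniRDD, RangePartition and RangeRDD are names made up for illustration, and the real compute() also takes a TaskContext, which is left out here.

```scala
// Simplified stand-ins, NOT Spark's actual classes.

// A partition is only a lightweight handle: it carries an index, not data.
trait Partition extends Serializable {
  def index: Int
}

// Hypothetical partition describing the slice [start, end) of an Int range.
case class RangePartition(index: Int, start: Int, end: Int) extends Partition

// Hypothetical mini-RDD: it lists its partitions and knows how to compute each.
abstract class MiniRDD[T] {
  def partitions: Array[Partition]
  // The data of a partition only exists as the Iterator returned here.
  // (Assumption: the TaskContext parameter of the real method is omitted.)
  def compute(split: Partition): Iterator[T]
}

// Hypothetical RDD over the range [from, until), cut into numSlices pieces.
class RangeRDD(from: Int, until: Int, numSlices: Int) extends MiniRDD[Int] {
  private val total = until - from

  override def partitions: Array[Partition] =
    Array.tabulate[Partition](numSlices) { i =>
      val start = from + (i.toLong * total / numSlices).toInt
      val end = from + ((i + 1).toLong * total / numSlices).toInt
      RangePartition(i, start, end)
    }

  override def compute(split: Partition): Iterator[Int] = {
    // The handle only tells us which slice to produce.
    val p = split.asInstanceOf[RangePartition]
    (p.start until p.end).iterator
  }
}

object PartitionSketch {
  // Ask the RDD to compute every partition and materialize each iterator.
  def collectByPartition[T](rdd: MiniRDD[T]): Seq[List[T]] =
    rdd.partitions.toSeq.map(p => rdd.compute(p).toList)

  def main(args: Array[String]): Unit = {
    val rdd = new RangeRDD(0, 10, 3)
    collectByPartition(rdd).zipWithIndex.foreach { case (data, i) =>
      println(s"partition $i -> $data")
    }
  }
}
```

Nothing is materialized when the partitions are created; elements are only produced when compute() is called for a given partition and its iterator is consumed.
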
The Partition itself doesn't need to be an iterator - the iterator comes from the result of compute(partition). The Partition is just an identifier for that partition, not the data itself. Take a look at the signature for compute() in the RDD class:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L97

> On a related subject, I was thinking of documenting the data flow of RDDs in
> more detail. The code is not hard to follow, but it's nice to have a simple
> picture with the major components and some explanation of the flow. The
> declaration of Partition is throwing me off.
>
> Thanks!
>
> -----
> --
> Madhu
> https://www.linkedin.com/in/msiddalingaiah
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-data-flow-tp9804.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org