Re: RDD data flow

2014-12-17 Thread Madhu
Patrick Wendell wrote
> The Partition itself doesn't need to be an iterator - the iterator
> comes from the result of compute(partition). The Partition is just an
> identifier for that partition, not the data itself.

OK, that makes sense. The docs for Partition are a bit vague on this point. Maybe

Re: RDD data flow

2014-12-16 Thread Patrick Wendell
> Why is that? Shouldn't all Partitions be Iterators? Clearly I'm missing
> something.

The Partition itself doesn't need to be an iterator - the iterator comes from the result of compute(partition). The Partition is just an identifier for that partition, not the data itself. Take a look at the sig
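
To make the distinction in this reply concrete, here is a minimal sketch of the idea in plain Python. It is a toy model, not Spark's API: ToyPartition, ToyRangeRDD and the slicing arithmetic are hypothetical names and choices made up for illustration, and the toy compute() takes only the partition, which may not match the real method's full signature. The only point is that the partition object carries an index, while the iterator over the data comes from calling compute() with it.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class ToyPartition:
    """Hypothetical partition: an identifier only, it holds no data."""
    index: int


class ToyRangeRDD:
    """Hypothetical stand-in for an RDD over the integers [0, n)."""

    def __init__(self, n: int, num_slices: int) -> None:
        self.n = n
        self.num_slices = num_slices

    def partitions(self) -> List[ToyPartition]:
        # Cheap to build: one small identifier per slice, no data touched.
        return [ToyPartition(i) for i in range(self.num_slices)]

    def compute(self, split: ToyPartition) -> Iterator[int]:
        # The data for a slice is produced here, on demand, as an iterator.
        start = split.index * self.n // self.num_slices
        end = (split.index + 1) * self.n // self.num_slices
        return iter(range(start, end))


rdd = ToyRangeRDD(n=10, num_slices=3)
first = rdd.partitions()[0]

# Walk every partition identifier and pull its data through compute().
collected = [x for p in rdd.partitions() for x in rdd.compute(p)]
```

In this sketch, first is not iterable at all, and the elements only appear once compute() is called with it, which mirrors the "identifier, not the data" description above.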