RDD.toLocalIterator returns the partitions one by one, but with all of the
elements in each partition already computed, so it is not lazily calculated
element by element. Given the design of Spark, it is very hard to maintain the
state of an iterator across runJob calls.
def toLocalIterator: Iterator[T] = {
  def collectPartition(p: Int): Array[T] = ...
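(For reference, a sketch of the full method, paraphrased from memory of the
Spark 1.x RDD.scala source rather than quoted verbatim:)

def toLocalIterator: Iterator[T] = {
  def collectPartition(p: Int): Array[T] = {
    // A separate job is run for just this one partition, and the whole
    // partition is materialized on the driver via iter.toArray.
    sc.runJob(this, (iter: Iterator[T]) => iter.toArray, Seq(p), allowLocal = false).head
  }
  // Partitions are pulled one at a time as the local iterator advances,
  // but each partition arrives fully computed.
  (0 until partitions.length).iterator.flatMap(i => collectPartition(i))
}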
RDD.toLocalIterator() is the suitable solution.
But I doubt whether it conforms with the design principle of Spark and RDDs:
all RDD transformations are lazily computed until the chain ends with an action.
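(To make the trade-off concrete, a minimal usage sketch, assuming sc is a
SparkContext as in the spark-shell; the data and partition count are made up:)

val rdd = sc.parallelize(1 to 1000000, 10).map(_ * 2)
// map is lazy: nothing has been computed yet.
val it = rdd.toLocalIterator
// Each partition is computed by its own job and shipped to the driver only
// when the iterator reaches it; within a partition, though, all elements
// are materialized at once.
it.take(5).foreach(println)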
2014-10-29 15:28 GMT+08:00 Sean Owen:
> Call RDD.toLocalIterator()?
>
> https://spark.apache.org/docs/latest/api/java/org/apache/spark/rdd/RDD.html
Call RDD.toLocalIterator()?
https://spark.apache.org/docs/latest/api/java/org/apache/spark/rdd/RDD.html
On Wed, Oct 29, 2014 at 4:15 AM, Dai, Kevin wrote:
> Hi, ALL
>
> I have an RDD[T]; can I use it like an iterator?
>
> That means I can compute every element of this RDD lazily.
>
> Best Regards,
> Kevin.
Cc: user@spark.apache.org
Subject: Re: Use RDD like a Iterator
I think it is already lazily computed, or do you mean something else? The
following is the signature of compute in RDD:
def compute(split: Partition, context: TaskContext): Iterator[T]
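(A minimal sketch of what that laziness looks like in practice, assuming sc is
a SparkContext as in the spark-shell; the sample data is made up:)

// Transformations only record the lineage; no partition is computed yet.
val lengths = sc.parallelize(Seq("spark", "rdd", "iterator")).map(_.length)
// An action such as reduce is what finally triggers compute() on each
// partition, and within a partition the elements stream through the
// chained iterators one at a time.
val total = lengths.reduce(_ + _)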
Thanks.
Zhan Zhang
On Oct 28, 2014, at 8:15 PM, Dai, Kevin wrote:
Hi, ALL
I have an RDD[T]; can I use it like an iterator?
That means I can compute every element of this RDD lazily.
Best Regards,
Kevin.