Excellent, thank you!
On Sat, Aug 2, 2014 at 4:46 AM, Aaron Davidson wrote:
Ah, that's unfortunate, that definitely should be added. Using a
pyspark-internal method, you could try something like
javaIterator = rdd._jrdd.toLocalIterator()
it = rdd._collect_iterator_through_file(javaIterator)
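[Editor's note, not part of the original thread: the wrapping described above amounts to adapting a Java-style iterator (with hasNext()/next() methods, as a Py4J proxy exposes it) into a normal Python iterator. A minimal pure-Python sketch of that conversion, using a stub class in place of the real Py4J object returned by rdd._jrdd.toLocalIterator(); all names here are illustrative:]

```python
# Sketch: adapting a Java-style iterator (hasNext()/next(), as a Py4J proxy
# would expose it) into a plain Python iterator. JavaStyleIterator is a
# stand-in for the object returned by rdd._jrdd.toLocalIterator(); it is
# not a real Spark or Py4J class.

class JavaStyleIterator:
    """Stub mimicking a Java Iterator over an in-memory list."""
    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def hasNext(self):
        return self._pos < len(self._items)

    def next(self):
        item = self._items[self._pos]
        self._pos += 1
        return item

def java_iterator_to_python(java_it):
    """Yield elements from a Java-style iterator as a Python generator."""
    while java_it.hasNext():
        yield java_it.next()

if __name__ == "__main__":
    it = java_iterator_to_python(JavaStyleIterator([1, 2, 3]))
    print(list(it))  # [1, 2, 3]
```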
On Fri, Aug 1, 2014 at 3:04 PM, Andrei wrote:
Thanks, Aaron, it should be fine with partitions (I can repartition it
anyway, right?).
But rdd.toLocalIterator is a purely Java/Scala method. Is there a Python
interface to it?
I can get a Java iterator through rdd._jrdd, but it isn't converted to a
Python iterator automatically. E.g.:
>>> rdd = sc.para
rdd.toLocalIterator will do almost what you want, but requires that each
individual partition fits in memory (rather than each individual line).
Hopefully that's sufficient, though.
On Fri, Aug 1, 2014 at 1:38 AM, Andrei wrote:
> Is there a way to get iterator from RDD? Something like rdd.colle
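[Editor's note, not part of the original thread: to make the memory caveat above concrete, toLocalIterator pulls one partition at a time to the driver and fully materializes it before streaming its records, so peak driver memory is bounded by the largest partition rather than the whole dataset. A toy pure-Python model of that behavior, not Spark code; the names and the callable-per-partition shape are illustrative only:]

```python
# Toy model of toLocalIterator's memory behavior (illustrative, not Spark
# code): partitions are fetched one at a time and fully materialized, so
# only one partition's worth of data is held locally at any moment.

def to_local_iterator(partitions):
    """partitions: an iterable of callables, each returning one partition's
    data when invoked (simulating one per-partition fetch job)."""
    for fetch in partitions:
        part = fetch()          # whole partition materialized here
        for record in part:     # then streamed record by record
            yield record

if __name__ == "__main__":
    parts = [lambda: [1, 2], lambda: [3, 4], lambda: [5]]
    print(list(to_local_iterator(parts)))  # [1, 2, 3, 4, 5]
```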