Is there a way to get an iterator from an RDD? Something like rdd.collect(),
but returning a lazy sequence rather than a single array.

Context: I need to gzip processed data to upload it to Amazon S3. Since the
archive should be a single file, I want to iterate over the RDD, writing each
line to a local .gz file. The data is small enough to fit on local disk, but
too large to fit into memory.
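
To make the goal concrete, here is a minimal sketch of the writing side in
Python, assuming a lazy iterator over the lines is available. The names
write_gzip_lines and fake_records are hypothetical; fake_records is only a
stand-in for whatever would yield the RDD's elements one at a time.

```python
import gzip
import os
import tempfile
from typing import Iterable, Iterator


def write_gzip_lines(lines: Iterable[str], path: str) -> int:
    """Stream lines into a gzip file one at a time; return the count written."""
    count = 0
    # "wt" opens the archive in text mode, so each str is encoded and
    # compressed as it is written; the full data set is never held in memory.
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for line in lines:
            out.write(line)
            out.write("\n")
            count += 1
    return count


def fake_records(n: int) -> Iterator[str]:
    """Hypothetical stand-in for a lazy iterator over the RDD's elements."""
    for i in range(n):
        yield f"record-{i}"


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "output.gz")
    written = write_gzip_lines(fake_records(1000), target)
    print(written, "lines written to", target)
```

In PySpark, RDD.toLocalIterator() might serve as the lazy source here. It
fetches one partition at a time, so each partition would still need to fit
in the driver's memory.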
