Hi Attila,
What is your use case here?
The client driver application does not use collect. Instead, it internally calls
a Python script that reads the records (comma-separated strings) from the part
files of each cluster separately and copies them into one final CSV file,
merging the data from all part files into a single
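
A minimal sketch of the merge step described above, not the actual script from this thread. It assumes the part files have already been gathered into one local directory and carry the `part-*` names that saveAsTextFile produces. The function name `merge_part_files` and its arguments are hypothetical.

```python
import glob
import os
import shutil


def merge_part_files(input_dir, output_path):
    """Concatenate all part-* files in input_dir into a single output file.

    Assumption: every record in a part file ends with a newline, as is the
    case for saveAsTextFile output, so plain byte concatenation is enough.
    """
    # Zero-padded names (part-00000, part-00001, ...) sort in partition order.
    # Marker files such as _SUCCESS do not match the pattern and are skipped.
    part_paths = sorted(glob.glob(os.path.join(input_dir, "part-*")))
    with open(output_path, "wb") as out:
        for path in part_paths:
            with open(path, "rb") as part:
                # Stream in chunks so a large part file never sits in memory.
                shutil.copyfileobj(part, out)
    return len(part_paths)
```

Copying in chunks keeps memory use flat regardless of the 60-70 GB total, but the merged file still has to fit on the local disk of the machine running the script.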
Hi!
I would like to respond only to the first part of your mail:
> I have a large RDD dataset of around 60-70 GB which I cannot send to driver
> using *collect* so first writing that to disk using *saveAsTextFile* and
> then this data gets saved in the form of multiple part files on each node
> of