RE: Spark saveAsTextFile Disk Recommendation

2021-03-21 Thread Ranju Jain
Hi Attila, What is your use case here? The client driver application does not use collect; instead it internally calls a Python script that reads the records [comma-separated strings] from each cluster's part files separately and copies them into another, final CSV file, so merging all part files data in single
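The merge step described in this message could look roughly like the sketch below. It is not the script from the thread: the function name `merge_part_files`, the `part-*` file pattern, and the optional `header` argument are assumptions made for illustration, and it assumes the part files have already been gathered into one directory that the script can read. Files are streamed in chunks, so the 60-70 GB of data never has to fit in memory.

```python
import glob
import os
import shutil


def merge_part_files(input_dir, output_path, header=None):
    """Concatenate part files from input_dir into a single CSV file.

    Hypothetical helper: assumes files are named like part-00000 and that
    each one holds newline-separated, comma-separated records.
    Returns the number of part files found.
    """
    # Sort so that records appear in part-number order. Marker files such as
    # _SUCCESS and hidden .crc files do not match this pattern.
    part_paths = sorted(glob.glob(os.path.join(input_dir, "part-*")))

    with open(output_path, "wb") as out:
        if header is not None:
            out.write(header.encode("utf-8") + b"\n")

        for path in part_paths:
            if os.path.getsize(path) == 0:
                continue  # empty partitions produce empty part files

            with open(path, "rb") as src:
                # Copy in 1 MiB chunks instead of reading the whole file.
                shutil.copyfileobj(src, out, length=1024 * 1024)

                # If the part file lacks a trailing newline, add one so the
                # next file's first record starts on its own line.
                src.seek(-1, os.SEEK_END)
                if src.read(1) != b"\n":
                    out.write(b"\n")

    return len(part_paths)
```

A call such as `merge_part_files("/data/output", "/data/final.csv", header="id,value")` would write the header once and then append every part file in order; both paths and the header are placeholders. The output file should live outside the input directory, or at least not match `part-*`, so it is not picked up as an input.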

Re: Spark saveAsTextFile Disk Recommendation

2021-03-20 Thread Attila Zsolt Piros
Hi! I would like to reflect only on the first part of your mail:

> I have a large RDD dataset of around 60-70 GB which I cannot send to driver
> using *collect* so first writing that to disk using *saveAsTextFile* and
> then this data gets saved in the form of multiple part files on each node
> of