PTAL: http://stackoverflow.com/questions/29213404/how-to-split-an-rdd-into-multiple-smaller-rdds-given-a-max-number-of-rows-per
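
For what it's worth, here is a rough sketch of why a shuffle should help where a plain coalesce() did not: coalesce() without a shuffle only merges existing partitions, so it cannot split the one oversized partition. Dealing the records out round-robin can. The block below is a plain-Java simulation with no Spark dependency; the class name SkewDemo, the method roundRobin, and the partition sizes are all made up for illustration, and the continuous round-robin pointer is a simplification of what Spark actually does. In Spark itself the corresponding step would presumably be calling JavaRDD.repartition(n) before saveAsTextFile.

```java
import java.util.Arrays;

// Plain-Java simulation (no Spark) of redistributing records round-robin.
// SkewDemo and roundRobin are hypothetical names used only for this sketch.
class SkewDemo {

    // Takes the record count of each input partition and returns the record
    // count of each output partition after dealing every record round-robin
    // into numOut partitions.
    static int[] roundRobin(int[] inputSizes, int numOut) {
        int[] out = new int[numOut];
        int next = 0;
        for (int size : inputSizes) {
            for (int i = 0; i < size; i++) {
                out[next]++;                 // place one record
                next = (next + 1) % numOut;  // move to the next output partition
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sizes: 10 tasks, the last one holding 90% of 1000 records.
        int[] skewed = {11, 11, 11, 11, 11, 11, 11, 11, 12, 900};
        int[] balanced = roundRobin(skewed, 10);
        System.out.println("before: " + Arrays.toString(skewed));
        System.out.println("after:  " + Arrays.toString(balanced));
    }
}
```

With these assumed sizes every output partition ends up with 100 records, so no single save task dominates. The trade-off is that a shuffle moves all the data across the cluster once, which is usually still cheaper than waiting on one task that holds 90% of the records.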
-Sahil

On Thu, Dec 3, 2015 at 9:18 AM, Ram VISWANADHA <ram.viswana...@dailymotion.com> wrote:

> Yes. That did not help.
>
> Best Regards,
> Ram
>
> From: Ted Yu <yuzhih...@gmail.com>
> Date: Wednesday, December 2, 2015 at 3:25 PM
> To: Ram VISWANADHA <ram.viswana...@dailymotion.com>
> Cc: user <user@spark.apache.org>
> Subject: Re: Improve saveAsTextFile performance
>
> Have you tried calling coalesce() before saveAsTextFile ?
>
> Cheers
>
> On Wed, Dec 2, 2015 at 3:15 PM, Ram VISWANADHA <ram.viswana...@dailymotion.com> wrote:
>
>> JavaRDD.saveAsTextFile is taking a long time to succeed. There are 10
>> tasks, the first 9 complete in a reasonable time but the last task is
>> taking a long time to complete. The last task contains the maximum number
>> of records like 90% of the total number of records. Is there any way to
>> parallelize the execution by increasing the number of tasks or evenly
>> distributing the number of records to different tasks?
>>
>> Thanks in advance.
>>
>> Best Regards,
>> Ram