Which version of Spark are you using? Can you look at the event timeline and the DAG of the job to see where it's spending more time? .save simply triggers your entire pipeline; if you are doing join/groupBy kind of operations, then you need to make sure the keys are evenly distributed across the partitions.
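One common fix for uneven keys is two-stage "salting": append a random suffix to the hot keys so their records spread across many partitions, aggregate the salted keys, then strip the salt and aggregate again. Here is a minimal plain-Python sketch of the idea (it only simulates hash partitioning; it does not use Spark, and the function and key names are illustrative, not from the thread):

```python
import random
from collections import defaultdict

def partition_counts(pairs, num_partitions, salt_hot_keys=None, salt_factor=8):
    """Simulate how (key, value) pairs spread across hash partitions.

    If salt_hot_keys is given, each hot key gets a random suffix in
    [0, salt_factor), so one skewed key no longer lands in a single
    partition (the same trick applies to Spark join/groupBy keys).
    """
    counts = [0] * num_partitions
    for key, _ in pairs:
        if salt_hot_keys and key in salt_hot_keys:
            key = f"{key}#{random.randrange(salt_factor)}"  # salted key
        counts[hash(key) % num_partitions] += 1
    return counts

# 90% of records share one hot key -> severe skew without salting.
pairs = [("hot", 1)] * 9000 + [(f"k{i}", 1) for i in range(1000)]

skewed = partition_counts(pairs, 16)
salted = partition_counts(pairs, 16, salt_hot_keys={"hot"})
print("largest partition without salting:", max(skewed))
print("largest partition with salting:   ", max(salted))
```

Without salting, all 9000 "hot" records hash to one partition, so one task does almost all the work; with salting they spread over up to salt_factor partitions. In real Spark code you would follow the salted aggregation with a second reduceByKey on the de-salted key to get the final result.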
Thanks
Best Regards

On Sat, Dec 5, 2015 at 8:24 AM, Ram VISWANADHA <ram.viswana...@dailymotion.com> wrote:

> That didn’t work :(
> Any help? I have documented some steps here:
>
> http://stackoverflow.com/questions/34048340/spark-saveastextfile-last-stage-almost-never-finishes
>
> Best Regards,
> Ram
>
> From: Sahil Sareen <sareen...@gmail.com>
> Date: Wednesday, December 2, 2015 at 10:18 PM
> To: Ram VISWANADHA <ram.viswana...@dailymotion.com>
> Cc: Ted Yu <yuzhih...@gmail.com>, user <user@spark.apache.org>
> Subject: Re: Improve saveAsTextFile performance
>
> http://stackoverflow.com/questions/29213404/how-to-split-an-rdd-into-multiple-smaller-rdds-given-a-max-number-of-rows-per