Which version of Spark are you using? Can you look at the event timeline
and the DAG of the job to see where it's spending the most time? .save simply
triggers your entire pipeline. If you are doing join/groupBy-style
operations, then you need to make sure the keys are evenly distributed
across the partitions.
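One common fix for a skewed key in a groupBy/reduceByKey is "salting": append a random suffix to each key so a hot key is spread over several reducers, aggregate the salted keys, then strip the salt and merge the partial results. Here is a minimal sketch of that two-phase idea in plain Python (no Spark dependency), assuming a simple sum aggregation; the function name `salted_sum` and the data are illustrative only.

```python
import random
from collections import defaultdict

def salted_sum(pairs, num_salts=4):
    """Two-phase aggregation: salt the keys to spread a hot key
    across buckets, then merge the partial sums per real key."""
    # Phase 1: aggregate per (key, salt) pair -- in Spark this would
    # be the first reduceByKey over the salted keys.
    partial = defaultdict(int)
    for key, value in pairs:
        salt = random.randrange(num_salts)
        partial[(key, salt)] += value
    # Phase 2: strip the salt and merge -- the second reduceByKey.
    final = defaultdict(int)
    for (key, _salt), value in partial.items():
        final[key] += value
    return dict(final)

# One "hot" key dominates the data, a classic skew scenario.
data = [("hot", 1)] * 1000 + [("cold", 1)] * 3
print(salted_sum(data))  # {'hot': 1000, 'cold': 3}
```

In Spark the same pattern is two reduceByKey passes: map each record to ((key, randint), value), reduce, then map back to (key, partial) and reduce again.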

Thanks
Best Regards

On Sat, Dec 5, 2015 at 8:24 AM, Ram VISWANADHA <
ram.viswana...@dailymotion.com> wrote:

> That didn’t work :(
> Any help would be appreciated. I have documented some steps here.
>
> http://stackoverflow.com/questions/34048340/spark-saveastextfile-last-stage-almost-never-finishes
>
> Best Regards,
> Ram
>
> From: Sahil Sareen <sareen...@gmail.com>
> Date: Wednesday, December 2, 2015 at 10:18 PM
> To: Ram VISWANADHA <ram.viswana...@dailymotion.com>
> Cc: Ted Yu <yuzhih...@gmail.com>, user <user@spark.apache.org>
> Subject: Re: Improve saveAsTextFile performance
>
>
> http://stackoverflow.com/questions/29213404/how-to-split-an-rdd-into-multiple-smaller-rdds-given-a-max-number-of-rows-per
>

Reply via email to