Awesome! It looks promising. Thanks, Rishabh and Marcelo.

On Wed, Feb 3, 2016 at 12:09 PM, Rishabh Wadhawan <rishabh...@gmail.com> wrote:
> Check out this link
> http://spark.apache.org/docs/latest/configuration.html and check
> spark.shuffle.service. Thanks
>
> On Feb 3, 2016, at 1:02 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>
> Yes, but you don't necessarily need to use dynamic allocation (just enable
> the external shuffle service).
>
> On Wed, Feb 3, 2016 at 11:53 AM, Nirav Patel <npa...@xactlycorp.com>
> wrote:
>
>> Do you mean this setup?
>>
>> https://spark.apache.org/docs/1.5.2/job-scheduling.html#dynamic-resource-allocation
>>
>> On Wed, Feb 3, 2016 at 11:50 AM, Marcelo Vanzin <van...@cloudera.com>
>> wrote:
>>
>>> Without the exact error from the driver that caused the job to restart,
>>> it's hard to tell. But a simple way to improve things is to install the
>>> Spark shuffle service on the YARN nodes, so that even if an executor
>>> crashes, its shuffle output is still available to other executors.
>>>
>>> On Wed, Feb 3, 2016 at 11:46 AM, Nirav Patel <npa...@xactlycorp.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a spark job running on yarn-client mode. At some point during
>>>> Join stage, executor(container) runs out of memory and yarn kills it. Due
>>>> to this Entire job restarts! and it keeps doing it on every failure?
>>>>
>>>> What is the best way to checkpoint? I see there's checkpoint api and
>>>> other option might be to persist before Join stage. Would that prevent
>>>> retry of entire job? How about just retrying only the task that was
>>>> distributed to that faulty executor?
>>>>
>>>> Thanks
>>>
>>> --
>>> Marcelo
>
> --
> Marcelo
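
For anyone finding this thread later, here is a rough sketch of what "just enable the external shuffle service" might look like on the application side. It only assembles a spark-submit command line and launches nothing. The property names spark.shuffle.service.enabled and spark.dynamicAllocation.enabled come from the Spark configuration page linked above; the helper functions and the my_job.py file name are made up for illustration. Note that the YARN NodeManagers also have to run the shuffle service themselves (the spark_shuffle aux-service described in the Spark on YARN docs), which this sketch does not cover.

```python
# Sketch only: builds a spark-submit command line; it does not launch anything.
# The helper names and "my_job.py" are hypothetical, not part of Spark.

def shuffle_service_conf(dynamic_allocation=False):
    """Return the Spark properties discussed in the thread as a dict."""
    conf = {"spark.shuffle.service.enabled": "true"}
    if dynamic_allocation:
        # Optional: the thread notes dynamic allocation is not required.
        conf["spark.dynamicAllocation.enabled"] = "true"
    return conf


def spark_submit_args(app, conf, master="yarn-client"):
    """Assemble spark-submit arguments for a hypothetical application file."""
    args = ["spark-submit", "--master", master]
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    args.append(app)
    return args


print(" ".join(spark_submit_args("my_job.py", shuffle_service_conf())))
```

The same properties could equally go into spark-defaults.conf or a SparkConf object; passing them with --conf just keeps the change visible per job.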