Re: restart from last successful stage

2015-07-29 Thread Tathagata Das
If you are changing the SparkConf, that means you have to recreate the SparkContext, doesn't it? So you have to stop the previous SparkContext, which deletes all the information about stages that have been run. So the better approach is indeed to save the data of the last stage explicitly and then try reru
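
To illustrate the "save the data of the last stage explicitly" idea, here is a minimal sketch in plain Python. It is only a stand-in, not Spark code: `run_or_load`, `expensive_stage0`, and the file name `stage0.pkl` are hypothetical names made up for this example. In an actual Spark job the same pattern would presumably use something like `rdd.saveAsPickleFile(path)` and `sc.pickleFile(path)` (or `saveAsObjectFile` / `objectFile` in Scala) against durable storage, so that the saved output survives stopping the SparkContext.

```python
import os
import pickle
import tempfile


def run_or_load(path, compute):
    """Return the result saved at `path` if it exists; otherwise compute, save, and return it."""
    if os.path.exists(path):
        # A previous run already finished this stage: reuse its output.
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    # Write to a temporary file and rename, so a crash mid-write
    # never leaves a partial file that looks like a finished stage.
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(result, f)
    os.replace(tmp_path, path)
    return result


# Hypothetical long-running "stage 0"; the counter records how often it really runs.
calls = {"stage0": 0}


def expensive_stage0():
    calls["stage0"] += 1
    return [x * x for x in range(5)]


workdir = tempfile.mkdtemp()
stage0_path = os.path.join(workdir, "stage0.pkl")

first = run_or_load(stage0_path, expensive_stage0)   # computes and saves
second = run_or_load(stage0_path, expensive_stage0)  # loads the saved output, no recompute
```

The second call stands for the rerun after fixing the configuration: because the stage 0 output is already on disk, only the later stages would need to run again.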

Re: restart from last successful stage

2015-07-29 Thread Alex Nastetsky
I meant a restart by the user, as ayan said. I was thinking of a case where e.g. a Spark conf setting was wrong and the job failed in Stage 1, in my example, and we want to rerun the job with the right conf without rerunning Stage 0. Having this "re-start" capability may cause some chaos if it would

Re: restart from last successful stage

2015-07-28 Thread Tathagata Das
Okay, maybe I am confused by the phrase "would be useful to *restart* from the output of stage 0" ... did the OP mean restart by the user or restart automatically by the system? On Tue, Jul 28, 2015 at 3:43 PM, ayan guha wrote: > Hi > > I do not think op asks about attempt failure but stage failure

Re: restart from last successful stage

2015-07-28 Thread ayan guha
Hi, I do not think the OP is asking about attempt failure, but about stage failure that finally leads to job failure. In that case, the RDD info from the last run is gone, even if it was cached, isn't it? Ayan On 29 Jul 2015 07:01, "Tathagata Das" wrote: > If you are using the same RDDs in both attempts to run the

Re: restart from last successful stage

2015-07-28 Thread Tathagata Das
If you are using the same RDDs in both attempts to run the job, the stage outputs generated in the previous job will indeed be reused. This applies to Spark core, though. For DataFrames, depending on what you do, the physical plan may get generated again, leading to new RDDs which may caus

restart from last successful stage

2015-07-28 Thread Alex Nastetsky
Is it possible to restart the job from the last successful stage instead of from the beginning? For example, if your job has stages 0, 1 and 2, and stage 0 takes a long time and is successful, but the job fails on stage 1, it would be useful to be able to restart from the output of stage 0 inste