Hi,

While reviewing DAGScheduler, and in particular how failedStages (the
internal collection of failed stages ready for resubmission) is used,
I came across a question I'm looking for an answer to. Any hints would
be greatly appreciated.

When resubmitFailedStages [1] is executed and there are any failed
stages, they are resubmitted using submitStage [2]. Before that
happens, however, failedStages is cleared [3]. submitStage, which
ultimately calls submitMissingTasks for the stage, then checks
whether the stage is in failedStages (among the other sets for waiting
and running stages) [4].

My naive understanding is that the call to submitStage is a no-op in
this case, i.e. nothing really happens and the if expression will
silently finish without doing anything useful until some other event
happens that changes the status of the failed stages into waiting
ones.
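
To make the question concrete, here is a minimal toy model of the flow
described above. It is a sketch, not the actual Spark code: ToyStage,
ToyScheduler and the submitted buffer are hypothetical names, and the
polarity of the guard in submitStage (a stage passes only when it is in
none of the three sets) is an assumption that should be checked against
[4].

```scala
import scala.collection.mutable

// Toy model only: ToyStage and ToyScheduler are hypothetical, not Spark classes.
final case class ToyStage(id: Int)

class ToyScheduler {
  val waitingStages = mutable.HashSet.empty[ToyStage]
  val runningStages = mutable.HashSet.empty[ToyStage]
  val failedStages  = mutable.HashSet.empty[ToyStage]

  // Records the stages that actually got past the guard, in order.
  val submitted = mutable.ArrayBuffer.empty[ToyStage]

  // ASSUMPTION: the guard is a negated membership test, so a stage
  // passes only when it is in none of the three sets.
  def submitStage(stage: ToyStage): Unit = {
    if (!waitingStages(stage) && !runningStages(stage) && !failedStages(stage)) {
      submitted += stage
      runningStages += stage
    }
  }

  // Copy the failed stages, clear the set, then submit each copied stage.
  def resubmitFailedStages(): Unit = {
    if (failedStages.nonEmpty) {
      val failedStagesCopy = failedStages.toArray
      failedStages.clear()
      for (stage <- failedStagesCopy.sortBy(_.id)) {
        submitStage(stage)
      }
    }
  }
}
```

With the guard written this way, clearing failedStages first is what
lets the copied stages pass the check. If the guard had the opposite
polarity, the same call would do nothing, which is the case I describe
above.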

Is my understanding incorrect? If so, where? Could the call to
submitStage be superfluous? Please point me in the right direction.
Thanks.

[1] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L734
[2] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L743
[3] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L741
[4] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L919

Regards,
Jacek

Jacek Laskowski | https://medium.com/@jaceklaskowski/
Mastering Apache Spark
==> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
Follow me at https://twitter.com/jaceklaskowski
