attilapiros commented on PR #50033: URL: https://github.com/apache/spark/pull/50033#issuecomment-2785113081
Yes, they are correct. That stage has two tasks: one runs on `hostC` and the other on `hostD`:
https://github.com/apache/spark/blob/00a4aadb8cfce30f2234453c64b9ca46c60fa07f/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala#L3160

The fetch failure came from the host called `hostC`:
https://github.com/apache/spark/blob/00a4aadb8cfce30f2234453c64b9ca46c60fa07f/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala#L3166

This caused an executor loss on `hostC`:

```
25/04/07 19:48:08.769 pool-1-thread-1-ScalaTest-running-DAGSchedulerSuite INFO DAGSchedulerSuite$MyDAGScheduler: Executor lost: hostC-exec (epoch 4)
```

So this removes the output that was produced on `hostC`, which is why the assert passes. Later, however, when `ResubmitFailedStages` is handled, execution reaches `submitMissingTasks()`, where all the output is removed:
https://github.com/apache/spark/blob/2b3fb526c8bd8b486f280756d5282cc84f7473d7/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1555
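
To make the two removal steps described in this comment easier to follow, here is a minimal toy sketch. It is not Spark code and does not use the `MapOutputTracker` API: `ToyShuffleOutputs`, `removeOutputsOnHost`, and `removeAllOutputs` are hypothetical names, and the assignment of partition 0 to `hostC` and partition 1 to `hostD` is an assumption made only for illustration.

```scala
// Toy model only: hypothetical names, not Spark's MapOutputTracker API.
final case class ToyShuffleOutputs(locations: Map[Int, String]) {
  // Partitions that still have a registered map output.
  def availablePartitions: Set[Int] = locations.keySet

  // Executor loss: drop only the outputs that lived on the given host.
  def removeOutputsOnHost(host: String): ToyShuffleOutputs =
    ToyShuffleOutputs(locations.filter { case (_, h) => h != host })

  // Full unregistration: drop every output, whichever host produced it.
  def removeAllOutputs: ToyShuffleOutputs = ToyShuffleOutputs(Map.empty)
}

object ToyShuffleOutputsDemo {
  // Assumed layout: partition 0 on hostC, partition 1 on hostD.
  val initial: ToyShuffleOutputs =
    ToyShuffleOutputs(Map(0 -> "hostC", 1 -> "hostD"))

  // Step 1: the fetch failure from hostC leads to the executor loss.
  val afterExecutorLost: ToyShuffleOutputs = initial.removeOutputsOnHost("hostC")

  // Step 2: the later resubmission path removes everything that is left.
  val afterResubmit: ToyShuffleOutputs = afterExecutorLost.removeAllOutputs

  def main(args: Array[String]): Unit = {
    println(afterExecutorLost.availablePartitions) // Set(1)
    println(afterResubmit.availablePartitions)     // Set()
  }
}
```

In this sketch an assertion made right after step 1 still sees the `hostD` output, while anything checked after step 2 sees no outputs at all, which mirrors the ordering described in the comment.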