ahshahid commented on PR #50033: URL: https://github.com/apache/spark/pull/50033#issuecomment-2785142050
What about the retry of partitions? Because the failing stage is indeterminate, all the partitions should be retried, shouldn't they?

On Mon, Apr 7, 2025, 8:11 PM Attila Zsolt Piros ***@***.***> wrote:

> Yes, they are correct.
>
> That stage has two tasks: one is running on hostC and the other is on hostD:
>
> https://github.com/apache/spark/blob/00a4aadb8cfce30f2234453c64b9ca46c60fa07f/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala#L3160
>
> The fetch failure was from the host called hostC:
>
> https://github.com/apache/spark/blob/00a4aadb8cfce30f2234453c64b9ca46c60fa07f/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala#L3166
>
> This caused an executor loss on hostC:
>
> 25/04/07 19:48:08.769 pool-1-thread-1-ScalaTest-running-DAGSchedulerSuite INFO DAGSchedulerSuite$MyDAGScheduler: Executor lost: hostC-exec (epoch 4)
>
> So this removes the output which was made on hostC. This is how we get the assert right, but later, when the ResubmitFailedStages is handled, the execution goes to submitMissingTasks(), where all the output is removed:
>
> https://github.com/apache/spark/blob/2b3fb526c8bd8b486f280756d5282cc84f7473d7/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1555
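To make the two-step removal described in the quoted reply concrete, here is a toy Scala model. It is only a sketch: `MapOutput`, `ToyShuffleStage`, and their methods are hypothetical names, not Spark's DAGScheduler or MapOutputTracker API. The behavior shown for a determinate stage (recomputing only the missing partitions) is an assumption added for contrast.

```scala
// Toy model only: every name here is hypothetical, not Spark's scheduler API.
final case class MapOutput(partition: Int, host: String)

final class ToyShuffleStage(val isIndeterminate: Boolean, initial: Seq[MapOutput]) {
  // Map outputs currently registered for this stage, keyed by partition id.
  private var outputs: Map[Int, MapOutput] = initial.map(o => o.partition -> o).toMap

  def availablePartitions: Set[Int] = outputs.keySet

  // Step 1: losing an executor drops only the outputs stored on that host.
  def removeOutputsOnHost(host: String): Unit =
    outputs = outputs.filter { case (_, o) => o.host != host }

  // Step 2: on resubmission, an indeterminate stage discards every remaining
  // output, so all partitions are recomputed rather than only the lost ones.
  def partitionsToRecompute(numPartitions: Int): Seq[Int] = {
    if (isIndeterminate) outputs = Map.empty
    (0 until numPartitions).filterNot(outputs.contains)
  }
}

object ToyShuffleDemo {
  def main(args: Array[String]): Unit = {
    val stage = new ToyShuffleStage(
      isIndeterminate = true,
      initial = Seq(MapOutput(0, "hostC"), MapOutput(1, "hostD")))

    // After the executor on hostC is lost, only the hostD output remains.
    stage.removeOutputsOnHost("hostC")
    println(s"available after executor loss: ${stage.availablePartitions}")

    // On resubmission, both partitions are scheduled again.
    println(s"to recompute on resubmit: ${stage.partitionsToRecompute(2)}")
  }
}
```

In this model, the state after step 1 matches what the quoted reply says the assert observes (only the hostC output is gone), while step 2 is where the remaining hostD output is also dropped, so every partition of the indeterminate stage is retried.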