attilapiros commented on PR #50033:
URL: https://github.com/apache/spark/pull/50033#issuecomment-2784704888

> And for the second case, where the first result task is successful but a subsequent task fails, the code will follow the existing path of aborting the query.

@ahshahid Check your test "SPARK-51272: retry all the partitions of result stage, if the first result task has failed and failing ShuffleMap stage is inDeterminate". The title is a bit misleading, because it is the fetch from the determinate stage (`shuffleId1`) that fails:

```
makeCompletionEvent(
  taskSets.find(_.stageId == resultStage.id).get.tasks(0),
  FetchFailed(makeBlockManagerId("hostA"), shuffleId1, 0L, 0, 0, "ignored"),
```

The indeterminate stage is resubmitted because the determinate stage's failure led to losing the executors and all of the shuffle blocks. No abort is called because of this condition:
https://github.com/apache/spark/blob/00a4aadb8cfce30f2234453c64b9ca46c60fa07f/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L2156
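
To make the retry-vs-abort distinction discussed above concrete, here is a minimal, self-contained Scala sketch (not the actual DAGScheduler code) of the decision the comment refers to: on a `FetchFailed` from a result task, the failed stage is aborted only once it has exhausted its allowed consecutive attempts; otherwise both the map stage and the result stage are resubmitted. The `Stage` case class and `handleFetchFailure` helper are simplified assumptions for illustration; only the `spark.stage.maxConsecutiveAttempts` default of 4 mirrors the real configuration.

```scala
// Simplified sketch of the retry-vs-abort gate on a fetch failure.
// Assumption: real DAGScheduler logic is far more involved; this only
// illustrates why the first fetch failure resubmits rather than aborts.
object FetchFailureRetrySketch {

  final case class Stage(id: Int, failedAttemptIds: Set[Int], isIndeterminate: Boolean)

  // Mirrors the default of spark.stage.maxConsecutiveAttempts.
  val maxConsecutiveStageAttempts = 4

  def handleFetchFailure(failedResultStage: Stage, mapStage: Stage): String = {
    val shouldAbort =
      failedResultStage.failedAttemptIds.size >= maxConsecutiveStageAttempts
    if (shouldAbort) {
      s"abort stage ${failedResultStage.id}: too many consecutive failed attempts"
    } else {
      // When the parent map stage is indeterminate, its output (and therefore
      // all partitions of the result stage) must be recomputed, so both
      // stages are resubmitted instead of aborting the query.
      s"resubmit map stage ${mapStage.id} and result stage ${failedResultStage.id}"
    }
  }

  def main(args: Array[String]): Unit = {
    val mapStage = Stage(id = 1, failedAttemptIds = Set(0), isIndeterminate = true)
    val resultStage = Stage(id = 2, failedAttemptIds = Set(0), isIndeterminate = false)
    // First fetch failure: the attempt count is below the limit, so no abort.
    println(handleFetchFailure(resultStage, mapStage))
  }
}
```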
>And for the second case, where the fisrt result task is successful, but subsequent task fails, then the code will follow the existing path of aborting the query. @ahshahid Check your test "SPARK-51272: retry all the partitions of result stage, if the first result task has failed and failing ShuffleMap stage is inDeterminate". The title is a bit misleading as fetching from the determinate stage fails (`shuffleId1`): ``` makeCompletionEvent( taskSets.find(_.stageId == resultStage.id).get.tasks(0), FetchFailed(makeBlockManagerId("hostA"), shuffleId1, 0L, 0, 0, "ignored"), ``` The indeterminate stage is resubmitted as the determinate stage failure lead to losing the executors and all the shuffle blocks. There is no abort called because of the condition: https://github.com/apache/spark/blob/00a4aadb8cfce30f2234453c64b9ca46c60fa07f/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L2156 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org