ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1996507549
##########
core/src/main/scala/org/apache/spark/scheduler/ShuffleMapStage.scala:
##########
@@ -90,8 +90,11 @@ private[spark] class ShuffleMapStage(

   /** Returns the sequence of partition ids that are missing (i.e. needs to be computed). */
   override def findMissingPartitions(): Seq[Int] = {
-    mapOutputTrackerMaster
-      .findMissingPartitions(shuffleDep.shuffleId)
-      .getOrElse(0 until numPartitions)
+    if (this.areAllPartitionsMissing(this.latestInfo.attemptNumber())) {

Review Comment:
   I have now kept the new fields only in ResultStage, since that is the only stage which cannot be rolled back once a successful task has been processed. For ShuffleMapStage, the existing snippet of code in submitMissingTasks takes care of the race.
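   The distinction drawn here (a ShuffleMapStage can recompute missing partitions from the map output tracker, while a ResultStage cannot roll back partitions whose results have already been consumed, so it needs per-attempt bookkeeping) can be sketched roughly as below. Everything in the sketch (`FakeMapOutputTracker`, `ResultBookkeeping`, `markAllPartitionsMissing`, and the demo values) is a hypothetical stand-in for illustration, not the PR's actual implementation of `areAllPartitionsMissing` or Spark's real classes.

   ```scala
   object StageMissingPartitionsSketch {

     // Stand-in for MapOutputTrackerMaster: None if the shuffle is unknown,
     // otherwise the partitions that have no registered map output yet.
     class FakeMapOutputTracker(registered: Map[Int, Set[Int]], numPartitions: Int) {
       def findMissingPartitions(shuffleId: Int): Option[Seq[Int]] =
         registered.get(shuffleId).map(done => (0 until numPartitions).filterNot(done))
     }

     // ShuffleMapStage-style lookup: missing partitions are derived from the shuffle
     // output registry, so a retried attempt naturally recomputes only what is missing.
     def shuffleMissing(tracker: FakeMapOutputTracker, shuffleId: Int, numPartitions: Int): Seq[Int] =
       tracker.findMissingPartitions(shuffleId).getOrElse(0 until numPartitions)

     // ResultStage-style bookkeeping: once a partition's result has been handed to
     // the caller it cannot be rolled back, so a per-attempt "all partitions missing"
     // flag decides whether a fresh attempt must recompute everything.
     class ResultBookkeeping(numPartitions: Int) {
       private var completed = Set.empty[Int]
       private var allMissing = Set.empty[Int] // stage attempts flagged as fully missing

       def markAllPartitionsMissing(attempt: Int): Unit = allMissing += attempt
       def areAllPartitionsMissing(attempt: Int): Boolean = allMissing.contains(attempt)
       def markCompleted(partition: Int): Unit = completed += partition

       def findMissingPartitions(attempt: Int): Seq[Int] =
         if (areAllPartitionsMissing(attempt)) 0 until numPartitions
         else (0 until numPartitions).filterNot(completed)
     }

     def main(args: Array[String]): Unit = {
       val tracker = new FakeMapOutputTracker(Map(7 -> Set(0, 2)), numPartitions = 4)
       println(shuffleMissing(tracker, shuffleId = 7, numPartitions = 4)) // Vector(1, 3)

       val result = new ResultBookkeeping(numPartitions = 4)
       result.markCompleted(0)
       println(result.findMissingPartitions(attempt = 0)) // Vector(1, 2, 3)
       result.markAllPartitionsMissing(attempt = 1)
       println(result.findMissingPartitions(attempt = 1)) // all four partitions
     }
   }
   ```

   In this reading, the per-attempt guard only ever needs to live on the stage that cannot be rolled back, which matches keeping the new fields in ResultStage alone.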