ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r2042486967
##########
core/src/main/scala/org/apache/spark/scheduler/ResultStage.scala:
##########
@@ -64,5 +75,16 @@ private[spark] class ResultStage(
     (0 until job.numPartitions).filter(id => !job.finished(id))
   }
 
+  def markAllPartitionsMissing(): Unit = {
+    this.discardResultsForAttemptId = this.latestInfo.attemptNumber()
+    val job = activeJob.get
+    for (id <- 0 until job.numPartitions) {
+      job.finished(id) = false
+    }
+  }
+
+  override def shouldDiscardResult(attemptId: Int): Boolean =
+    this.discardResultsForAttemptId >= attemptId

Review Comment:
   Unless we decide to aggressively abort the query even when it is the first task that fails (and the ResultStage is indeterminate), this piece of code and the two above are needed, IMO, to avoid the race condition.
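   To illustrate the race being guarded against, here is a minimal, self-contained Scala sketch; it is not Spark's actual DAGScheduler logic, and the ToyResultStage class, newAttempt(), and latestAttemptNumber are hypothetical stand-ins. Only markAllPartitionsMissing() and shouldDiscardResult() mirror the methods in the diff: once all partitions are marked missing for a re-attempt of an indeterminate result stage, a straggler completion from an earlier attempt is dropped instead of being recorded as a finished partition.

   // Toy model only: field and method names below mirror the diff where noted,
   // everything else is a hypothetical stand-in, not Spark internals.
   object DiscardStaleResultSketch {

     class ToyResultStage(val numPartitions: Int) {
       val finished: Array[Boolean] = Array.fill(numPartitions)(false)
       private var discardResultsForAttemptId: Int = -1
       private var currentAttempt: Int = 0

       def latestAttemptNumber: Int = currentAttempt

       // Mirrors markAllPartitionsMissing(): invalidate every partition and remember
       // the attempt whose (and earlier attempts') results must be thrown away.
       def markAllPartitionsMissing(): Unit = {
         discardResultsForAttemptId = currentAttempt
         for (id <- 0 until numPartitions) finished(id) = false
       }

       // Mirrors shouldDiscardResult(): a completion from an attempt at or below the
       // recorded attempt id is stale.
       def shouldDiscardResult(attemptId: Int): Boolean =
         discardResultsForAttemptId >= attemptId

       def newAttempt(): Unit = currentAttempt += 1
     }

     def main(args: Array[String]): Unit = {
       val stage = new ToyResultStage(numPartitions = 2)

       // Attempt 0 is running when an indeterminate parent must be recomputed,
       // so the whole result stage is invalidated and resubmitted as attempt 1.
       stage.markAllPartitionsMissing()
       stage.newAttempt()

       // A straggler task from attempt 0 now reports success: it is discarded
       // rather than marking its partition finished.
       val lateAttemptId = 0
       if (stage.shouldDiscardResult(lateAttemptId)) {
         println(s"discarding stale result from attempt $lateAttemptId")
       } else {
         stage.finished(0) = true
       }

       // A result from the new attempt (attempt 1) is still accepted.
       val freshAttemptId = 1
       assert(!stage.shouldDiscardResult(freshAttemptId))
     }
   }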