ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r2042486967
##########
core/src/main/scala/org/apache/spark/scheduler/ResultStage.scala:
##########
@@ -64,5 +75,16 @@ private[spark] class ResultStage(
     (0 until job.numPartitions).filter(id => !job.finished(id))
   }
 
+  def markAllPartitionsMissing(): Unit = {
+    this.discardResultsForAttemptId = this.latestInfo.attemptNumber()
+    val job = activeJob.get
+    for (id <- 0 until job.numPartitions) {
+      job.finished(id) = false
+    }
+  }
+
+  override def shouldDiscardResult(attemptId: Int): Boolean =
+    this.discardResultsForAttemptId >= attemptId

Review Comment:
   Unless we decide to aggressively abort the query even when it is the first task that fails (and the ResultStage is indeterminate), this piece of code and the two above are needed, IMO, to avoid the race condition.
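   To illustrate the race being guarded against, here is a minimal, self-contained Scala sketch; it is not Spark's actual DAGScheduler logic, and the ToyResultStage class, newAttempt(), and latestAttemptNumber are hypothetical stand-ins. Only markAllPartitionsMissing() and shouldDiscardResult() mirror the methods in the diff: once all partitions are marked missing for a re-attempt of an indeterminate result stage, a straggler completion from an earlier attempt is dropped instead of being recorded as a finished partition.

   // Toy model only: field and method names below mirror the diff where noted,
   // everything else is a hypothetical stand-in, not Spark internals.
   object DiscardStaleResultSketch {

     class ToyResultStage(val numPartitions: Int) {
       val finished: Array[Boolean] = Array.fill(numPartitions)(false)
       private var discardResultsForAttemptId: Int = -1
       private var currentAttempt: Int = 0

       def latestAttemptNumber: Int = currentAttempt

       // Mirrors markAllPartitionsMissing(): invalidate every partition and remember
       // the attempt whose (and earlier attempts') results must be thrown away.
       def markAllPartitionsMissing(): Unit = {
         discardResultsForAttemptId = currentAttempt
         for (id <- 0 until numPartitions) finished(id) = false
       }

       // Mirrors shouldDiscardResult(): a completion from an attempt at or below the
       // recorded attempt id is stale.
       def shouldDiscardResult(attemptId: Int): Boolean =
         discardResultsForAttemptId >= attemptId

       def newAttempt(): Unit = currentAttempt += 1
     }

     def main(args: Array[String]): Unit = {
       val stage = new ToyResultStage(numPartitions = 2)

       // Attempt 0 is running when an indeterminate parent must be recomputed,
       // so the whole result stage is invalidated and resubmitted as attempt 1.
       stage.markAllPartitionsMissing()
       stage.newAttempt()

       // A straggler task from attempt 0 now reports success: it is discarded
       // rather than marking its partition finished.
       val lateAttemptId = 0
       if (stage.shouldDiscardResult(lateAttemptId)) {
         println(s"discarding stale result from attempt $lateAttemptId")
       } else {
         stage.finished(0) = true
       }

       // A result from the new attempt (attempt 1) is still accepted.
       val freshAttemptId = 1
       assert(!stage.shouldDiscardResult(freshAttemptId))
     }
   }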