ahshahid commented on code in PR #50033:
URL: https://github.com/apache/spark/pull/50033#discussion_r1994979719
########## core/src/main/scala/org/apache/spark/scheduler/ShuffleMapStage.scala ##########
@@ -90,8 +90,11 @@ private[spark] class ShuffleMapStage(

   /** Returns the sequence of partition ids that are missing (i.e. needs to be computed). */
   override def findMissingPartitions(): Seq[Int] = {
-    mapOutputTrackerMaster
-      .findMissingPartitions(shuffleDep.shuffleId)
-      .getOrElse(0 until numPartitions)
+    if (this.areAllPartitionsMissing(this.latestInfo.attemptNumber())) {

Review Comment:
   I am now clearing the map output as you suggested, which keeps the code of findMissingPartitions unchanged. I will write some additional tests that check the behaviour of an indeterminate shuffle stage failure; those will clarify whether the race condition can be avoided without using areAllPartitionsMissing and attemptIdAllPartitionsMissing. For the result stage, I believe it is evident that they are needed, IMO. For the shuffle stage, the new tests will clarify.
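   For context, here is a minimal self-contained sketch (hypothetical stub types, not Spark's real MapOutputTrackerMaster or ShuffleMapStage) of why clearing a shuffle's registered map outputs lets the unchanged findMissingPartitions() logic report every partition as missing on an indeterminate-stage retry:

   ```scala
   import scala.collection.mutable

   // Hypothetical stand-in for the driver-side map-output bookkeeping of one shuffle.
   class StubMapOutputTracker(numPartitions: Int) {
     private val completed = mutable.Set[Int]()

     // A map task finishing registers its output for its partition.
     def registerMapOutput(partition: Int): Unit = completed += partition

     // Mirrors the Option[Seq[Int]] shape used by the real tracker lookup.
     def findMissingPartitions(): Option[Seq[Int]] =
       Some((0 until numPartitions).filterNot(completed.contains))

     // "Clearing the map output": drop everything so a retry recomputes all partitions.
     def unregisterAllMapOutput(): Unit = completed.clear()
   }

   object FindMissingPartitionsSketch extends App {
     val numPartitions = 4
     val tracker = new StubMapOutputTracker(numPartitions)
     tracker.registerMapOutput(0)
     tracker.registerMapOutput(2)

     // Same shape as the unchanged ShuffleMapStage.findMissingPartitions().
     def findMissingPartitions(): Seq[Int] =
       tracker.findMissingPartitions().getOrElse(0 until numPartitions)

     println(findMissingPartitions()) // only the partitions without output: 1, 3
     tracker.unregisterAllMapOutput() // indeterminate-stage retry clears all outputs
     println(findMissingPartitions()) // now all partitions: 0, 1, 2, 3
   }
   ```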