attilapiros commented on code in PR #50630:
URL: https://github.com/apache/spark/pull/50630#discussion_r2072347515


##########
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala:
##########
@@ -2337,10 +2344,8 @@ private[spark] class DAGScheduler(
             "checkpointing the RDD before repartition and try again."
         }

-        activeJobs.foreach(job => collectStagesToRollback(job.finalStage :: Nil))
-        // The stages will be rolled back after checking
-        val rollingBackStages = HashSet[Stage](mapStage)
+        val rollingBackStages = HashSet[Stage]()

Review Comment:
   This was the case even before this PR, wasn't it?

   But it makes me wonder why we materialize the exact set of missing partitions at all. We already have `numAvailableOutputs`, which is backed by a counter in `MapOutputTracker`:
   https://github.com/apache/spark/blob/db59634edaf8e1dc587077edd5a2bc7955b3f357/core/src/main/scala/org/apache/spark/MapOutputTracker.scala#L143-L148

   When `numAvailableOutputs` is nonzero, we need to roll the stage back.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
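To illustrate the reviewer's point, here is a minimal, self-contained sketch of the counter-based bookkeeping, not the actual Spark source: `ToyShuffleStatus` is a hypothetical stand-in for `MapOutputTracker`'s shuffle-status tracking, though the method names `numAvailableOutputs` and `findMissingPartitions` mirror the real ones. The rollback decision only needs the counter; materializing the full set of missing partitions is a strictly more expensive way to answer the same yes/no question.

```scala
// Hedged sketch: a toy model of per-shuffle output tracking. The class name
// and constructor are hypothetical; only the idea of a maintained counter
// (as in MapOutputTracker) is taken from the review comment.
class ToyShuffleStatus(numPartitions: Int) {
  // outputs(i) == true means partition i has a registered map output.
  private val outputs = new Array[Boolean](numPartitions)
  // Counter kept in sync as outputs are registered, so no scan is needed.
  private var _numAvailableOutputs = 0

  def addOutput(partition: Int): Unit = {
    if (!outputs(partition)) {
      outputs(partition) = true
      _numAvailableOutputs += 1
    }
  }

  def numAvailableOutputs: Int = _numAvailableOutputs

  // The review's suggestion: a nonzero counter already tells us the stage
  // has some committed output and must be rolled back. O(1).
  def needsRollback: Boolean = _numAvailableOutputs > 0

  // By contrast, materializing the missing partitions walks every slot. O(n).
  def findMissingPartitions(): Seq[Int] =
    (0 until numPartitions).filter(i => !outputs(i))
}
```

Usage: after registering an output for one of four partitions, `needsRollback` flips to `true` without ever computing `findMissingPartitions()`.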
########## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ########## @@ -2337,10 +2344,8 @@ private[spark] class DAGScheduler( "checkpointing the RDD before repartition and try again." } - activeJobs.foreach(job => collectStagesToRollback(job.finalStage :: Nil)) - // The stages will be rolled back after checking - val rollingBackStages = HashSet[Stage](mapStage) + val rollingBackStages = HashSet[Stage]() Review Comment: This was the case even before this PR, is not it? But this makes me wonder why we materialize the missing partitions containing the exact missing partitions itself? We already have a `numAvailableOutputs` which is based on a counter in `MapOutputTracker`: https://github.com/apache/spark/blob/db59634edaf8e1dc587077edd5a2bc7955b3f357/core/src/main/scala/org/apache/spark/MapOutputTracker.scala#L143-L148 When `numAvailableOutputs` is nonzero we need to roll it back. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org