attilapiros commented on code in PR #50630:
URL: https://github.com/apache/spark/pull/50630#discussion_r2072347515


##########
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala:
##########
@@ -2337,10 +2344,8 @@ private[spark] class DAGScheduler(
         "checkpointing the RDD before repartition and try again."
     }
 
-    activeJobs.foreach(job => collectStagesToRollback(job.finalStage :: Nil))
-
     // The stages will be rolled back after checking
-    val rollingBackStages = HashSet[Stage](mapStage)
+    val rollingBackStages = HashSet[Stage]()

Review Comment:
   This was the case even before this PR, wasn't it?
   
   But this makes me wonder why we materialize the exact set of missing 
partitions at all?
   
   We already have `numAvailableOutputs`, which is based on a counter in 
`MapOutputTracker`: 
https://github.com/apache/spark/blob/db59634edaf8e1dc587077edd5a2bc7955b3f357/core/src/main/scala/org/apache/spark/MapOutputTracker.scala#L143-L148
   
   When `numAvailableOutputs` is nonzero, we need to roll the stage back. 
   
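   A minimal sketch of the check I have in mind, assuming a simplified 
stand-in `Stage` case class (not Spark's actual `Stage` hierarchy) whose 
`numAvailableOutputs` mirrors the `MapOutputTracker` counter:

```scala
import scala.collection.mutable.HashSet

// Hypothetical stand-in for Spark's Stage; numAvailableOutputs would be
// backed by the counter in MapOutputTracker in the real scheduler.
final case class Stage(id: Int, numAvailableOutputs: Int)

// A stage that has already registered some map outputs must be rolled
// back; a stage with zero available outputs has nothing to invalidate,
// so no per-partition materialization is needed for the decision.
def stagesToRollBack(stages: Seq[Stage]): HashSet[Stage] = {
  val rollingBackStages = HashSet[Stage]()
  stages.foreach { s =>
    if (s.numAvailableOutputs > 0) rollingBackStages += s
  }
  rollingBackStages
}
```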



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
