ahshahid commented on code in PR #50033:
URL: https://github.com/apache/spark/pull/50033#discussion_r1994488494
##########
core/src/main/scala/org/apache/spark/scheduler/ShuffleMapStage.scala:
##########
@@ -90,8 +90,11 @@ private[spark] class ShuffleMapStage(
   /** Returns the sequence of partition ids that are missing (i.e. needs to be computed). */
   override def findMissingPartitions(): Seq[Int] = {
-    mapOutputTrackerMaster
-      .findMissingPartitions(shuffleDep.shuffleId)
-      .getOrElse(0 until numPartitions)
+    if (this.areAllPartitionsMissing(this.latestInfo.attemptNumber())) {

Review Comment:
   For the map stage, I think what you are saying sounds great. I don't know much about the barrier RDD logic, but unregistering the map outputs makes sense.
   For ResultStage, I'm not sure what you mean by aborting the stage. That will throw an exception, right? But that is not what we should do if the FetchFailure happens for the first partition of the ResultStage.