ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1996488616
########## core/src/main/scala/org/apache/spark/scheduler/ShuffleMapStage.scala:
##########
@@ -90,8 +90,11 @@ private[spark] class ShuffleMapStage(
   /** Returns the sequence of partition ids that are missing (i.e. need to be computed). */
   override def findMissingPartitions(): Seq[Int] = {
-    mapOutputTrackerMaster
-      .findMissingPartitions(shuffleDep.shuffleId)
-      .getOrElse(0 until numPartitions)
+    if (this.areAllPartitionsMissing(this.latestInfo.attemptNumber())) {

Review Comment:
   @attilapiros For ShuffleMapStage specifically, you are right that attemptIdAllPartitionsMissing is not needed, because of the following check in submitMissingTasks:
   ```
   stage match {
     case sms: ShuffleMapStage if stage.isIndeterminate && !sms.isAvailable =>
       mapOutputTracker.unregisterAllMapAndMergeOutput(sms.shuffleDep.shuffleId)
       sms.shuffleDep.newShuffleMergeState()
     case _ =>
   }
   ```
   One might think that for ResultStage, too, a similar call in the above code to simply mark all partitions as missing would suffice, which would make the newly added fields completely redundant. But the question then becomes: how does one ensure that a result-completion task from the current failed attempt cannot proceed to commit its data? This means the newly added field is needed at least for ResultStage.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: reviews-h...@spark.apache.org
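The concern raised in the comment above can be illustrated with a minimal, self-contained sketch. This is not Spark's actual implementation: `StageState`, `mayCommit`, and the recorded attempt id are hypothetical stand-ins for the newly added field, shown only to make the stale-commit argument concrete.

```scala
// Hypothetical sketch (not Spark's real API): why a ResultStage needs to
// remember the attempt id for which all partitions were marked missing.
// A completion event from that attempt (or an earlier one) is stale and
// must not be allowed to commit its output.
object ResultStageGuardSketch {
  // Stand-in for the newly added field on the stage: the latest attempt id
  // whose partitions were all invalidated.
  final case class StageState(attemptIdAllPartitionsMissing: Int)

  // A result-task completion may commit only if it belongs to a strictly
  // newer attempt than the one that was invalidated.
  def mayCommit(state: StageState, taskAttemptId: Int): Boolean =
    taskAttemptId > state.attemptIdAllPartitionsMissing

  def main(args: Array[String]): Unit = {
    val state = StageState(attemptIdAllPartitionsMissing = 1)
    println(mayCommit(state, taskAttemptId = 1)) // stale attempt: false
    println(mayCommit(state, taskAttemptId = 2)) // newer attempt: true
  }
}
```

Without the recorded attempt id there is nothing for the guard to compare against, which is why merely re-marking partitions as missing (as the ShuffleMapStage path does) is not sufficient for a ResultStage.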