ahshahid commented on PR #50033: URL: https://github.com/apache/spark/pull/50033#issuecomment-2784622753
@attilapiros: I don't quite understand how the situation you have described can arise in this PR. In `handleTaskCompletion`, the first check is:

```scala
val isIndeterministicZombie = event.reason match {
  case Success if stageOption.isDefined =>
    val stage = stageOption.get
    (task.stageAttemptId < stage.latestInfo.attemptNumber() && stage.isIndeterminate) ||
      stage.shouldDiscardResult(task.stageAttemptId)
  case _ => false
}
```

So let's walk through the first case:

1. The first partition result is a failure.
2. Before the asynchronous `ResubmitFailure` message is sent, the flag is set on the stage via a call to `markAllPartitionsMissing()` (a sketch of this flag's lifecycle follows at the end of this comment).
3. Now, before the resubmit (and hence before the stage's attempt number is increased), a successful result task for the same result stage gets processed. The check at the start of `handleTaskCompletion` marks it as an `isIndeterministicZombie`, so its output is discarded without committing anything to file.
4. When the resubmit increases the attempt id, the flag to discard result tasks is reset to false.

As for the second case, where the first result task is successful but a subsequent task fails, the code follows the existing path of aborting the query.
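To make steps 2–4 concrete, here is a minimal, self-contained sketch of how a stage-level discard flag with these semantics could behave. Only `markAllPartitionsMissing()` and `shouldDiscardResult(...)` come from the snippet above; the class name, its fields, and the reset-on-resubmit method are hypothetical illustrations of the sequence described, not the PR's actual implementation.

```scala
// A hedged sketch, not the PR's code: a stage-level flag that discards late
// results of the attempt whose partitions were all marked missing, and that
// is cleared once resubmission creates a new attempt.
object DiscardFlagSketch extends App {

  class StageSketch {
    private var latestAttemptNumber: Int = 0
    // Attempt id whose late results must be discarded, if any (hypothetical field).
    private var discardResultsOfAttempt: Option[Int] = None

    // Step 2: invoked before the asynchronous ResubmitFailure message is sent.
    def markAllPartitionsMissing(): Unit =
      discardResultsOfAttempt = Some(latestAttemptNumber)

    // Step 3: consulted at the start of handleTaskCompletion; a late success
    // from the flagged attempt is treated as a zombie and its output dropped.
    def shouldDiscardResult(taskStageAttemptId: Int): Boolean =
      discardResultsOfAttempt.contains(taskStageAttemptId)

    // Step 4: resubmission bumps the attempt number and resets the flag
    // (hypothetical method name).
    def newAttemptOnResubmit(): Unit = {
      latestAttemptNumber += 1
      discardResultsOfAttempt = None
    }
  }

  val stage = new StageSketch
  stage.markAllPartitionsMissing()        // step 2: first result task failed
  assert(stage.shouldDiscardResult(0))    // step 3: late success from attempt 0 is discarded
  stage.newAttemptOnResubmit()            // step 4: attempt id increases, flag resets
  assert(!stage.shouldDiscardResult(1))   // results of the new attempt are kept
}
```

The point the sketch tries to capture is the ordering guarantee: any successful result task processed between `markAllPartitionsMissing()` and the resubmit still carries the old attempt id, so the `shouldDiscardResult` branch of the zombie check catches it before anything is committed.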