attilapiros opened a new pull request, #50946: URL: https://github.com/apache/spark/pull/50946
What changes were proposed in this pull request?

This PR aborts an indeterminate, partially completed result stage instead of resubmitting it.

Why are the changes needed?

Compared with a shuffle map stage, a result stage has more output and more intermediate state. It can use a FileOutputCommitter, where each task does a Hadoop task commit; a resubmit would re-commit that Hadoop task, possibly with different content. With a JDBC write, it may already have inserted all rows of a partition into the target schema.

Ignoring the resubmit when a recalculation is needed would corrupt data: the partial result is based on the previous indeterminate computation, but continuing would finish the stage with the newly recomputed data.

As long as rollback of a result stage is not supported (https://issues.apache.org/jira/browse/SPARK-25342), the best we can do when a recalculation is needed is to abort the stage.

Before this PR, the code already tried to address a similar situation when handling a FetchFailed whose fetch comes from an indeterminate shuffle map stage: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L2178-L2182

That is not enough, because a FetchFailed from a determinate stage can lead to an executor loss and a recompute of the indeterminate parent of the result stage, as shown in the attached unit test. Moreover, ResubmitFailedStages can race with a successful CompletionEvent. This is why this PR detects the partial execution when the indeterminate result stage is resubmitted.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New unit tests were added to illustrate the situation above.

Was this patch authored or co-authored using generative AI tooling?

No.
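The decision described above can be sketched as a small toy model. This is only an illustration of the rule "abort an indeterminate result stage that has already partially completed when it is about to be resubmitted". The names `ResultStageModel`, `ResubmitPolicy`, `Decision`, `Resubmit`, and `Abort` are hypothetical and are not the classes or methods used in Spark's DAGScheduler or in this PR.

```scala
// Toy model only: every type here is hypothetical and is NOT part of Spark.
sealed trait Decision
case object Resubmit extends Decision
case object Abort extends Decision

final case class ResultStageModel(
    indeterminate: Boolean,        // assumption: output may differ between attempts
    numPartitions: Int,            // total number of result partitions
    finishedPartitions: Set[Int])  // partitions whose tasks already completed (and may have committed output)

object ResubmitPolicy {
  // Called at the point where the stage would be resubmitted.
  def onResubmit(stage: ResultStageModel): Decision = {
    // "Partially completed": some, but not all, partitions have finished.
    val partiallyCompleted =
      stage.finishedPartitions.nonEmpty &&
        stage.finishedPartitions.size < stage.numPartitions

    // Finished partitions were computed from the previous indeterminate input,
    // so mixing them with recomputed partitions could corrupt the output.
    // Without rollback support, the model aborts in that case.
    if (stage.indeterminate && partiallyCompleted) Abort else Resubmit
  }
}
```

In this model, `ResubmitPolicy.onResubmit(ResultStageModel(indeterminate = true, numPartitions = 2, finishedPartitions = Set(0)))` yields `Abort`, while a determinate stage, or an indeterminate stage with no finished partitions, yields `Resubmit`.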