[PR] [SPARK-51272][CORE] Aborting instead of re-submitting of partially completed indeterminate result stage [spark]

via GitHub Thu, 17 Apr 2025 16:44:06 -0700


attilapiros opened a new pull request, #50630:
URL: https://github.com/apache/spark/pull/50630


   ### What changes were proposed in this pull request?
   
   This PR aborts a partially completed indeterminate result stage instead of 
resubmitting it.
   
   ### Why are the changes needed?
   
   Compared to a shuffle map stage, a result stage has more output and more state:
   - It can use a `FileOutputCommitter`, where each task does a Hadoop task 
commit. A re-submit would re-commit that Hadoop task 
(possibly with different content).
   - In the case of a JDBC write, it may have already inserted all rows of a 
partition into the target schema.
   
   As long as rollback of a result stage is not supported 
(https://issues.apache.org/jira/browse/SPARK-25342), the best we can do is abort 
the stage.
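
To make the rule concrete, here is a toy model of the decision described above. It is only a sketch under stated assumptions: `ResultStage`, `Action`, and `on_rollback_needed` are hypothetical names invented for illustration and do not correspond to Spark's `DAGScheduler` code, which is written in Scala and tracks far more state.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    RESUBMIT = "resubmit"
    ABORT = "abort"


@dataclass
class ResultStage:
    # Hypothetical stand-in for a result stage; not Spark's ResultStage class.
    num_partitions: int
    finished_partitions: set = field(default_factory=set)
    indeterminate: bool = False


def on_rollback_needed(stage: ResultStage) -> Action:
    """Decide what to do when the stage's input has to be recomputed."""
    # Some, but not all, partitions have already produced (and possibly
    # committed) output, e.g. a Hadoop task commit or inserted JDBC rows.
    partially_done = 0 < len(stage.finished_partitions) < stage.num_partitions
    if stage.indeterminate and partially_done:
        # Recomputed input may differ, and committed output cannot be
        # rolled back, so re-running could mix inconsistent results.
        return Action.ABORT
    # Assumption of this sketch: every other case is safe to re-run.
    return Action.RESUBMIT


print(on_rollback_needed(ResultStage(4, {0, 1}, indeterminate=True)))
```

In this model, a stage with no finished partitions, or one whose output is deterministic, is still resubmitted; only the partially completed indeterminate case is aborted.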
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New unit tests were added for this situation.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

