Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

via GitHub Tue, 15 Apr 2025 20:13:10 -0700


mridulm commented on PR #50033:
URL: https://github.com/apache/spark/pull/50033#issuecomment-2808115624


   @attilapiros The way to handle this in Spark is to leverage the output 
committer/commit protocol.
   
   > And my other concern is Hadoop's file output committer: I cannot see 
any guarantees about what will happen if a Hadoop task is re-committed with 
different data (because of indeterminism we might have different data).
   
   You cannot 're-commit' with different data: this is why the result stage 
is failed in case of indeterminism.
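   To make the "cannot re-commit" point concrete, here is a toy Python model of first-attempt-wins commit coordination. It is a hypothetical sketch, not Spark's API. Spark's real coordination lives in its Scala `OutputCommitCoordinator`, which also handles failed attempts and stage attempts. The names `ToyCommitCoordinator`, `can_commit` and `run_result_task` are invented for illustration. Once one attempt of a result partition is authorized to commit, a later attempt is denied, even if that attempt recomputed different data from an indeterministic parent.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCommitCoordinator:
    """Hypothetical model: the first task attempt that asks to commit a
    partition is authorized; later attempts for that partition are denied."""
    # Maps (stage_id, partition) -> attempt number that won the right to commit.
    _winners: dict = field(default_factory=dict)

    def can_commit(self, stage_id: int, partition: int, attempt: int) -> bool:
        # setdefault records `attempt` only if no attempt has won yet.
        winner = self._winners.setdefault((stage_id, partition), attempt)
        return winner == attempt


def run_result_task(coord, committed, stage_id, partition, attempt, data):
    """Hypothetical result task: writes its output only when authorized."""
    if coord.can_commit(stage_id, partition, attempt):
        committed[partition] = data
        return "committed"
    return "denied"


coord = ToyCommitCoordinator()
committed = {}
# Attempt 0 of partition 0 commits the output computed from the first shuffle run.
print(run_result_task(coord, committed, 1, 0, 0, "A"))  # committed
# A retry (attempt 1) sees different data after an indeterministic recompute.
print(run_result_task(coord, committed, 1, 0, 1, "B"))  # denied
print(committed)  # {0: 'A'}
```

   In this model the already-committed output for partition 0 cannot be replaced. If the indeterministic parent has been recomputed, the retried result partition would be inconsistent with the committed one. Failing the result stage is therefore the safe choice, as described in the comment.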


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


