mridulm commented on PR #50033: URL: https://github.com/apache/spark/pull/50033#issuecomment-2808115624
@attilapiros the way to handle this in Spark is to leverage the output committer/commit protocol.

> And my other concern is the Hadoop's file output committer I cannot see any guarantees what will happen if a Hadoop task is re-commited with different data (because of indeterminism we might have different data).

You cannot 're-commit' with different data: this is why the result stage is failed in case of indeterminism.
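
As a rough illustration of the "no re-commit" point, here is a toy Python model of a first-committer-wins rule. It is only a sketch and not Spark's or Hadoop's actual code: `CommitCoordinator`, `can_commit`, and `commit_task` are hypothetical names invented for this example, and the real commit protocol involves far more (staging directories, job commit, abort handling).

```python
class CommitCoordinator:
    """Toy model (hypothetical): at most one attempt may commit a partition."""

    def __init__(self):
        # partition id -> attempt id that was authorized to commit
        self._winners = {}

    def can_commit(self, partition, attempt):
        # The first attempt to ask becomes the winner; later attempts are denied.
        winner = self._winners.setdefault(partition, attempt)
        return winner == attempt


def commit_task(coordinator, output, partition, attempt, rows):
    """Publish `rows` for `partition` only if this attempt is authorized."""
    if not coordinator.can_commit(partition, attempt):
        # A second attempt may carry different rows if the input was
        # indeterminate, so it is rejected instead of overwriting the output.
        raise RuntimeError(
            f"partition {partition} already committed; attempt {attempt} denied"
        )
    output[partition] = list(rows)


coordinator = CommitCoordinator()
output = {}
commit_task(coordinator, output, partition=0, attempt=0, rows=[1, 2, 3])

try:
    # Retry of the same partition, recomputed with a different row order.
    commit_task(coordinator, output, partition=0, attempt=1, rows=[3, 1, 2])
except RuntimeError as err:
    print("retry rejected:", err)
```

In this model the committed output of partition 0 stays as written by the first attempt. Once a committed partition can no longer be replaced, the only safe reaction to indeterminate recomputation is to fail the stage, which is the behavior described above.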