dongjoon-hyun opened a new pull request, #51010:
URL: https://github.com/apache/spark/pull/51010

   ### What changes were proposed in this pull request?
   
   This PR aims to use the Apache Hadoop `Magic Committer` for all S3 buckets by default in Apache Spark 4.0.0.
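   
   Before this change, the committer had to be enabled explicitly. Below is a minimal sketch of that manual opt-in, assuming the property names documented for the Hadoop S3A committers and Spark's cloud-integration module; it is illustrative, not necessarily the exact set of defaults this PR changes.
   
   ```scala
   // Sketch: manually enabling the S3A Magic Committer (pre-change opt-in).
   // The property names below follow the Hadoop S3A committer and Spark
   // cloud-integration documentation; treat them as an assumption, not as
   // the authoritative list of settings this PR touches.
   import org.apache.spark.sql.SparkSession

   val spark = SparkSession.builder()
     .appName("magic-committer-opt-in")
     .config("spark.hadoop.fs.s3a.committer.name", "magic")
     .config("spark.hadoop.fs.s3a.committer.magic.enabled", "true")
     .config("spark.sql.sources.commitProtocolClass",
       "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
     .config("spark.sql.parquet.output.committer.class",
       "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
     .getOrCreate()
   ```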
   
   ### Why are the changes needed?
   
   The Apache Hadoop `Magic Committer` has been used with S3 buckets to get the best performance since [S3 became strongly consistent on December 1st, 2020](https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-read-after-write-consistency/).
   - 
https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html#ConsistencyModel
   > Amazon S3 provides strong read-after-write consistency for PUT and DELETE requests of objects in your Amazon S3 bucket in all AWS Regions. This behavior applies to both writes to new objects as well as PUT requests that overwrite existing objects and DELETE requests. In addition, read operations on Amazon S3 Select, Amazon S3 access control lists (ACLs), Amazon S3 Object Tags, and object metadata (for example, the HEAD object) are strongly consistent.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, the migration guide is updated.
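   
   Users who prefer the previous behavior should be able to opt out explicitly. A sketch follows, assuming the opt-in flag shown earlier is also the revert knob; the migration guide is the authoritative reference.
   
   ```scala
   // Sketch: opting back out of the new default. The exact property to flip
   // is an assumption here; consult the migration guide for the real setting.
   import org.apache.spark.sql.SparkSession

   val spark = SparkSession.builder()
     .appName("magic-committer-opt-out")
     .config("spark.hadoop.fs.s3a.committer.magic.enabled", "false")
     .getOrCreate()
   ```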
   
   ### How was this patch tested?
   
   Pass the CIs.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.

