Hi everyone,

We (Andy, Fuat, Michael and Sanjay) are from the Amazon S3 team. We recently released the Analytics Accelerator Library for Amazon S3 (https://github.com/awslabs/analytics-accelerator-s3) under the Apache 2.0 license. The library optimises access to Parquet data in S3 for workloads driven through Hadoop S3A, Iceberg S3FileIO, etc.

I am reaching out to see whether you would be interested in trying this in your environments and providing feedback, especially on features, performance, and compatibility. Our test results show promising improvements for TPC-DS-like workloads. You can get started with the version on Maven Central by following the guide here: https://github.com/awslabs/analytics-accelerator-s3?tab=readme-ov-file#using-with-spark-with-iceberg.

Please let us know if you have any questions or would like to learn more about the Analytics Accelerator Library.

Best,
Fuat

GitHub Repo: https://github.com/awslabs/analytics-accelerator-s3
Getting Started with AAL on Iceberg: https://github.com/awslabs/analytics-accelerator-s3?tab=readme-ov-file#using-with-spark-with-iceberg
S3A PR: https://github.com/apache/hadoop/pull/7433
S3FileIO PR: https://github.com/apache/iceberg/pull/12299