Hi everyone,

We (Andy, Fuat, Michael and Sanjay) are from the Amazon S3 team. We recently 
released the Analytics Accelerator Library for Amazon S3 
(https://github.com/awslabs/analytics-accelerator-s3) under the Apache 2.0 
license. It optimises access to Parquet data in S3 for workloads driven 
through Hadoop S3A, Iceberg S3FileIO, etc.

I am reaching out to see whether any of you would be interested in trying it 
in your environments and providing feedback, especially on features, 
performance, and compatibility. Our test results show promising improvements 
for TPC-DS-like workloads. You can get started with the version on Maven 
Central by following this guide: 
https://github.com/awslabs/analytics-accelerator-s3?tab=readme-ov-file#using-with-spark-with-iceberg

Please let us know if you have any questions or would like to learn more about 
the Analytics Accelerator Library.

Best,
Fuat.

GitHub Repo: https://github.com/awslabs/analytics-accelerator-s3
Getting Started with AAL on Iceberg: 
https://github.com/awslabs/analytics-accelerator-s3?tab=readme-ov-file#using-with-spark-with-iceberg
S3A PR: https://github.com/apache/hadoop/pull/7433
S3FileIO PR: https://github.com/apache/iceberg/pull/12299

