[DISCUSS] Read support for Apache Iceberg tables

Hari Kishore Chaparala Thu, 30 Mar 2023 03:53:41 -0700

I am starting this discussion thread to propose a new external dataset
feature in AsterixDB.


*Feature:* External dataset support for reading Apache Iceberg tables
*Details:* Apache Iceberg is a table format for huge analytic tables. It
supports time-travel queries, partitioning, and fast query planning through
its tree-structured metadata, among several other features (
https://iceberg.apache.org/docs/latest/). As the first step of the
Iceberg-AsterixDB integration, we plan to support reading Iceberg format
version 1 tables, with AsterixDB as the query engine and its existing read
parallelization. The Iceberg table details will be specified in the
external dataset DDL, and every query will read from the latest Iceberg
table snapshot. At present, the AWS S3 and HDFS adapters can be used to
read Iceberg tables whose data files are in Parquet format.
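
To illustrate what "latest snapshot" means here, the following is a minimal
Python sketch, not the code in the changeset. It assumes a trimmed-down
table metadata document using field names from the Iceberg table spec
(current-snapshot-id, snapshots, snapshot-id, timestamp-ms, manifest-list).
The bucket paths, IDs, and the helper name current_snapshot are hypothetical,
and treating a missing or -1 ID as "no snapshot" is an assumption.

```python
# Hypothetical, trimmed-down Iceberg table metadata. Field names follow the
# Iceberg table spec; all values (IDs, timestamps, paths) are made up.
table_metadata = {
    "format-version": 1,
    "current-snapshot-id": 2,
    "snapshots": [
        {
            "snapshot-id": 1,
            "timestamp-ms": 1680000000000,
            "manifest-list": "s3://example-bucket/tbl/metadata/snap-1.avro",
        },
        {
            "snapshot-id": 2,
            "timestamp-ms": 1680100000000,
            "manifest-list": "s3://example-bucket/tbl/metadata/snap-2.avro",
        },
    ],
}


def current_snapshot(metadata):
    """Return the snapshot entry that current-snapshot-id points to.

    Returns None when the table has no current snapshot. Treating a missing
    or -1 ID as "no snapshot" is an assumption of this sketch.
    """
    current_id = metadata.get("current-snapshot-id")
    if current_id is None or current_id == -1:
        return None
    for snapshot in metadata.get("snapshots", []):
        if snapshot["snapshot-id"] == current_id:
            return snapshot
    raise ValueError(f"current-snapshot-id {current_id} not found in snapshots")


if __name__ == "__main__":
    snap = current_snapshot(table_metadata)
    # A reader would next open this manifest list to find the data files.
    print(snap["manifest-list"])
```

A reader that always follows current-snapshot-id never sees older snapshots,
which matches the proposal's scope: time travel is an Iceberg feature, but
this first step only reads the latest snapshot.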

*Changeset*: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17419
*APE*:
https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+1%3A+Iceberg+API+Integration
