Re: [DISCUSS] APE 25: Improved Apache Iceberg support

Hussain Towaileb Mon, 11 Aug 2025 08:40:12 -0700

Hello Hari

1. Catalog persistence:
Yes, once a catalog is created, it is persisted. It is a metadata entity,
so just like a collection, it is permanent. Tables will be created on
those catalogs, so unless the catalogs are stored, a customer would need to
re-create them each time, which is not practical. As for the credentials,
it depends on what type of credentials they use. If they are permanent,
then it is fine. If they use temporary credentials that expire, there are
2 types of those:
- Passing keys + session token: yes, once these expire they will need
updating, which we don't support, so they will have to re-create the
catalog with the new credentials.
- Passing trust account authentication: this mechanism also uses temporary
credentials, but they are refreshed automatically. You can see this APE
for more details:
https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+16%3A+Cross-Account+Trust+Authentication+for+AWS+S3+External+Collections
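
To illustrate the difference between the two temporary-credential cases
above, here is a minimal Python sketch. It is not AsterixDB or AWS code:
SessionCredentials, RefreshingCredentials, and the fetch callback are all
hypothetical names made up for this example. It only shows why a stored
keys + session token set eventually goes stale, while a mechanism that can
fetch new credentials on demand does not.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


@dataclass(frozen=True)
class SessionCredentials:
    """Hypothetical stand-in for keys + session token with an expiry time."""
    access_key: str
    secret_key: str
    session_token: str
    expires_at: datetime

    def is_expired(self, now: datetime) -> bool:
        return now >= self.expires_at


class RefreshingCredentials:
    """Hypothetical provider that re-fetches credentials once they expire.

    `fetch` stands in for whatever the trust-based mechanism does to obtain
    a fresh set of temporary credentials.
    """

    def __init__(self, fetch: Callable[[datetime], SessionCredentials]):
        self._fetch = fetch
        self._current: Optional[SessionCredentials] = None

    def get(self, now: datetime) -> SessionCredentials:
        # Fetch on first use, and again whenever the cached set has expired.
        if self._current is None or self._current.is_expired(now):
            self._current = self._fetch(now)
        return self._current


def issue_for_one_hour(now: datetime) -> SessionCredentials:
    # Assumption for the example: each issued set is valid for one hour.
    return SessionCredentials("key", "secret", "token", now + timedelta(hours=1))


created_at = datetime(2025, 8, 11, tzinfo=timezone.utc)
two_hours_later = created_at + timedelta(hours=2)

# Case 1: credentials stored once with the catalog; nothing can renew them.
stored = issue_for_one_hour(created_at)

# Case 2: credentials obtained through a provider that can refresh them.
provider = RefreshingCredentials(issue_for_one_hour)
```

In this sketch, stored.is_expired(two_hours_later) is true, which
corresponds to the "re-create the catalog" case, while
provider.get(two_hours_later) returns a newly fetched, unexpired set.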

2. Table referencing:
This still has room for discussion, but the idea I had in mind is that the
namespace would be in the WITH clause. This is to avoid breaking/confusing
things, as "tables" are actually "external collections", and if you say
a.b.c, then you are talking about a "collection" in a "database".
The current planned behavior (again, open for discussion):
Say you create your catalog:
"CREATE CATALOG myCatalog .... WITH {"namespace": "my.name.space", ...}"
This would make all collections (tables) created on this catalog default to
the "my.name.space" namespace.

So if I create a collection:
"CREATE EXTERNAL COLLECTION myTable ON myCatalog .... WITH {"table-name":
"users", ...}"

Then this table resides at "catalog-warehouse-path/my/name/space/users"

If, however, I would like to create a collection in a namespace other than
the default, I can set the "namespace" property on the collection, which
will override the catalog's default namespace:
"CREATE EXTERNAL COLLECTION myTable ON myCatalog .... WITH {"table-name":
"users", "namespace": "my.other.name.space"}"

And now this collection would be at
"catalog-warehouse-path/my/other/name/space/users"
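
The planned resolution rule above can be sketched in a few lines of Python.
This is only an illustration of the behavior described in this email, not
the implementation: resolve_table_location is a hypothetical helper, the
property dictionaries mirror the WITH clauses in the examples, and the
mapping from a dotted namespace to a path simply follows the two example
paths shown above.

```python
def resolve_table_location(warehouse_path, catalog_props, collection_props):
    """Return the table location for a collection created on a catalog.

    Hypothetical helper: the collection-level "namespace" property, when
    present, overrides the catalog's default "namespace".
    """
    namespace = collection_props.get("namespace", catalog_props.get("namespace"))
    if not namespace:
        raise ValueError("no namespace set on the collection or the catalog")

    table_name = collection_props["table-name"]

    # "my.name.space" + "users" -> <warehouse>/my/name/space/users
    parts = [warehouse_path.rstrip("/")] + namespace.split(".") + [table_name]
    return "/".join(parts)


catalog = {"namespace": "my.name.space"}

# No namespace on the collection: the catalog default applies.
default_location = resolve_table_location(
    "catalog-warehouse-path", catalog, {"table-name": "users"}
)

# Namespace set on the collection: it overrides the catalog default.
override_location = resolve_table_location(
    "catalog-warehouse-path",
    catalog,
    {"table-name": "users", "namespace": "my.other.name.space"},
)
```

Here default_location is "catalog-warehouse-path/my/name/space/users" and
override_location is "catalog-warehouse-path/my/other/name/space/users",
matching the two cases above.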

3. Splitting APE into 2:
Currently, I find the two topics tightly coupled with each other, so it
would be more convenient to have them together for context. But if it gets
too large or too confusing, I don't think there is harm in splitting them.
Also note that creating a table actually uses the same syntax as creating
an external collection; it just takes the extra property
"table-type": "iceberg" to differentiate it from a normal external
collection.

On Mon, Aug 11, 2025 at 4:18 AM Hari Kishore Chaparala <hchap...@uci.edu>
wrote:

> Thanks for the improvement. The syntax looks much more intuitive than in
> Spark, where the catalog has to be configured with the "spark.sql" prefix
> (even for DataFrame operations), which can be confusing—especially when
> working with multiple catalogs.
>
> A few questions on the new CATALOG entity:
>
> 1. Catalog persistence — When we run "*CREATE CATALOG myRestCatalog*" with
> configuration options, will the catalog be stored and persisted beyond the
> current session? In Spark and other engines, the catalog implementation and
> configuration usually last only for the active session. Since we are
> querying external tables, I’m not sure if storing catalog details is
> necessary. Also, AWS roles and STS credentials expire after some time,
> which would require catalog updates.
>
> 2. Table referencing — How do we plan to reference tables? Will it be a
> three-part notation -- might be clearer when working across multiple
> catalogs?
> For example:
>
>
>
> *SELECT * FROM glue_catalog.namespace1.iceberg_table1 A INNER JOIN
> unity_catalog.namespace2.delta_lake_table1 B ON A.id = B.id;*
>
> 3. It looks like this APE proposes two features: 1. The new CATALOG entity
> 2. DQL and DDL support for Iceberg tables using various catalog
> implementations. Would it make sense to split these into separate APEs?
>
> Thanks
> Hari Kishore
>
> On Fri, Aug 8, 2025 at 9:39 AM Hussain Towaileb <hussai...@gmail.com>
> wrote:
>
> > Initiating discussion for adding improved support for Apache Iceberg
> > Feature: *Improved Support for Apache Iceberg*
> > Details: Apache Iceberg would provide support for reading Iceberg tables.
> > This APE discusses adding improved support to the current Apache Iceberg
> > support by introducing the Catalog entity to AsterixDB Metadata, adding
> > support to different types of Iceberg catalogs, and introducing other
> > features like time travel.
> >
> > APE:
> >
> >
> https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+25%3A+Apache+Iceberg+Support
> >
> > --
> > Regards,
> > Hussain Towaileb
> >
>

-- 
Regards,
Hussain Towaileb
