Re: [DISCUSS] APE 25: Improved Apache Iceberg support

Hari Kishore Chaparala Mon, 11 Aug 2025 11:06:21 -0700

That makes sense. Thanks for the clarification, Hussain.

On Mon, Aug 11, 2025 at 8:40 AM Hussain Towaileb <hussai...@gmail.com>
wrote:


> Hello Hari
>
> 1. Catalog persistence:
> Yes. Once a catalog is created, it is persisted: it is a metadata entity,
> just like a collection, so it is permanent. Tables will be created on
> those catalogs, so unless the catalogs are stored, a customer would need to
> re-create them each time, which is not practical. As for the credentials,
> it depends on what type of credentials they use. If they are permanent,
> then it is fine. If they are temporary credentials that expire, there are
> 2 types of those:
> - Passing keys + session token: yes, when these expire they will need
> updating, which we don't support, so the catalog will have to be re-created
> with the new credentials.
> - Passing trust account authentication: this mechanism uses temporary
> credentials, but they are refreshed automatically. You can see this APE for
> more details:
>
> https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+16%3A+Cross-Account+Trust+Authentication+for+AWS+S3+External+Collections
>
> 2. Table referencing:
> This still has room for discussion, but the idea I had in mind is that the
> namespace would be given in the WITH clause. This is to avoid
> breaking/confusing things, as "tables" are actually "external collections",
> and if you say a.b.c, then you are talking about a "collection" in a
> "database".
> The current planned behavior (again, open for discussion):
> Say you create your catalog:
> "CREATE CATALOG myCatalog .... WITH {"namespace": "my.name.space", ...}"
> This would make all collections (tables) created on this catalog default to
> the "my.name.space" namespace.
>
> So if I create a collection:
> "CREATE EXTERNAL COLLECTION myTable ON myCatalog .... WITH {"table-name":
> "users", ...}"
>
> Then this table resides at "catalog-warehouse-path/my/name/space/users"
>
> If, however, I would like to create a collection in a namespace other than
> the default, I can set the "namespace" property on the collection, which
> will override the catalog's default namespace:
> "CREATE EXTERNAL COLLECTION myTable ON myCatalog .... WITH {"table-name":
> "users", "namespace": "my.other.name.space"}"
>
> And now this collection would be at
> "catalog-warehouse-path/my/other/name/space/users"
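
The namespace resolution described above could be sketched roughly as follows. This is only an illustration of the rule as stated in this thread (the collection-level "namespace" overrides the catalog default, and dots become path separators), not AsterixDB code. The function name resolve_table_path and its parameters are hypothetical, and the real location of a table depends on the catalog implementation.

```python
from typing import Optional


def resolve_table_path(
    warehouse_path: str,
    catalog_namespace: str,
    table_name: str,
    collection_namespace: Optional[str] = None,
) -> str:
    """Hypothetical helper mirroring the examples in this thread.

    The collection-level namespace, when given, overrides the catalog's
    default namespace. Each dot-separated namespace part is assumed to map
    to one path segment under the warehouse path.
    """
    namespace = collection_namespace if collection_namespace else catalog_namespace
    parts = [warehouse_path.rstrip("/")] + namespace.split(".") + [table_name]
    return "/".join(parts)


# Collection created without its own "namespace": the catalog default applies.
default_path = resolve_table_path("catalog-warehouse-path", "my.name.space", "users")

# Collection created with "namespace": "my.other.name.space": the override wins.
override_path = resolve_table_path(
    "catalog-warehouse-path", "my.name.space", "users", "my.other.name.space"
)
```

With the values from the two CREATE EXTERNAL COLLECTION examples, the first call gives the default-namespace path and the second gives the overridden one.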
>
> 3. Splitting APE into 2:
> Currently, I find the two topics tightly coupled with each other, so it
> would be more convenient to have them together for context. But if it gets
> too large or too confusing, I don't think there is harm in splitting them.
> Also note that creating a table uses the same syntax as creating an
> external collection; it just takes an extra property, "table-type":
> "iceberg", to differentiate it from a normal external collection.
>
>
> On Mon, Aug 11, 2025 at 4:18 AM Hari Kishore Chaparala <hchap...@uci.edu>
> wrote:
>
> > Thanks for the improvement. The syntax looks much more intuitive than in
> > Spark, where the catalog has to be configured with the "spark.sql" prefix
> > (even for DataFrame operations), which can be confusing—especially when
> > working with multiple catalogs.
> >
> > A few questions on the new CATALOG entity:
> >
> > 1. Catalog persistence: When we run "*CREATE CATALOG myRestCatalog*" with
> > configuration options, will the catalog be stored and persisted beyond the
> > current session? In Spark and other engines, the catalog implementation and
> > configuration usually last only for the active session. Since we are
> > querying external tables, I’m not sure if storing catalog details is
> > necessary. Also, AWS roles and STS credentials expire after some time,
> > which would require catalog updates.
> >
> > 2. Table referencing: How do we plan to reference tables? Will it be a
> > three-part notation? That might be clearer when working across multiple
> > catalogs.
> > For example:
> >
> > SELECT *
> > FROM glue_catalog.namespace1.iceberg_table1 A
> > INNER JOIN unity_catalog.namespace2.delta_lake_table1 B ON A.id = B.id;
> >
> > 3. It looks like this APE proposes two features: (1) the new CATALOG
> > entity and (2) DQL and DDL support for Iceberg tables using various
> > catalog implementations. Would it make sense to split these into separate
> > APEs?
> >
> > Thanks
> > Hari Kishore
> >
> > On Fri, Aug 8, 2025 at 9:39 AM Hussain Towaileb <hussai...@gmail.com>
> > wrote:
> >
> > > Initiating discussion for adding improved support for Apache Iceberg
> > > Feature: *Improved Support for Apache Iceberg*
> > > Details: Apache Iceberg would provide support for reading Iceberg
> > > tables. This APE discusses improving the current Apache Iceberg support
> > > by introducing the Catalog entity to AsterixDB Metadata, adding support
> > > for different types of Iceberg catalogs, and introducing other features
> > > like time travel.
> > >
> > > APE:
> > >
> > > https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+25%3A+Apache+Iceberg+Support
> > >
> > > --
> > > Regards,
> > > Hussain Towaileb
> > >
> >
>
>
> --
> Regards,
> Hussain Towaileb
>
