That makes sense. Thanks for the clarification, Hussain.

On Mon, Aug 11, 2025 at 8:40 AM Hussain Towaileb <hussai...@gmail.com> wrote:
> Hello Hari
>
> 1. Catalog persistence:
> Yes, once a catalog is created, it is persisted. It is a metadata entity,
> just like a collection, so it is permanent. Tables will be created on
> those catalogs, so unless they are stored, a customer would need to
> re-create the catalog each time, which is not practical. As for the
> credentials part, it depends on what type of credentials they use. If
> they are permanent, then it is fine. If they use temporary credentials
> that expire, there are 2 types of those:
> - Passing keys + session token: yes, if they expire, they will need
> updating, which we don't support, so they will have to re-create the
> catalog with the new credentials.
> - Passing trust account authentication: this mechanism uses temporary
> credentials that refresh automatically. You can see this APE for more
> details:
>
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/ASTERIXDB/APE*16*3A*Cross-Account*Trust*Authentication*for*AWS*S3*External*Collections__;KyUrKysrKysrKw!!CzAuKJ42GuquVTTmVmPViYEvSg!LNqPhf1prMmZ41ZjrV2H3GMj25fFDAmeqSSCuS5SDAm6vC1r9e1nG7oUpn6NT7sgPqBndN7pCZS4HnOy0A$
>
> 2. Table referencing:
> This still has room for discussion, but the idea I had in mind is that the
> namespace would be in the WITH clause. This is to avoid breaking/confusing
> things, as "tables" are actually "external collections", and if you say
> a.b.c, then you are talking about a "collection" in a "database".
> The current planned behavior (again, open for discussion):
> Say you create your catalog:
> "CREATE CATALOG myCatalog .... WITH {"namespace": "my.name.space", ...}"
> This would make all collections (tables) created on this catalog default
> to the "my.name.space" namespace.
>
> So if I create a collection:
> "CREATE EXTERNAL COLLECTION myTable ON myCatalog ....
> WITH {"table-name": "users", ...}"
>
> Then this table resides at "catalog-warehouse-path/my/name/space/users".
>
> If, however, I would like to create a collection in a namespace other than
> the default, we can set that property, which will override the catalog's
> default namespace:
> "CREATE EXTERNAL COLLECTION myTable ON myCatalog .... WITH {"table-name":
> "users", "namespace": "my.other.name.space"}"
>
> And now this collection would be at
> "catalog-warehouse-path/my/other/name/space/users"
>
> 3. Splitting APE into 2:
> Currently, I find the two topics tightly coupled, so it would be more
> convenient to have them together for context. But if it gets too
> large or too confusing, I don't think there is harm in splitting them.
> Also note that creating a table uses the same syntax as creating an
> external collection; it just takes an extra property, "table-type":
> "iceberg", to differentiate it from a normal external collection.
>
>
> On Mon, Aug 11, 2025 at 4:18 AM Hari Kishore Chaparala <hchap...@uci.edu>
> wrote:
>
> > Thanks for the improvement. The syntax looks much more intuitive than in
> > Spark, where the catalog has to be configured with the "spark.sql" prefix
> > (even for DataFrame operations), which can be confusing, especially when
> > working with multiple catalogs.
> >
> > A few questions on the new CATALOG entity:
> >
> > 1. Catalog persistence: When we run "*CREATE CATALOG myRestCatalog*" with
> > configuration options, will the catalog be stored and persisted beyond the
> > current session? In Spark and other engines, the catalog implementation and
> > configuration usually last only for the active session. Since we are
> > querying external tables, I’m not sure if storing catalog details is
> > necessary. Also, AWS roles and STS credentials expire after some time,
> > which would require catalog updates.
> >
> > 2. Table referencing: How do we plan to reference tables?
> > Will it be a three-part notation? That might be clearer when working
> > across multiple catalogs.
> > For example:
> >
> > SELECT *
> > FROM glue_catalog.namespace1.iceberg_table1 A
> > INNER JOIN unity_catalog.namespace2.delta_lake_table1 B ON A.id = B.id;
> >
> > 3. It looks like this APE proposes two features: 1. The new CATALOG entity
> > 2. DQL and DDL support for Iceberg tables using various catalog
> > implementations. Would it make sense to split these into separate APEs?
> >
> > Thanks
> > Hari Kishore
> >
> > On Fri, Aug 8, 2025 at 9:39 AM Hussain Towaileb <hussai...@gmail.com>
> > wrote:
> >
> > > Initiating discussion for adding improved support for Apache Iceberg
> > > Feature: *Improved Support for Apache Iceberg*
> > > Details: Apache Iceberg would provide support for reading Iceberg tables.
> > > This APE discusses improving the current Apache Iceberg support by
> > > introducing the Catalog entity to AsterixDB Metadata, adding support
> > > for different types of Iceberg catalogs, and introducing other
> > > features like time travel.
> > >
> > > APE:
> > >
> > > https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/ASTERIXDB/APE*25*3A*Apache*Iceberg*Support__;KyUrKys!!CzAuKJ42GuquVTTmVmPViYEvSg!J81u58s9FyWRyedF8qV0TL-QjZrvS9vCviVHuCte1wGJ-y3qgzG087UwfC-ii0LKkEI3c5Iw7CG6yxA62A$
> > >
> > > --
> > > Regards,
> > > Hussain Towaileb
> >
> --
> Regards,
> Hussain Towaileb
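
The namespace behavior described under "2. Table referencing" in Hussain's reply (a catalog-level default namespace that a collection-level "namespace" property can override) can be sketched roughly as follows. This is only an illustration of the proposal as discussed in the thread, not AsterixDB code: the function name `resolve_table_path` and its parameters are hypothetical, and the dot-to-slash path layout is taken from the two example paths quoted above.

```python
def resolve_table_path(warehouse_path, catalog_props, collection_props):
    """Hypothetical helper: derive a table location from WITH-clause properties.

    Assumes the behavior proposed in the thread (still open for discussion):
    the collection's "namespace" property, when present, overrides the
    catalog's default "namespace".
    """
    # The collection-level namespace wins; otherwise use the catalog default.
    namespace = collection_props.get("namespace", catalog_props.get("namespace"))
    if not namespace:
        raise ValueError("no namespace set on the collection or the catalog")

    table_name = collection_props["table-name"]

    # "my.name.space" becomes the path segments my/name/space.
    segments = [warehouse_path.rstrip("/")] + namespace.split(".") + [table_name]
    return "/".join(segments)


catalog = {"namespace": "my.name.space"}

# Collection created without its own namespace: the catalog default applies.
default_path = resolve_table_path(
    "catalog-warehouse-path", catalog, {"table-name": "users"}
)

# Collection that sets "namespace": the catalog default is overridden.
override_path = resolve_table_path(
    "catalog-warehouse-path",
    catalog,
    {"table-name": "users", "namespace": "my.other.name.space"},
)
```

With the values from the thread, `default_path` corresponds to the first example location and `override_path` to the second.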