Hello dev-list,

*What*
I'm looking for reviews of issues and PRs from the community to enable REST Catalog based integration tests for query engines.
Issue: https://github.com/apache/iceberg/issues/11079
PR: https://github.com/apache/iceberg/pull/11093

*Background*
Recently, thanks to @Daniel's effort adding the RCK (REST Compatibility Kit) test utilities (ref: https://github.com/apache/iceberg/pull/10908), we can now spin up a simple REST Catalog within the test environment. Our existing Spark integration tests run against the Hive and Hadoop Catalogs only (ref: https://github.com/apache/iceberg/blob/2025e79/spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/CatalogTestBase.java), and I think our Spark connector release procedure would benefit from running those integration tests against a REST Catalog (leveraging the RCK utilities) alongside Hadoop and Hive. (A rough sketch of what that wiring could look like is in the P.S. at the end.)

*Why*
As the community gradually adopts REST Catalog, running the Spark integration tests against a REST Catalog will help us catch issues affecting RESTCatalog clients early on, better serving REST Catalog adopters in the community. Additionally, if we can build Spark integration tests against REST Catalog, the same idea could later extend to more query engines such as Flink.

*Current open issues and PRs*

*PR:*
1. https://github.com/apache/iceberg/pull/11093, the very first step: add REST-based integration tests to the Spark 3.5 tests. We can extend them to Spark 3.4 and 3.3 later if the community likes the idea.

*Issues:*
When enabling the Spark integration tests on REST Catalog alongside the Hadoop/Hive Catalogs, some test cases pass on Hadoop/Hive but fail on REST. They indicate either a behavior difference between the catalogs (when handling the same Spark command) or a potential issue that needs further investigation.
1. https://github.com/apache/iceberg/issues/11103, the REST client incorrectly modifies the "last-updated-ms" attribute of table metadata after receiving responses from the server. This issue has been closed by community effort (thanks to @Eduard, @Ryan, @Daniel, and @Steve for discussing/fixing/reviewing).
2. https://github.com/apache/iceberg/issues/11109, when issuing a Spark "CREATE OR REPLACE ${table}" command, the Hive/Hadoop Catalogs do not clear the snapshot log from before the table replacement, while the REST Catalog does. I think we need clarification on whether table replacement should clear the snapshot log.
3. https://github.com/apache/iceberg/issues/11154, REST Catalog currently fails the Spark rename tests ("ALTER ${table} RENAME TO ${table_rename}"). Spark's RenameTableExec passes the catalog name together with the namespace in the "to" identifier down to the Iceberg Spark connector. HiveCatalog's rename method always treats the first namespace level of the "to" identifier as the catalog name and strips it before renaming, while RESTCatalog has no similar pre-processing, so HiveCatalog passes the "ALTER TABLE RENAME" test but RESTCatalog does not.

Let me know any feedback; reviews on the PRs and discussion on the issues are also welcome.

Thanks,
-Haizhou
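P.S. For anyone curious, here is a minimal, hypothetical sketch of pointing the Iceberg Spark catalog at a REST endpoint and exercising the rename path from issue 11154. The catalog name "rest_cat" and the localhost URI are placeholders I made up; the actual tests would plug the RCK-provided endpoint into the existing parameterized CatalogTestBase rather than use a standalone main() like this.

// Rough sketch only: wiring a Spark session against a REST catalog endpoint,
// similar to what the parameterized tests do for Hive/Hadoop today.
// "rest_cat" and the localhost URI are placeholders, not the real test config.
import org.apache.spark.sql.SparkSession;

public class RestCatalogSmokeTest {
  public static void main(String[] args) {
    SparkSession spark =
        SparkSession.builder()
            .master("local[2]")
            .config(
                "spark.sql.extensions",
                "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
            .config("spark.sql.catalog.rest_cat", "org.apache.iceberg.spark.SparkCatalog")
            .config("spark.sql.catalog.rest_cat.type", "rest")
            // placeholder URI; the integ tests would point at the RCK-provided endpoint
            .config("spark.sql.catalog.rest_cat.uri", "http://localhost:8181")
            .getOrCreate();

    spark.sql("CREATE NAMESPACE IF NOT EXISTS rest_cat.db");
    spark.sql("CREATE TABLE rest_cat.db.t (id BIGINT) USING iceberg");

    // The rename case from issue 11154: Spark's RenameTableExec hands the catalog
    // name through in the "to" identifier, which HiveCatalog strips but RESTCatalog
    // does not, so this statement behaves differently across catalogs today.
    spark.sql("ALTER TABLE rest_cat.db.t RENAME TO rest_cat.db.t_renamed");

    spark.stop();
  }
}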