Thanks for looking into this, Haizhou. I'll take a closer look at the PRs this/next week.
Eduard

On Thu, Sep 19, 2024 at 2:22 AM Haizhou Zhao <> wrote:
> Hello dev-list,
>
> *What*
> I'm looking for reviews of issues and PRs from the community to enable
> REST Catalog based integration tests for query engines.
>
> Issue:
> PR:
>
> *Background*
> Recently, thanks to @Daniel's effort in adding the RCK (REST Compatibility
> Kit) test utilities (ref:), we can now spin up a simple REST Catalog
> within the test environment. Our existing Spark integration tests are
> based only on the Hive & Hadoop Catalogs (ref:), and I think our Spark
> connector release procedure would benefit from running the existing Spark
> integration tests against a REST Catalog as well (leveraging the RCK
> utilities), alongside Hadoop & Hive.
>
> *Why*
> As the community gradually adopts REST Catalog, running Spark integration
> tests against it will help us catch issues affecting REST Catalog clients
> early on, better serving REST Catalog adopters in the community.
> Additionally, if we can build Spark integration tests against REST
> Catalog, the same idea could later be extended to other query engines
> such as Flink.
>
> *Currently open issues and PRs*
> *PR:*
> 1., the very first step here is to add REST-based integration tests to
> the Spark 3.5 tests. We can extend them to Spark 3.4 & 3.3 later if the
> community likes the idea.
>
> *Issues:*
> When enabling Spark integration tests on REST Catalog alongside the
> Hadoop/Hive Catalogs, some test cases pass on Hadoop/Hive but fail on
> REST. Each either indicates a behavior difference between the catalogs
> (when handling the same Spark command) or a potential issue that needs
> further investigation.
>
> 1., the REST client incorrectly modifies the "last-updated-ms" attribute
> of table metadata after receiving responses from the server. This issue
> has been closed through community effort (thx to @Eduard, @Ryan, @Daniel,
> and @Steve for discussing/fixing/reviewing).
> 2., when issuing a Spark "CREATE OR REPLACE TABLE ${table}" command, the
> Hive/Hadoop Catalogs do not clear the snapshot log (from before the table
> replacement), while REST Catalog does. I think we need some clarification
> on whether table replacement should clear the snapshot log.
> 3., REST Catalog currently fails the Spark rename tests ("ALTER TABLE
> ${table} RENAME TO ${table_rename}"). Spark call stacks (RenameTableExec)
> pass the catalog name together with the namespace name in the "to"
> identifier down to the Iceberg Spark connector. Meanwhile, the HiveCatalog
> rename method always treats the first namespace level of the "to"
> identifier as the catalog name and strips it before the actual rename,
> while RESTCatalog does no similar pre-processing. As a result, HiveCatalog
> passes the "ALTER TABLE RENAME" test but RESTCatalog does not.
>
> Let me know if you have any feedback; reviews on the PRs and discussion
> on the issues are also welcome.
>
> Thanks,
> -Haizhou
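To make the rename mismatch in issue 3 concrete, here is a minimal, self-contained Java sketch of the two ways the "to" identifier can be handled. It deliberately does not use Iceberg's real `TableIdentifier`/`Namespace` classes: `Ident`, `stripCatalogName`, and `passThrough` are hypothetical names, and the stripping condition (a two-level namespace whose first level equals the catalog name) is an assumption about the HiveCatalog pre-processing, not a copy of it.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified model of the rename pre-processing described in
 * issue 3. This is NOT Iceberg's TableIdentifier API.
 */
public class RenameIdentifierSketch {

    /** Minimal stand-in for a table identifier: namespace levels plus a table name. */
    static final class Ident {
        final List<String> namespace;
        final String name;

        Ident(List<String> namespace, String name) {
            this.namespace = namespace;
            this.name = name;
        }

        @Override
        public String toString() {
            return String.join(".", namespace) + "." + name;
        }
    }

    /**
     * Hive-like behavior (assumed, simplified): if the namespace has two levels
     * and the first one equals the catalog name, drop that first level.
     */
    static Ident stripCatalogName(String catalogName, Ident to) {
        if (to.namespace.size() == 2 && to.namespace.get(0).equalsIgnoreCase(catalogName)) {
            return new Ident(to.namespace.subList(1, 2), to.name);
        }
        return to;
    }

    /** REST-like behavior as described above: no pre-processing, identifier passed through. */
    static Ident passThrough(Ident to) {
        return to;
    }

    public static void main(String[] args) {
        // The "to" identifier still carries the catalog name as its first namespace level.
        Ident to = new Ident(Arrays.asList("my_catalog", "db"), "tbl_renamed");
        System.out.println(stripCatalogName("my_catalog", to)); // db.tbl_renamed
        System.out.println(passThrough(to));                     // my_catalog.db.tbl_renamed
    }
}
```

In this model the Hive-like path ends up renaming into namespace `db`, while the REST-like path would ask the server for a namespace `my_catalog.db`, which is one plausible way the rename test could fail against REST.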