Hi Iceberg Community,

I wanted to bring up a discussion regarding the current TableOperations.commit logic and its impact on registerTable with the overwrite option, which came up in my PR: https://github.com/apache/iceberg/pull/12228#discussion_r1972591876. Currently, the commit logic always writes a new metadata.json to perform an atomic swap of table metadata. This design makes it difficult to directly set a user-provided metadata.json as the latest table metadata in the catalog when registering a table with overwrite.

The current workaround is to drop the existing table and re-register it with the provided metadata.json. However, this approach is not atomic, so a failure or a concurrent operation can leave the catalog in an intermediate state. For example, if concurrent writes or table drops occur between the deletion and the re-registration, the result may be inconsistent or unexpected.

To address this, we would love to hear the community’s thoughts on:

- Potential approaches to allow registerTable with overwrite to perform an atomic swap while respecting the user-provided metadata.json.
- Implications of changing the table UUID: whether to allow the table UUID to change when the user-provided metadata.json has a different table UUID from the existing one.

We appreciate any insights or suggestions you may have.

Best,
Steve Zhang

> On Feb 10, 2025, at 4:47 PM, Steve Zhang <hongyue_zh...@apple.com.INVALID> wrote:
>
> Thank you Russell and Ryan.
>
> Let me start to work on a new API to support force table registration in
> the catalog.
>
> Thanks,
> Steve Zhang
>
>> On Feb 10, 2025, at 4:29 PM, rdb...@gmail.com wrote:
>>
>> Yeah, it sounds like a "register table force" is the right concept here. I
>> think we want to make sure that table updates remain change-based as the
>> best practice in the REST API. But there are some irregular use cases that
>> justify having some mechanism to completely replace the state (like
>> push-based mirroring). I think it makes sense to revisit mirroring and this
>> use case and come up with a path forward.
>>
>> On Mon, Feb 10, 2025 at 3:12 PM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>> I still would like a "register table force" option
>>>
>>> On Mon, Feb 10, 2025 at 5:06 PM Steve Zhang
>>> <hongyue_zh...@apple.com.invalid> wrote:
>>>> Thank you Dan for your detailed reply.
>>>> Based on your explanation, do you
>>>> think it would be worthwhile to support non-linear or complete metadata
>>>> replacements in the REST implementation? I am happy to contribute but
>>>> might need some guidance from the community on the best approach.
>>>>
>>>> For additional context, we explored the workaround of dropping the
>>>> table and re-registering it, but were concerned about reads happening
>>>> in between. There was also an attempt to add a force option to
>>>> the register-table API (https://github.com/apache/iceberg/pull/5327),
>>>> which would allow a metadata swap on an existing table. However, it was
>>>> suggested that using TableOperations.commit(base, new) is preferred to
>>>> achieve atomicity.
>>>>
>>>> Thanks,
>>>> Steve Zhang
>>>>
>>>>> On Feb 10, 2025, at 1:49 PM, Daniel Weeks <dwe...@apache.org> wrote:
>>>>>
>>>>> Hey Steve,
>>>>>
>>>>> I think the issue here is that you're using the commit API in table
>>>>> operations to perform a non-incremental/linear change to the metadata.
>>>>> The REST implementation is a little more strict in that it builds a set
>>>>> of updates based on the mutations made to the metadata and the commit
>>>>> process applies those changes. In this scenario, no changes have been
>>>>> made and the call is attempting a complete replacement.
>>>>>
>>>>> The other implementations are just blindly swapping the location, so
>>>>> while that operation does achieve the effect you're looking for, it's not
>>>>> the right semantics for the commit.
>>>>>
>>>>> You might want to consider using the "register table" operation instead,
>>>>> which takes the table identifier and location to perform this type of
>>>>> swap.
>>>>>
>>>>> -Dan
>>>>>
>>>>> On Fri, Feb 7, 2025 at 10:17 AM Steve Zhang
>>>>> <hongyue_zh...@apple.com.invalid> wrote:
>>>>>> Hey Iceberg Experts:
>>>>>>
>>>>>> I am seeking assistance and insights regarding an issue we’ve
>>>>>> encountered with RESTTableOperations and its inability to support
>>>>>> on-demand table metadata swaps. We are currently moving from the Hive
>>>>>> catalog to the REST-based catalog and have noticed a potential gap in the
>>>>>> TableOperations.commit() API. Typically, we use the commit API to revert
>>>>>> a table to a previously known state, as demonstrated below:
>>>>>>
>>>>>> String desiredMetadataPath =
>>>>>>     "/var/newdb/table/metadata/00003-579b23d1-4ca5-4acf-85ec-081e1699cb83.metadata.json";
>>>>>> ops.commit(ops.current(), TableMetadataParser.read(ops.io(), desiredMetadataPath));
>>>>>>
>>>>>> However, this approach no longer works with the REST-based
>>>>>> catalog. I suspect that the issue may be related to how the update type
>>>>>> is modeled in RESTTableOperations. I have shared a unit test that
>>>>>> reproduces the problem at
>>>>>> https://github.com/apache/iceberg/issues/12134, where it works on the JDBC
>>>>>> and in-memory catalogs, but not with RESTCatalog.
>>>>>>
>>>>>> Best Regards,
>>>>>> Steve Zhang
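
To make the atomicity concern raised in this thread concrete, here is a minimal, self-contained sketch. It is a toy model only: ToyCatalog and its methods (register, drop, current, registerOverwrite) are hypothetical names invented for illustration and are not Iceberg APIs. The model treats a catalog as a map from table name to the current metadata.json location, and contrasts the two-step drop-then-register workaround with a single compare-and-swap on that pointer.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Toy model (not the Iceberg API): table name -> current metadata.json location. */
class ToyCatalog {
    private final ConcurrentHashMap<String, String> pointers = new ConcurrentHashMap<>();

    /** Registers a table only if the name is free; returns true on success. */
    public boolean register(String table, String metadataLocation) {
        return pointers.putIfAbsent(table, metadataLocation) == null;
    }

    /** Removes the table pointer; returns true if the table existed. */
    public boolean drop(String table) {
        return pointers.remove(table) != null;
    }

    /** Returns the current metadata location, or null if the table is absent. */
    public String current(String table) {
        return pointers.get(table);
    }

    /** Atomic overwrite: swaps the pointer only if it still equals {@code expected}. */
    public boolean registerOverwrite(String table, String expected, String replacement) {
        return pointers.replace(table, expected, replacement);
    }

    public static void main(String[] args) {
        ToyCatalog catalog = new ToyCatalog();
        catalog.register("db.t", "00003.metadata.json");

        // Workaround: two separate steps. After drop() the name is free, so a
        // concurrent writer (simulated inline here) can claim it first.
        catalog.drop("db.t");
        boolean racerWon = catalog.register("db.t", "racer.metadata.json");
        boolean reRegistered = catalog.register("db.t", "00001.metadata.json");
        System.out.println("racer registered first: " + racerWon);
        System.out.println("our re-register succeeded: " + reRegistered);

        // Atomic alternative: a single step that succeeds only if the pointer
        // is still the value the caller last observed.
        boolean swapped =
            catalog.registerOverwrite("db.t", catalog.current("db.t"), "00001.metadata.json");
        System.out.println("atomic overwrite succeeded: " + swapped);
    }
}
```

In this model the workaround can lose the table name to another writer between the two calls, while the compare-and-swap either replaces exactly the pointer the caller saw or fails without changing anything. How a real catalog would expose such a swap, and how it should treat a differing table UUID, is the open question in this thread.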