Hi Iceberg Community,

I wanted to bring up a discussion regarding the current TableOperations.commit 
logic and its impact on registerTable with the overwrite option, which came up 
in my PR: https://github.com/apache/iceberg/pull/12228#discussion_r1972591876. 
Currently, the commit logic always writes a new metadata.json for atomic swaps 
of table metadata. This design makes it difficult to directly set a 
user-provided metadata.json as the latest table metadata in the catalog when 
registering a table with overwrite.

The current workaround is to drop the existing table and re-register it with 
the provided metadata.json. However, this approach introduces a potential 
issue: lack of atomicity, which can lead to failures in intermediate states. 
For example, if concurrent writes or table drops occur between the deletion and 
re-registration, it may lead to inconsistent or unexpected results.
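
To make the concern concrete, here is a minimal toy sketch in plain Java. It 
is not Iceberg code: the class RegisterOverwriteSketch and all of its methods 
are hypothetical, and the "catalog" is just an in-memory map from table name 
to the current metadata.json location. It only contrasts the two-step 
drop-and-re-register workaround with a single compare-and-swap of the pointer.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Toy model only (hypothetical names, not the Iceberg API):
// the catalog is a map from table name to its current metadata.json location.
public class RegisterOverwriteSketch {
    private final ConcurrentMap<String, String> pointers = new ConcurrentHashMap<>();

    // Register a table that does not exist yet.
    public void register(String table, String metadataLocation) {
        if (pointers.putIfAbsent(table, metadataLocation) != null) {
            throw new IllegalStateException("Table already exists: " + table);
        }
    }

    // Drop a table; returns false if it was not registered.
    public boolean drop(String table) {
        return pointers.remove(table) != null;
    }

    // Current metadata location, or null if the table is not registered.
    public String current(String table) {
        return pointers.get(table);
    }

    // Workaround: two separate steps. Between drop() and register() the table
    // does not exist, so a concurrent reader sees null and a concurrent
    // register() by someone else can make the second step fail.
    public void dropAndReregister(String table, String newLocation) {
        drop(table);
        register(table, newLocation);
    }

    // Atomic alternative: swap the pointer in one step, and only if the
    // catalog still points at the location the caller expects.
    public boolean registerOverwrite(String table, String expectedLocation, String newLocation) {
        return pointers.replace(table, expectedLocation, newLocation);
    }

    public static void main(String[] args) {
        RegisterOverwriteSketch catalog = new RegisterOverwriteSketch();
        catalog.register("db.tbl", "00001.metadata.json");
        boolean swapped = catalog.registerOverwrite(
            "db.tbl", "00001.metadata.json", "user-provided.metadata.json");
        System.out.println(swapped + " -> " + catalog.current("db.tbl"));
    }
}
```

In this toy model the swap never exposes a state where the table is missing, 
and it fails cleanly if someone else changed the pointer first. How a real 
catalog would provide the same guarantee is exactly the open question.
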
To address this, we would love to hear the community’s thoughts on:

1. Potential approaches to allow registerTable with overwrite to perform an 
   atomic swap while respecting the user-provided metadata.json.
2. Implications of changing the table UUID: whether or not to allow the table 
   UUID to change when the user-provided metadata.json has a different table 
   UUID from the existing one.

We appreciate any insights or suggestions you may have. 

Best,
Steve Zhang

> On Feb 10, 2025, at 4:47 PM, Steve Zhang <hongyue_zh...@apple.com.INVALID> 
> wrote:
> 
> Thank you Russell and Ryan. 
> 
>   Let me start to work on a new API to support force table registration in 
> catalog.
> 
> Thanks,
> Steve Zhang
> 
> 
> 
>> On Feb 10, 2025, at 4:29 PM, rdb...@gmail.com wrote:
>> 
>> Yeah, it sounds like a "register table force" is the right concept here. I 
>> think we want to make sure that table updates remain change-based as the 
>> best practice in the REST API. But there are some irregular use cases that 
>> justify having some mechanism to completely replace the state (like 
>> push-based mirroring). I think it makes sense to revisit mirroring and this 
>> use case and come up with a path forward.
>> 
>> On Mon, Feb 10, 2025 at 3:12 PM Russell Spitzer <russell.spit...@gmail.com 
>> <mailto:russell.spit...@gmail.com>> wrote:
>>> I still would like a "register table force" option
>>> 
>>> On Mon, Feb 10, 2025 at 5:06 PM Steve Zhang 
>>> <hongyue_zh...@apple.com.invalid> wrote:
>>>> Thank you Dan for your detailed reply. Based on your explanation, do you 
>>>> think it would be worthwhile to support non-linear or complete metadata 
>>>> replacements in the REST implementation? I am happy to contribute but 
>>>> might need some guidance from the community on the best approach.
>>>> 
>>>> For additional context, we explored the workaround of dropping the table 
>>>> and re-registering it, but we have concerns about reads that happen in 
>>>> between. There was also an attempt to add a force option to 
>>>> the register-table API (https://github.com/apache/iceberg/pull/5327), 
>>>> which would allow for a metadata swap on an existing table. However, it 
>>>> was suggested that using TableOperations.commit(base, new) is preferred 
>>>> to achieve atomicity.
>>>> 
>>>> Thanks,
>>>> Steve Zhang
>>>> 
>>>> 
>>>> 
>>>>> On Feb 10, 2025, at 1:49 PM, Daniel Weeks <dwe...@apache.org 
>>>>> <mailto:dwe...@apache.org>> wrote:
>>>>> 
>>>>> Hey Steve,
>>>>> 
>>>>> I think the issue here is that you're using the commit api in table 
>>>>> operations to perform a non-incremental/linear change to the metadata.  
>>>>> The REST implementation is a little more strict in that it builds a set 
>>>>> of updates based on the mutations made to the metadata and the commit 
>>>>> process applies those changes.  In this scenario, no changes have been 
>>>>> made and the call is attempting a complete replacement.
>>>>> 
>>>>> The other implementations are just blindly swapping the location, so 
>>>>> while that operation does achieve the effect you're looking for, it's not 
>>>>> the right semantics for the commit.
>>>>> 
>>>>> You might want to consider using the "register table" operation instead, 
>>>>> which takes the table identifier and location to perform this type of 
>>>>> swap.
>>>>> 
>>>>> -Dan
>>>>> 
>>>>> On Fri, Feb 7, 2025 at 10:17 AM Steve Zhang 
>>>>> <hongyue_zh...@apple.com.invalid> wrote:
>>>>>> Hey Iceberg Experts:
>>>>>> 
>>>>>>   I am seeking assistance and insights regarding an issue we’ve 
>>>>>> encountered with RESTTableOperations and its inability to support 
>>>>>> on-demand table metadata swaps. We are currently adopting the REST-based 
>>>>>> catalog from Hive and have noticed a potential gap in the 
>>>>>> TableOperations.commit() API. Typically, we use the commit API to revert 
>>>>>> a table to a previously known state, as demonstrated below:
>>>>>> 
>>>>>> String desiredMetadataPath = 
>>>>>> "/var/newdb/table/metadata/00003-579b23d1-4ca5-4acf-85ec-081e1699cb83.metadata.json";
>>>>>> ops.commit(ops.current(), TableMetadataParser.read(ops.io(), 
>>>>>> desiredMetadataPath));
>>>>>> 
>>>>>>   However, this approach is no longer working with the REST-based 
>>>>>> catalog. I suspect that the issue may be related to how the update type 
>>>>>> is modeled in RESTTableOperations.  I have shared a unit test that 
>>>>>> reproduces the problem on 
>>>>>> https://github.com/apache/iceberg/issues/12134, where it works on JDBC 
>>>>>> and in-memory catalogs, but not with RESTCatalog. 
>>>>>> 
>>>>>> Best Regards,
>>>>>> Steve Zhang
>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
> 
