Yeah, it sounds like a "register table force" is the right concept here. I think we want to make sure that table updates remain change-based as the best practice in the REST API. But there are some irregular use cases that justify having some mechanism to completely replace the state (like push-based mirroring). I think it makes sense to revisit mirroring and this use case and come up with a path forward.
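To make the distinction concrete, here is a toy sketch of the two commit styles discussed below. It is not Iceberg code: `Metadata`, `swapCommit`, and `changeBasedCommit` are made-up names, and a snapshot id stands in for the whole table state. The only assumption it encodes is the one from Dan's explanation, that a change-based commit applies the mutations recorded on the new metadata, and that metadata read straight from a file has none recorded. Whether a real server rejects or ignores such a commit is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types or methods are Iceberg classes.
public final class CommitSemanticsSketch {

  // Hypothetical stand-in for table metadata: a current snapshot id plus
  // the list of mutations recorded while this object was being built.
  public static final class Metadata {
    private final long currentSnapshotId;
    private final List<Long> recordedChanges;

    private Metadata(long currentSnapshotId, List<Long> recordedChanges) {
      this.currentSnapshotId = currentSnapshotId;
      this.recordedChanges = List.copyOf(recordedChanges);
    }

    // Metadata parsed from a file carries state but no recorded changes.
    public static Metadata readFromFile(long snapshotId) {
      return new Metadata(snapshotId, List.of());
    }

    // Metadata derived from a base records the mutation that produced it.
    public Metadata withCurrentSnapshot(long snapshotId) {
      List<Long> changes = new ArrayList<>(recordedChanges);
      changes.add(snapshotId);
      return new Metadata(snapshotId, changes);
    }

    public long currentSnapshotId() {
      return currentSnapshotId;
    }

    public List<Long> recordedChanges() {
      return recordedChanges;
    }
  }

  // Pointer-swap style: whatever is passed in becomes the table state.
  public static Metadata swapCommit(Metadata base, Metadata updated) {
    return updated;
  }

  // Change-based style: only the recorded changes are replayed on the base.
  public static Metadata changeBasedCommit(Metadata base, Metadata updated) {
    long snapshotId = base.currentSnapshotId();
    for (long change : updated.recordedChanges()) {
      snapshotId = change;
    }
    return new Metadata(snapshotId, List.of());
  }

  public static void main(String[] args) {
    Metadata current = Metadata.readFromFile(5L);
    Metadata desired = Metadata.readFromFile(3L); // older metadata file

    // Swap style adopts the older state wholesale.
    System.out.println(swapCommit(current, desired).currentSnapshotId());

    // Change-based style sees no recorded changes, so nothing is applied.
    System.out.println(changeBasedCommit(current, desired).currentSnapshotId());

    // A rollback expressed as a change on top of the base does go through.
    Metadata rolledBack = current.withCurrentSnapshot(3L);
    System.out.println(changeBasedCommit(current, rolledBack).currentSnapshotId());
  }
}
```

In this model the full replacement only works through the swap path, which is the gap a forced register would have to cover; a rollback expressed as a change on top of the current metadata works either way.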
On Mon, Feb 10, 2025 at 3:12 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

> I still would like a "register table force" option
>
> On Mon, Feb 10, 2025 at 5:06 PM Steve Zhang
> <hongyue_zh...@apple.com.invalid> wrote:
>
>> Thank you Dan for your detailed reply. Based on your explanation, do you
>> think it would be worthwhile to support non-linear or complete metadata
>> replacements in the REST implementation? I am happy to contribute but might
>> need some guidance from the community on the best approach.
>>
>> For additional context, we explored the workaround of dropping the table
>> and re-registering it, but were concerned about reads that land in between
>> the two steps. There was also an attempt to add a force option to the
>> register-table API (https://github.com/apache/iceberg/pull/5327), which
>> would allow a metadata swap on an existing table. However, it was
>> suggested that using TableOperations.commit(base, new) is preferred to
>> achieve atomicity.
>>
>> Thanks,
>> Steve Zhang
>>
>> On Feb 10, 2025, at 1:49 PM, Daniel Weeks <dwe...@apache.org> wrote:
>>
>> Hey Steve,
>>
>> I think the issue here is that you're using the commit API in table
>> operations to perform a non-incremental/linear change to the metadata. The
>> REST implementation is a little more strict in that it builds a set of
>> updates based on the mutations made to the metadata and the commit process
>> applies those changes. In this scenario, no changes have been made and the
>> call is attempting a complete replacement.
>>
>> The other implementations are just blindly swapping the location, so
>> while that operation does achieve the effect you're looking for, it's not
>> the right semantics for the commit.
>>
>> You might want to consider using the "register table" operation instead,
>> which takes the table identifier and location to perform this type of swap.
>>
>> -Dan
>>
>> On Fri, Feb 7, 2025 at 10:17 AM Steve Zhang
>> <hongyue_zh...@apple.com.invalid> wrote:
>>
>>> Hey Iceberg Experts:
>>>
>>> I am seeking assistance and insights regarding an issue we’ve
>>> encountered with RESTTableOperations and its inability to support on-demand
>>> table metadata swaps. We are currently moving from the Hive catalog to the
>>> REST-based catalog and have noticed a potential gap in the
>>> TableOperations.commit() API. Typically, we use the commit API to revert a
>>> table to a previously known state, as demonstrated below:
>>>
>>> String desiredMetadataPath =
>>>     "/var/newdb/table/metadata/00003-579b23d1-4ca5-4acf-85ec-081e1699cb83.metadata.json";
>>> ops.commit(ops.current(), TableMetadataParser.read(ops.io(),
>>>     desiredMetadataPath));
>>>
>>> However, this approach no longer works with the REST-based
>>> catalog. I suspect that the issue may be related to how the update type is
>>> modeled in RESTTableOperations. I have shared a unit test that reproduces
>>> the problem at https://github.com/apache/iceberg/issues/12134; it
>>> passes on the JDBC and in-memory catalogs, but not with RESTCatalog.
>>>
>>> Best Regards,
>>> Steve Zhang