The general feature of allowing clients to declare that they take responsibility for overwriting each other's changes is fine from my POV. As previous messages point out, there are use cases for it.

However, the term "deconfliction" does not sound perfect to me in this case. From my POV deconfliction means some deliberate action on the metadata taken by the Polaris Server in order to resolve / reconcile conflicting changes submitted by clients.

Do you think we could treat these use cases as different features for the sake of clarity? I can see some users being potentially confused by this (as I was).

For example, to implement preemptive conflict resolution (one party wins) we probably need to define means for clients / engines to declare relative priorities and elect to participate in this feature (the proposed headers). However, append/append resolution may not necessarily fall under the same election flag. That is, clients allowing (future) append/append resolution may not want to allow preemptive resolution.

WDYT?

I'm +1 to evolving Polaris to be able to properly support real conflict resolution (starting with append/append conflicts).

Cheers,
Dmitri.

On Tue, Jul 8, 2025 at 12:58 PM Eric Maynard <eric.w.mayn...@gmail.com> wrote:

> Hi all,
>
> What Russell wrote in (2) is correct -- this will essentially be a best-effort deconfliction, but my hope is that we can expand the supported cases over time. It may be helpful to make a distinction between commits which are "eligible for deconfliction" and commits that "the server knows how to successfully deconflict". The proposal pertains to the former, but I'm hoping the draft implementation I publish soon will help clarify the latter. My plan is to focus on append-only commits first.
>
> Dmitri, as with all commits the server will write the final metadata.json content. As for how it does so, it will vary a bit based on the nature of the conflicting commits, but the general idea is that the commits have consented to deconfliction by the server.
> So in Robert's example both snapshots should be preserved, but the commits have given the server license to decide which of those should be the *latest* snapshot after deconfliction.
>
> --EM
>
> On Tue, Jul 8, 2025 at 9:13 AM Russell Spitzer <russell.spit...@gmail.com> wrote:
>
> > For me there are two big use cases for this:
> >
> > 1. Simple overwrite.
> > I may have several jobs, for example one that does TTL, the command it runs is idempotent and always deletes all rows / files before a certain point. The other is a merge/update command. In this situation I don't even need to reconcile, I always want the merge to succeed because it is more expensive while my delete is very cheap and not that important. (A subsequent TTL delete run will remove the records so as long as I don't have constant conflicts everything is fine.)
> >
> > For example I may have a very complicated and long GDPR job that modifies a lot of data and then a TTL job that removes rows before a certain point in time (as well as appends and other things going on). The TTL Job can always be sacrificed for the GDPR job.
> >
> > 2. Actual Deconfliction
> > In this case I just want to do a cheap version of the deconfliction that would eventually happen on the client side, i.e. without any filters. In case of conflict I open up my previous manifest list and the one in the new TableUpdate and create a new Update that contains a combined manifest list with the added entries from the conflicting manifest list in my current. This is more complicated to perform but we can do it relatively easily for jobs that just modify the manifest list itself. For more complicated situations (a manifest which is being removed has already been removed) we would have to actually dig into manifests or just fail.
> > I would probably start by only tackling the case in which the manifest level modifications can be isolated and allowing the user to pass along the tag basically stating that the changes cannot conflict.
> >
> > Being able to start doing this kind of work on the Catalog side is going to be important once we ever get Fine Grained Commits into the Iceberg Rest spec so experimenting with the ideas now I think is important.
> >
> > On Tue, Jul 8, 2025 at 9:21 AM Dmitri Bourlatchkov <di...@apache.org> wrote:
> >
> > > Hi Eric,
> > >
> > > Just to clarify: Are you proposing Polaris Server to reconcile metadata changes from two conflicting commit requests or are you proposing the clients to do that?
> > >
> > > In other words, who will make the final metadata JSON? Clients (and the server merely commits it) or the Polaris Server itself (modifying what the clients submitted)?
> > >
> > > Sorry, I'm a bit lost on that point based on the current doc text.
> > >
> > > Thanks,
> > > Dmitri.
> > >
> > > On Mon, Jul 7, 2025 at 6:33 PM Eric Maynard <eric.w.mayn...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > Wanted to share this short design doc <https://docs.google.com/document/d/1tkqBOYtkcA7fbDmhIAE6_6Jmus5WwP6vS6jA_JHp4Ms> for a simple method of allowing conflicting commits to both be committed. If implemented, this would allow e.g. two writers doing append-only operations to a table in Polaris to always succeed.
> > > >
> > > > If you're interested, please take a look. In the meantime, I'll be preparing a small draft PR to serve as a reference implementation.
> > > >
> > > > --EM
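
For reference, the manifest-list-level combination Russell describes in his case (2) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the proposed Polaris implementation or an Iceberg API: manifest lists are modeled as plain lists of manifest paths, and `combine_manifest_lists` and `CannotDeconflict` are hypothetical names. `base` is the manifest list the incoming commit was built on, `current` is the table's manifest list after the conflicting commit landed, and `incoming` is the manifest list carried by the new TableUpdate.

```python
from typing import List


class CannotDeconflict(Exception):
    """Hypothetical error: the commits cannot be combined at manifest-list level."""


def combine_manifest_lists(
    base: List[str], current: List[str], incoming: List[str]
) -> List[str]:
    """Combine two conflicting commits that only touched the manifest list.

    All arguments are lists of manifest paths (a simplification; real
    manifest list entries carry much more information).
    """
    base_set, current_set, incoming_set = set(base), set(current), set(incoming)

    # What the incoming commit changed relative to the state it started from.
    added = [m for m in incoming if m not in base_set]
    removed = [m for m in base if m not in incoming_set]

    # "A manifest which is being removed has already been removed":
    # resolving that would require digging into manifests, so just fail.
    for manifest in removed:
        if manifest not in current_set:
            raise CannotDeconflict(f"{manifest} was already removed")

    # Start from the committed state, apply the incoming removals,
    # then append the incoming additions.
    removed_set = set(removed)
    combined = [m for m in current if m not in removed_set]
    combined.extend(m for m in added if m not in current_set)
    return combined


# Two append-only writers that both started from [m1, m2]:
result = combine_manifest_lists(
    base=["m1", "m2"],
    current=["m1", "m2", "m3"],   # first writer appended m3 and won
    incoming=["m1", "m2", "m4"],  # second writer appended m4
)
print(result)  # ['m1', 'm2', 'm3', 'm4']
```

In the append/append case nothing is removed, so the check never fires and both sets of added manifests survive. Anything beyond isolated manifest-list changes fails fast here, which matches the suggestion to tackle only that case first.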