Hi all,

What Russell wrote in (2) is correct -- this will essentially be a
best-effort deconfliction, but my hope is that we can expand the supported
cases over time. It may be helpful to distinguish commits which are
"eligible for deconfliction" from commits that "the server knows how to
successfully deconflict". The proposal pertains to the former, but I'm
hoping the draft implementation I publish soon will help clarify the
latter. My plan is to focus on append-only commits first.
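
As a rough illustration of that distinction (a sketch under assumed names,
not the draft implementation): the `Commit` fields, the
`allow_deconfliction` flag, and `SUPPORTED_OPERATIONS` below are all
hypothetical and do not correspond to any existing Polaris or Iceberg API.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    COMMITTED = "committed"        # no conflict, normal commit path
    DECONFLICTED = "deconflicted"  # conflict resolved by the server
    REJECTED = "rejected"          # conflict surfaced to the client as usual


@dataclass(frozen=True)
class Commit:
    # Every field here is hypothetical, chosen only for illustration.
    operation: str                # e.g. "append", "overwrite", "delete"
    allow_deconfliction: bool     # the writer's consent, i.e. eligibility
    conflicts_with_current: bool  # True if the writer's base metadata is stale


# Operations the server "knows how to deconflict". Assumed to start with
# append-only commits and grow over time.
SUPPORTED_OPERATIONS = {"append"}


def decide(commit: Commit) -> Outcome:
    """Best-effort: deconflict only when eligible AND supported."""
    if not commit.conflicts_with_current:
        return Outcome.COMMITTED
    eligible = commit.allow_deconfliction
    supported = commit.operation in SUPPORTED_OPERATIONS
    if eligible and supported:
        return Outcome.DECONFLICTED
    # Eligible-but-unsupported commits fail exactly as they do today.
    return Outcome.REJECTED
```

In this sketch an eligible commit that the server cannot yet handle simply
falls back to today's behavior, which is what "best-effort" means here.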

Dmitri, as with all commits, the server will write the final metadata.json
content. How it does so will vary a bit based on the nature of the
conflicting commits, but the general idea is that the commits have
consented to deconfliction by the server. So in Robert's example both
snapshots should be preserved, but the commits have given the server
license to decide which of those should be the *latest* snapshot after
deconfliction.
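
For the append-only case, the idea can be sketched roughly as follows.
This is an illustration under simplifying assumptions, not the draft
implementation: `Snapshot`, `TableMetadata`, and `deconflict_append` are
hypothetical names, a manifest list is modeled as a tuple of paths rather
than a file, and the sketch arbitrarily makes the later-arriving snapshot
the latest one.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    manifests: Tuple[str, ...]  # stand-in for the manifest list file


@dataclass(frozen=True)
class TableMetadata:
    snapshots: Tuple[Snapshot, ...]
    current_snapshot_id: int


def deconflict_append(table: TableMetadata, incoming: Snapshot) -> TableMetadata:
    """Re-parent a stale append onto the table's current snapshot."""
    by_id = {s.snapshot_id: s for s in table.snapshots}
    current = by_id[table.current_snapshot_id]

    # Manifests the writer saw when it started (empty for a first commit).
    if incoming.parent_id is None:
        base = set()
    else:
        base = set(by_id[incoming.parent_id].manifests)

    # An append must keep every manifest of its base; otherwise give up.
    if not base.issubset(incoming.manifests):
        raise ValueError("not append-only: base manifests were removed")

    # Keep only what the writer added, then stack it on the current state.
    added = tuple(m for m in incoming.manifests if m not in base)
    rebased = replace(
        incoming,
        parent_id=current.snapshot_id,
        manifests=current.manifests + added,
    )

    # Both snapshots are preserved; the server picks which one is latest.
    return TableMetadata(
        snapshots=table.snapshots + (rebased,),
        current_snapshot_id=rebased.snapshot_id,
    )
```

With two writers that both started from the same parent, the first
snapshot stays in the history and the second is rewritten to sit on top
of it, so neither append is lost.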

--EM

On Tue, Jul 8, 2025 at 9:13 AM Russell Spitzer <russell.spit...@gmail.com>
wrote:

> For me there are two big use cases for this:
>
> 1. Simple overwrite.
> I may have several jobs, for example one that does TTL; the command it
> runs is idempotent and always deletes all rows / files before a certain
> point. The other is a merge/update command. In this situation I don't even
> need to reconcile; I always want the merge to succeed because it is more
> expensive, while my delete is very cheap and not that important. (A
> subsequent TTL delete run will remove the records, so as long as I don't
> have constant conflicts everything is fine.)
>
> For example I may have a very complicated and long GDPR job that modifies a
> lot of data and then a TTL job that removes rows before a certain point in
> time (as well as appends and other things going on). The TTL job can always
> be sacrificed for the GDPR job.
>
> 2. Actual Deconfliction
> In this case I just want to do a cheap version of the deconfliction that would
> eventually happen on the client side, i.e. without any filters. In case of
> conflict I open up my previous manifest list and the one in the new
> TableUpdate and create a new Update that contains a combined manifest list
> with the added entries from the conflicting manifest list in my current.
> This is more complicated to perform but we can do it relatively easily for
> jobs that just modify the manifest list itself. For more complicated
> situations (a manifest which is being removed has already been removed) we
> would have to actually dig into manifests or just fail. I would probably
> start by only tackling the case in which the manifest level modifications
> can be isolated and allowing the user to pass along the tag basically
> stating that the changes cannot conflict.
>
> Being able to start doing this kind of work on the Catalog side is going to
> be important if we ever get fine-grained commits into the Iceberg REST
> spec, so I think experimenting with the ideas now is important.
>
> On Tue, Jul 8, 2025 at 9:21 AM Dmitri Bourlatchkov <di...@apache.org>
> wrote:
>
> > Hi Eric,
> >
> > Just to clarify: Are you proposing Polaris Server to reconcile metadata
> > changes from two conflicting commit requests or are you proposing the
> > clients to do that?
> >
> > In other words, who will make the final metadata JSON? Clients (and the
> > server merely commits it) or the Polaris Server itself (modifying what
> > the clients submitted)?
> >
> > Sorry, I'm a bit lost on that point based on the current doc text.
> >
> > Thanks,
> > Dmitri.
> >
> > On Mon, Jul 7, 2025 at 6:33 PM Eric Maynard <eric.w.mayn...@gmail.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > Wanted to share this short design doc
> > > <https://docs.google.com/document/d/1tkqBOYtkcA7fbDmhIAE6_6Jmus5WwP6vS6jA_JHp4Ms>
> > > for a simple method of allowing conflicting commits to both be committed.
> > > If implemented, this would allow e.g. two writers doing append-only
> > > operations to a table in Polaris to always succeed.
> > >
> > > If you're interested, please take a look. In the meantime, I'll be
> > > preparing a small draft PR to serve as a reference implementation.
> > >
> > > --EM
> > >
> >
>
