For me there are two big use cases for this:

1. Simple overwrite.
I may have several jobs. One does TTL: the command it runs is idempotent
and always deletes all rows / files before a certain point. The other is a
merge/update command. In this situation I don't even need to reconcile. I
always want the merge to succeed because it is more expensive, while my
delete is very cheap and not that important. (A subsequent TTL delete run
will remove the records, so as long as I don't have constant conflicts
everything is fine.)

For example I may have a very complicated and long GDPR job that modifies a
lot of data and then a TTL job that removes rows before a certain point in
time (as well as appends and other things going on). The TTL job can always
be sacrificed for the GDPR job.
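
To make the overwrite rule concrete, here is a minimal sketch in Python. All
of the names (Commit, Catalog, the "sacrificial" flag) are hypothetical and
not part of Polaris or Iceberg. It assumes the catalog keeps a linear version
history and that a job can tag its commit as safe to drop on conflict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    job: str
    base_version: int          # table version the writer started from
    metadata: str              # stand-in for the new metadata JSON
    sacrificial: bool = False  # hypothetical tag: safe to drop on conflict


class Catalog:
    def __init__(self, metadata: str = "v0") -> None:
        # history[i] is the commit that produced table version i
        self.history = [Commit("init", -1, metadata)]

    @property
    def version(self) -> int:
        return len(self.history) - 1

    @property
    def current_metadata(self) -> str:
        return self.history[-1].metadata

    def commit(self, c: Commit) -> bool:
        # No conflict: the writer saw the latest version.
        if c.base_version == self.version:
            self.history.append(c)
            return True
        # Conflict: commits that landed after the writer's base version.
        overtaken = self.history[c.base_version + 1:]
        # A non-sacrificial commit may overwrite sacrificial ones;
        # their changes are lost until those jobs run again.
        if not c.sacrificial and all(o.sacrificial for o in overtaken):
            self.history.append(c)
            return True
        return False


catalog = Catalog()
ttl = Commit("ttl", base_version=0, metadata="ttl-delete", sacrificial=True)
gdpr = Commit("gdpr", base_version=0, metadata="gdpr-rewrite")
ttl_ok = catalog.commit(ttl)    # lands first, no conflict
gdpr_ok = catalog.commit(gdpr)  # stale base, but only overtook the TTL job
print(ttl_ok, gdpr_ok, catalog.current_metadata)
```

In the reverse order the TTL commit would be rejected and simply picked up by
the next TTL run.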

2. Actual Deconfliction
In this case I just want to do a cheap version of the deconfliction that
would eventually happen on the client side, i.e. without any filters. In
case of conflict I open up my previous manifest list and the one in the new
TableUpdate and create a new Update that contains a combined manifest list,
with the entries added by the conflicting manifest list merged into my
current one. This is more complicated to perform, but we can do it
relatively easily for jobs that just modify the manifest list itself. For
more complicated situations (a manifest which is being removed has already
been removed) we would have to actually dig into manifests or just fail. I
would probably start by only tackling the case in which the manifest-level
modifications can be isolated, and by allowing the user to pass along a tag
that basically states that the changes cannot conflict.
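
A rough sketch of that manifest-list-level merge, in Python. This is only an
illustration under assumptions: manifests are identified by unique path
strings, a manifest list is an ordered list of those paths, and the function
and exception names are made up for this example rather than taken from
Iceberg or Polaris.

```python
class ManifestConflict(Exception):
    """Raised when the merge cannot be done at the manifest-list level."""


def merge_manifest_lists(base, current, incoming):
    """Replay the incoming commit's manifest-list changes onto current.

    base     -- manifest list the incoming commit was built on
    current  -- manifest list committed by the conflicting writer
    incoming -- manifest list in the new TableUpdate
    """
    base_set = set(base)
    added = [m for m in incoming if m not in base_set]
    removed = base_set - set(incoming)

    # A manifest we want to remove is already gone: we would have to
    # dig into the manifests themselves, so this sketch just fails.
    missing = removed - set(current)
    if missing:
        raise ManifestConflict(f"already removed: {sorted(missing)}")

    kept = [m for m in current if m not in removed]
    return kept + added


base = ["m1.avro", "m2.avro"]
current = ["m1.avro", "m2.avro", "m3.avro"]   # other writer appended m3
incoming = ["m1.avro", "m2.avro", "m4.avro"]  # this writer appended m4
merged = merge_manifest_lists(base, current, incoming)
print(merged)
```

Two append-only writers always merge cleanly under this rule. A writer that
replaces a manifest only succeeds if that manifest is still present in the
current list.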

Being able to start doing this kind of work on the Catalog side is going to
be important once we get fine-grained commits into the Iceberg REST spec,
so I think experimenting with the ideas now is worthwhile.

On Tue, Jul 8, 2025 at 9:21 AM Dmitri Bourlatchkov <di...@apache.org> wrote:

> Hi Eric,
>
> Just to clarify: Are you proposing Polaris Server to reconcile metadata
> changes from two conflicting commit requests or are you proposing the
> clients to do that?
>
> In other words, who will make the final metadata JSON? Clients (and the
> server merely commits it) or the Polaris Server itself (modifying what the
> clients submitted)?
>
> Sorry, I'm a bit lost on that point based on the current doc text.
>
> Thanks,
> Dmitri.
>
> On Mon, Jul 7, 2025 at 6:33 PM Eric Maynard <eric.w.mayn...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > Wanted to share this short design doc
> > <https://docs.google.com/document/d/1tkqBOYtkcA7fbDmhIAE6_6Jmus5WwP6vS6jA_JHp4Ms>
> > for a simple method of allowing conflicting commits to both be committed.
> > If implemented, this would allow e.g. two writers doing append-only
> > operations to a table in Polaris to always succeed.
> >
> > If you're interested, please take a look. In the meantime, I'll be
> > preparing a small draft PR to serve as a reference implementation.
> >
> > --EM
> >
>
