Re: [PROPOSAL] Commit Deconfliction

Ryan Blue Wed, 09 Jul 2025 14:07:50 -0700

I think there's a better way to achieve the results that this proposal is
aiming for. If I understand correctly, this is trying to create a way for
Polaris to be able to apply changes from one commit on top of another
without sending a commit conflict back to the client. Parallel appends are
the primary example of this being safe, but not something that the service
can handle.


I would recommend taking a look at the work done by AWS on this that
resulted in the fine-grained commits proposal. The idea there was to send
the changes that the client is making to the content-metadata tree to the
service. That would also pass finer-grained validations/assertions that
enable the service to know when changes can be safely applied, even when
another writer has committed first. I think that's a better strategy and
would love to see it in the Iceberg REST protocol.
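
To make the idea concrete, here is a minimal Python sketch of a service
checking client-supplied assertions before applying a change on top of a
newer table state. Every name in it (TableState, Change,
files_still_present, try_commit, CommitConflict) is hypothetical and is
not taken from the fine-grained commits proposal or the Iceberg REST spec.
It models a table as a plain set of file paths, which is a large
simplification of the real content-metadata tree.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, Iterable, List


class CommitConflict(Exception):
    """Raised when a client assertion no longer holds (hypothetical)."""


@dataclass(frozen=True)
class TableState:
    version: int
    files: FrozenSet[str]  # simplified stand-in for the metadata tree


# An assertion inspects the *current* state and reports whether the
# client's change is still safe to apply on top of it.
Assertion = Callable[[TableState], bool]


@dataclass
class Change:
    base_version: int  # version the client read; informational in this sketch
    added: FrozenSet[str] = frozenset()
    removed: FrozenSet[str] = frozenset()
    assertions: List[Assertion] = field(default_factory=list)


def files_still_present(paths: Iterable[str]) -> Assertion:
    """Assert that every file the change depends on still exists."""
    required = frozenset(paths)
    return lambda state: required <= state.files


def try_commit(current: TableState, change: Change) -> TableState:
    """Apply the change if all of its assertions hold, else report a conflict."""
    if not all(check(current) for check in change.assertions):
        raise CommitConflict("an assertion failed against the current state")
    new_files = (current.files - change.removed) | change.added
    return TableState(current.version + 1, new_files)
```

In this sketch, two appends based on the same version carry no assertions,
so the second applies cleanly on top of the first. A rewrite that asserts
files_still_present on the files it replaces is rejected if another writer
removed those files first.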

That approach gives clients a way to tell the service _what_ conflicts,
rather than only telling it what won't and letting the service blindly
apply changes. That reduces risk (no need to apply changes blindly) and
also makes this more valuable, because it doesn't require user involvement
across jobs and would be useful to all REST service implementations.

Ryan

On Tue, Jul 8, 2025 at 5:10 PM Dmitri Bourlatchkov <di...@apache.org> wrote:

> The general feature of allowing clients to declare that they take
> responsibility for overwriting each other's change is fine from my POV. As
> previous messages point out there are use cases for it.
>
> However, the term "deconfliction" does not sound perfect to me in this
> case. From my POV deconfliction means some deliberate action on the
> metadata taken by the Polaris Server in order to resolve / reconcile
> conflicting changes submitted by clients.
>
> Do you think we could treat these use cases as different features for the
> sake of clarity? I can see some users being potentially confused by this
> (as I was).
>
> For example, to implement preemptive conflict resolution (one party wins)
> we probably need to define means for clients / engines to declare relative
> priorities and elect to participate in this feature (the proposed headers).
> However, append/append resolution may not necessarily fall under the same
> election flag. That is, clients allowing (future) append/append resolution
> may not want to allow preemptive resolution. WDYT?
>
> I'm +1 to evolving Polaris to be able to properly support real conflict
> resolution (starting with append/append conflicts).
>
> Cheers,
> Dmitri.
>
> On Tue, Jul 8, 2025 at 12:58 PM Eric Maynard <eric.w.mayn...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > What Russell wrote in (2) is correct -- this will essentially be a
> > best-effort deconfliction, but my hope is that we can expand the supported
> > cases over time. It may be helpful to make a distinction between commits
> > which are "eligible for deconfliction" and commits that "the server knows
> > how to successfully deconflict". The proposal pertains to the former, but
> > I'm hoping the draft implementation I publish soon will help clarify the
> > latter. My plan is to focus on append-only commits first.
> >
> > Dmitri, as with all commits the server will write the final metadata.json
> > content. As for how it does so, it will vary a bit based on the nature of
> > the conflicting commits, but the general idea is that the commits have
> > consented to deconfliction by the server. So in Robert's example both
> > snapshots should be preserved, but the commits have given the server
> > license to decide which of those should be the *latest* snapshot after
> > deconfliction.
> >
> > --EM
> >
> > On Tue, Jul 8, 2025 at 9:13 AM Russell Spitzer <russell.spit...@gmail.com>
> > wrote:
> >
> > > For me there are two big use cases for this:
> > >
> > > 1. Simple overwrite.
> > > I may have several jobs. For example, one does TTL: the command it runs is
> > > idempotent and always deletes all rows / files before a certain point. The
> > > other is a merge/update command. In this situation I don't even need to
> > > reconcile; I always want the merge to succeed because it is more expensive,
> > > while my delete is very cheap and not that important. (A subsequent TTL
> > > delete run will remove the records, so as long as I don't have constant
> > > conflicts everything is fine.)
> > >
> > > For example, I may have a very complicated and long GDPR job that modifies
> > > a lot of data and then a TTL job that removes rows before a certain point
> > > in time (as well as appends and other things going on). The TTL job can
> > > always be sacrificed for the GDPR job.
> > >
> > > 2. Actual Deconfliction
> > > In this case I just want to do a cheap version of the deconfliction that
> > > would eventually happen on the client side, i.e. without any filters. In
> > > case of conflict I open up my previous manifest list and the one in the new
> > > TableUpdate and create a new Update that contains a combined manifest list
> > > with the added entries from the conflicting manifest list in my current.
> > > This is more complicated to perform, but we can do it relatively easily for
> > > jobs that just modify the manifest list itself. For more complicated
> > > situations (a manifest which is being removed has already been removed) we
> > > would have to actually dig into manifests or just fail. I would probably
> > > start by only tackling the case in which the manifest-level modifications
> > > can be isolated, and allowing the user to pass along the tag basically
> > > stating that the changes cannot conflict.
> > >
> > > Being able to start doing this kind of work on the Catalog side is going to
> > > be important once we get Fine Grained Commits into the Iceberg REST spec,
> > > so I think experimenting with the ideas now is important.
> > >
> > > On Tue, Jul 8, 2025 at 9:21 AM Dmitri Bourlatchkov <di...@apache.org>
> > > wrote:
> > >
> > > > Hi Eric,
> > > >
> > > > Just to clarify: Are you proposing Polaris Server to reconcile metadata
> > > > changes from two conflicting commit requests or are you proposing the
> > > > clients to do that?
> > > >
> > > > In other words, who will make the final metadata JSON? Clients (and the
> > > > server merely commits it) or the Polaris Server itself (modifying what the
> > > > clients submitted)?
> > > >
> > > > Sorry, I'm a bit lost on that point based on the current doc text.
> > > >
> > > > Thanks,
> > > > Dmitri.
> > > >
> > > > On Mon, Jul 7, 2025 at 6:33 PM Eric Maynard <eric.w.mayn...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Wanted to share this short design doc
> > > > > <https://docs.google.com/document/d/1tkqBOYtkcA7fbDmhIAE6_6Jmus5WwP6vS6jA_JHp4Ms>
> > > > > for a simple method of allowing conflicting commits to both be committed.
> > > > > If implemented, this would allow e.g. two writers doing append-only
> > > > > operations to a table in Polaris to always succeed.
> > > > >
> > > > > If you're interested, please take a look. In the meantime, I'll be
> > > > > preparing a small draft PR to serve as a reference implementation.
> > > > >
> > > > > --EM
> > > > >
> > > >
> > >
> >
>
