Thanks for looking into this, Adnan!

Regardless of whether we go with emitting events only for successful
commits or emitting phantom events plus a separate "commit" event, I
believe it is very important to clarify event behaviour with respect to
REST API-side table changes (commits), specifically in the case of
multi-table commits.

From my side, I'm not attached to any particular implementation approach
(I stated my preference previously), but I do believe that event consumers
should be able to reconstruct what actually changed (got committed) in the
catalog with relative ease (i.e. without requiring extensive
post-processing of event data).
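
To make this concrete, here is a rough sketch of the kind of commit-scoped
event payload I have in mind (all names below are hypothetical, not
existing Polaris classes):

  // Hypothetical sketch only -- not an existing Polaris API. A single
  // commit-scoped event carries every table change that was actually
  // persisted, so consumers need no cross-event stitching.
  import java.util.List;

  public record CatalogCommitEvent(
      String transactionId,          // correlates with per-table events
      List<TableChange> changes) {   // one entry per table in the commit

    public record TableChange(
        String namespace,
        String tableName,
        String previousMetadataLocation, // null for a newly created table
        String newMetadataLocation) {}
  }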

I'd propose adding a dedicated README or docs page for that... Ideally in
the same PR that proposes a fix for this issue (so that the impl. code
changes could be reviewed with the intended semantics in mind). WDYT?

Also, since it's going to be an API/SPI change in the end (I'm pretty
sure), let's reserve some extra review time for code and docs changes. I'd
expect the design discussion may reopen once we have a concrete proposal in
place.

Thanks,
Dmitri.

On Fri, Nov 14, 2025 at 6:38 PM Adnan Hemani
<[email protected]> wrote:

> Hi all,
>
> Seems like this thread is going stale, so I'm reviving it to ensure we have
> consensus on one side or another.
>
> Currently, the vote stands as follows (please correct me if I'm wrong):
> * 1 vote for only emitting an event if the data is successfully committed
> to the table [Dmitri]
> * 1 vote for emitting a third type of event during a staged commit and then
> emitting the final AfterTableCommitEvent at the very end [Eric]
> * 2 votes indifferent [Adnan, Yufei]
>
> In either case, it seems we will need to ensure that AfterTableCommitEvent
> is only emitted if the commit is applied rather than staged, so I will look
> into that. But if anyone wants to break the tie here for emitting a third
> event type for a "staged commit", please do so. Otherwise, I'll lean
> towards adding the third event type and reverting it later, if needed.
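>
> For the tie-breaker option, a minimal sketch of what the third event type
> could look like (the class, field, and enum names here are illustrative,
> not existing Polaris code):
>
>   // Hypothetical "staged commit" event; a later commit/abort event with
>   // the same transactionId would resolve its final status.
>   public class AfterTableStageCommitEvent {
>     public enum CommitStatus { PENDING, COMMITTED, ABORTED }
>
>     private final String tableIdentifier;
>     private final String transactionId; // correlates with the final
>                                         // AfterTableCommitEvent
>     private final CommitStatus status;
>
>     public AfterTableStageCommitEvent(
>         String tableIdentifier, String transactionId, CommitStatus status) {
>       this.tableIdentifier = tableIdentifier;
>       this.transactionId = transactionId;
>       this.status = status;
>     }
>   }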
>
> Best,
> Adnan Hemani
>
> On Fri, Nov 7, 2025 at 11:39 AM Yufei Gu <[email protected]> wrote:
>
> > Agreed that exactly-once delivery is extremely hard to achieve in
> > distributed systems because it requires atomic coordination across
> > producers, brokers, and consumers, essentially a distributed transaction.
> > Even frameworks that claim “exactly once” (like Kafka or Flink) rely on
> > idempotent processing and transactional checkpoints rather than true
> > one-time delivery. For Polaris, “at-least-once + idempotence” should be
> > more than sufficient (for auditing) without adding unnecessary complexity.
> > "Best-effort" is also acceptable for use cases like monitoring.
> >
> > Yufei
> >
> >
> > On Fri, Nov 7, 2025 at 12:39 AM Adnan Hemani
> > <[email protected]> wrote:
> >
> > > > From my POV, Polaris should deliver catalog-level events only for
> > > > committed data.
> > >
> > > Thank you for sharing this opinion, Dmitri! As I previously mentioned,
> > > I can see merits of both ways so I don't have a hard opinion or
> > > leaning on this topic. I'm happy to tally the votes on this thread
> > > instead :)
> > >
> > > > We may want to consider delivering events for Storage-side operations
> > > > performed by Polaris (e.g. when it writes a new file). These events
> > > > may be useful for driving async clean-up workflows. However, from my
> > > > perspective, these events would be distinct from the "catalog" events
> > > > that currently exist in Polaris.
> > >
> > > This is a good idea IMO. We should totally work on this!
> > >
> > > Best,
> > > Adnan Hemani
> > >
> > > On Thu, Nov 6, 2025 at 11:29 AM Dmitri Bourlatchkov <[email protected]>
> > > wrote:
> > >
> > > > It looks like this discussion is going in the direction where we need
> > > > to define the meaning of "before" / "after" events and expectations
> > > > for their listener callbacks.
> > > >
> > > > Actually the latter point probably affects the former. If events are
> > > > delivered only for committed data, I suppose the "before" / "after"
> > > > distinction will become irrelevant.
> > > >
> > > > From my POV, Polaris should deliver catalog-level events only for
> > > > committed data.
> > > >
> > > > Additionally, we can probably be lax about always delivering events.
> > > > That is to say, events for some changes may be lost. Note that events
> > > > may be lost even with the current state of the codebase, and I
> > > > generally do not think Polaris should venture into "exactly once"
> > > > delivery territory.
> > > >
> > > > We may want to consider delivering events for Storage-side operations
> > > > performed by Polaris (e.g. when it writes a new file). These events
> > > > may be useful for driving async clean-up workflows. However, from my
> > > > perspective, these events would be distinct from the "catalog" events
> > > > that currently exist in Polaris.
> > > >
> > > > WDYT?
> > > >
> > > > Cheers,
> > > > Dmitri.
> > > >
> > > > On Wed, Nov 5, 2025 at 8:44 PM Adnan Hemani
> > > > <[email protected]> wrote:
> > > >
> > > > > Eric's suggestion also makes sense to me from the perspective of
> > > > > how Events behave today - and it may actually be a lot easier to
> > > > > implement, including adding a third event for when the transaction
> > > > > is committed.
> > > > >
> > > > > I guess the question really comes down to: will it be confusing for
> > > > > users to see "AfterCommittedTableEvent" (with a parameter of
> > > > > "PENDING") even though the transaction was rolled back? I can kind
> > > > > of see both sides here... I'd like to hear the rest of the
> > > > > community's thoughts on this.
> > > > >
> > > > > Best,
> > > > > Adnan Hemani
> > > > >
> > > > > On Wed, Nov 5, 2025 at 3:27 PM Eric Maynard
> > > > > <[email protected]> wrote:
> > > > >
> > > > > > Option (2) is definitely more in line with the original design
> > > > > > for events -- we could add a third event when the transaction is
> > > > > > committed, and then the onus is on either the listener impl or
> > > > > > the consumer of the listener impl's output (whatever it may be)
> > > > > > to stitch together the events based on the transaction ID or
> > > > > > something else.
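> > > > > >
> > > > > > As a rough sketch of that stitching (all names are hypothetical),
> > > > > > a consumer could buffer per-table events keyed by transaction ID
> > > > > > and release them only when the matching commit event arrives:
> > > > > >
> > > > > >   Map<String, List<TableEvent>> pending = new ConcurrentHashMap<>();
> > > > > >
> > > > > >   void onTableEvent(TableEvent e) {
> > > > > >     pending.computeIfAbsent(e.txId(), k -> new ArrayList<>()).add(e);
> > > > > >   }
> > > > > >
> > > > > >   void onCommitEvent(CommitEvent e) {
> > > > > >     List<TableEvent> events = pending.remove(e.txId());
> > > > > >     if (events != null && e.succeeded()) {
> > > > > >       events.forEach(this::publish); // surface only committed work
> > > > > >     } // else: drop events for an aborted transaction
> > > > > >   }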
> > > > > >
> > > > > > --EM
> > > > > >
> > > > > > On Wed, Nov 5, 2025 at 2:54 PM Dmitri Bourlatchkov
> > > > > > <[email protected]> wrote:
> > > > > >
> > > > > > > Hi Adnan,
> > > > > > >
> > > > > > > I agree that this is a bug. Here's the GH issue:
> > > > > > > https://github.com/apache/polaris/issues/2990
> > > > > > >
> > > > > > > I personally do _not_ think the onus for fixing this is on
> > > > > > > you :) The code pattern that led to this issue has existed in
> > > > > > > Apache Polaris code since day 1... but if you want to give it a
> > > > > > > try, please feel free.
> > > > > > >
> > > > > > > Option 1 from Yufei's email sounds like the right general
> > > > > > > approach to fixing this problem. I also agree that the actual
> > > > > > > implementation may be tricky. It may be worth discussing an
> > > > > > > outline of a fix in this thread before starting a PR, just to
> > > > > > > avoid rework in case there's no consensus on fix details. I'm
> > > > > > > specifically concerned about the APIs used in this case and how
> > > > > > > Iceberg catalog transactions map to atomic Persistence
> > > > > > > operations (cf. [1]).
> > > > > > >
> > > > > > > [1] https://lists.apache.org/thread/rf5orxs815zs4h64p4rwp03q3pbgxb5r
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Dmitri.
> > > > > > >
> > > > > > > On Wed, Nov 5, 2025 at 4:57 PM Adnan Hemani
> > > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi Dmitri,
> > > > > > > >
> > > > > > > > Yes, this is indeed a bug and we should create a GitHub issue
> > > > > > > > for this to track fixing it. Ideally, we should be doing
> > > > > > > > something similar to Option 1 of what Yufei suggested - but
> > > > > > > > how this will be done may be harder than it looks. I can
> > > > > > > > investigate this and I encourage others to look into it as
> > > > > > > > well, if this is of interest.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Adnan Hemani
> > > > > > > >
> > > > > > > > On Wed, Nov 5, 2025 at 8:29 AM Yufei Gu <[email protected]>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Dmitri,
> > > > > > > > >
> > > > > > > > > Good catch! This is a subtle but important issue. Thanks
> > > > > > > > > for raising it. I think we could handle it in a few ways:
> > > > > > > > >
> > > > > > > > > 1. Hold event emission until the whole multi-table commit
> > > > > > > > > succeeds, buffering per-table events until persistence
> > > > > > > > > confirms success (see the sketch below).
> > > > > > > > > 2. Include a transaction ID and status (e.g., pending,
> > > > > > > > > committed, aborted) in emitted events so consumers can
> > > > > > > > > filter accordingly. This will add a burden on downstream
> > > > > > > > > consumers. I think we could figure out a way to filter
> > > > > > > > > events out while persisting them, so that the real consumers
> > > > > > > > > won't see events with aborted status.
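> > > > > > > > >
> > > > > > > > > A rough sketch of option 1 on the emitting side (all names
> > > > > > > > > below are illustrative, not actual Polaris code):
> > > > > > > > >
> > > > > > > > >   List<PolarisEvent> buffered = new ArrayList<>();
> > > > > > > > >   for (TableCommit tc : commits) {
> > > > > > > > >     applyToCatalog(tc);                    // stage the change
> > > > > > > > >     buffered.add(afterCommitEventFor(tc)); // don't emit yet
> > > > > > > > >   }
> > > > > > > > >   if (persistence.commitAll(commits)) {    // atomic persistence
> > > > > > > > >     buffered.forEach(listener::onEvent);   // emit only on success
> > > > > > > > >   }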
> > > > > > > > >
> > > > > > > > > I'm not sure which way is better at this moment; we need to
> > > > > > > > > take a deep look to evaluate both.
> > > > > > > > >
> > > > > > > > > Yufei
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, Nov 5, 2025 at 8:06 AM Dmitri Bourlatchkov
> > > > > > > > > <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Hi All,
> > > > > > > > > >
> > > > > > > > > > I'd like to highlight an aspect of the current Events
> > > > > > > > > > behaviour with respect to multi-table commits that
> > > > > > > > > > Christopher and I discovered today. The issue has existed
> > > > > > > > > > since the first Events implementation, AFAIK; it just did
> > > > > > > > > > not come into focus until now.
> > > > > > > > > >
> > > > > > > > > > Consider a multi-table commit request [1].
> > > > > > > > > > IcebergCatalogHandler will run each individual table
> > > > > > > > > > through a separate commit operation on the base catalog
> > > > > > > > > > [2]. The base catalog will issue events for each table
> > > > > > > > > > separately [3][4]. However, the overall commit to Polaris
> > > > > > > > > > Persistence may still fail, e.g. due to concurrent
> > > > > > > > > > updates [5].
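> > > > > > > > > >
> > > > > > > > > > In pseudo-code (illustrative names, not the actual
> > > > > > > > > > methods), the problematic sequence is roughly:
> > > > > > > > > >
> > > > > > > > > >   for (TableCommit tc : request.tableChanges()) {
> > > > > > > > > >     events.emitBefore(tc);  // fired per table
> > > > > > > > > >     baseCatalog.commit(tc);
> > > > > > > > > >     events.emitAfter(tc);   // fired per table
> > > > > > > > > >   }
> > > > > > > > > >   persistence.commitAll();  // may still fail, after the
> > > > > > > > > >                             // events above already went out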
> > > > > > > > > >
> > > > > > > > > > Now, we can have a situation where both before/after
> > > > > > > > > > events are delivered for a table, but the actual change
> > > > > > > > > > that triggered the events is _not_ persisted, and
> > > > > > > > > > therefore does not exist in the current state of the
> > > > > > > > > > Polaris catalog.
> > > > > > > > > >
> > > > > > > > > > Thoughts?
> > > > > > > > > >
> > > > > > > > > > [1] https://github.com/apache/polaris/blob/f934443114251f85d18c9a51ed61fc49a500a61a/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java#L973
> > > > > > > > > >
> > > > > > > > > > [2] https://github.com/apache/polaris/blob/f934443114251f85d18c9a51ed61fc49a500a61a/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java#L1051
> > > > > > > > > >
> > > > > > > > > > [3] https://github.com/apache/polaris/blob/f934443114251f85d18c9a51ed61fc49a500a61a/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java#L1405
> > > > > > > > > >
> > > > > > > > > > [4] https://github.com/apache/polaris/blob/f934443114251f85d18c9a51ed61fc49a500a61a/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java#L1558
> > > > > > > > > >
> > > > > > > > > > [5] https://github.com/apache/polaris/blob/f934443114251f85d18c9a51ed61fc49a500a61a/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java#L1058
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Dmitri.
