Thanks for looking into this, Adnan! Regardless of whether we go with emitting events only for successful commits or with emitting phantom events plus a separate "commit" event, I believe it is very important to clarify event behaviour with respect to REST API-side table changes (commits), specifically in the case of multi-table commits.

From my side, I'm not attached to any particular implementation approach (my preference was stated previously), but I do believe that event consumers should be able to reconstruct what actually changed (got committed) in the catalog with relative ease, i.e. without extensive post-processing of event data. I'd propose to add a dedicated README or docs page for that... ideally in the same PR that proposes a fix for this issue, so that the implementation changes can be reviewed with the intended semantics in mind. WDYT?

Also, since this is going to be an API/SPI change in the end (I'm pretty sure), let's reserve some extra review time for the code and docs changes. I'd expect the design discussion may reopen once we have a concrete proposal in place.
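To make the kind of detail I have in mind more concrete, here is a rough, purely illustrative sketch of a commit-scoped event payload. All names are made up (CatalogCommitEvent, TableChange, etc. do not exist in Polaris today); it is only meant to show what a consumer would need in order to reconstruct a multi-table commit without heavy post-processing:

    // Hypothetical shape, for discussion only -- these types do not exist in Polaris today.
    // One event per REST commit, carrying enough detail for a consumer to reconstruct
    // exactly what was persisted, including multi-table commits.
    import java.util.List;

    public record CatalogCommitEvent(
        String commitId,                  // id of the (possibly multi-table) REST commit
        Status status,                    // with "events only for successful commits", only COMMITTED is ever delivered
        List<TableChange> tableChanges) { // one entry per table touched by the commit

      public enum Status { COMMITTED, ABORTED }

      // namespace + table name + metadata locations before/after the commit
      public record TableChange(
          String namespace,
          String tableName,
          String previousMetadataLocation,
          String newMetadataLocation) {}
    }

Whether this ends up as a single event per commit, or as a final "commit" event that references previously emitted per-table events by transaction id, is exactly the kind of thing I'd like the docs page to pin down.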
Thanks,
Dmitri.

On Fri, Nov 14, 2025 at 6:38 PM Adnan Hemani <[email protected]> wrote:

> Hi all,
>
> Seems like this thread is going stale, so I'm reviving it to ensure we have consensus on one side or another.
>
> Currently, the vote stands as follows (please correct me if I'm wrong):
> * 1 vote for only emitting an event if the data is successfully committed to the table [Dmitri]
> * 1 vote for emitting a third type of event during a staged commit and then emitting the final AfterTableCommitEvent at the very end [Eric]
> * 2 votes indifferent [Adnan, Yufei]
>
> In either case, it seems we will need to ensure that AfterTableCommitEvent only emits if the commit is applied rather than staged, so I will look into that. But if anyone wants to break the tie here for emitting a third event type for a "staged commit", please do so. Else, I can lean towards adding the third event type and revert it out later, if needed.
>
> Best,
> Adnan Hemani
>
> On Fri, Nov 7, 2025 at 11:39 AM Yufei Gu <[email protected]> wrote:
>
> > Agreed that exactly-once delivery is extremely hard to achieve in distributed systems because it requires atomic coordination across producers, brokers, and consumers, essentially a distributed transaction. Even frameworks that claim “exactly once” (like Kafka or Flink) rely on idempotent processing and transactional checkpoints rather than true one-time delivery. For Polaris, “at-least-once + idempotence” should be more than sufficient (for auditing) without adding unnecessary complexity. "Best-effort" is also acceptable for use cases like monitoring.
> >
> > Yufei
> >
> > On Fri, Nov 7, 2025 at 12:39 AM Adnan Hemani <[email protected]> wrote:
> >
> > > > From my POV, Polaris should deliver catalog-level events only for committed data.
> > >
> > > Thank you for sharing this opinion, Dmitri! As I previously mentioned, I can see merits of both ways so I don't have a hard opinion or leaning on this topic. I'm happy to tally the votes on this thread instead :)
> > >
> > > > We may want to consider delivering events for Storage-side operations performed by Polaris (e.g. when it writes a new file). These events may be useful for driving async clean-up workflows. However, from my perspective, these events would be distinct from the "catalog" events that currently exist in Polaris.
> > >
> > > This is a good idea IMO. We should totally work on this!
> > >
> > > Best,
> > > Adnan Hemani
> > >
> > > On Thu, Nov 6, 2025 at 11:29 AM Dmitri Bourlatchkov <[email protected]> wrote:
> > >
> > > > It looks like this discussion is going in the direction where we need to define the meaning of "before" / "after" events and expectations for their listener callbacks.
> > > >
> > > > Actually the latter point probably affects the former. If events are delivered only for committed data, I suppose the "before" / "after" distinction will become irrelevant.
> > > >
> > > > From my POV, Polaris should deliver catalog-level events only for committed data.
> > > >
> > > > Additionally, we can probably be lax about always delivering events. That is to say that events for some changes may be lost. Note that, events may be lost even with the current state of the codebase and I generally do not think Polaris should venture into the "exactly once" delivery territory.
> > > >
> > > > We may want to consider delivering events for Storage-side operations performed by Polaris (e.g. when it writes a new file). These events may be useful for driving async clean-up workflows. However, from my perspective, these events would be distinct from the "catalog" events that currently exist in Polaris.
> > > >
> > > > WDYT?
> > > >
> > > > Cheers,
> > > > Dmitri.
> > > >
> > > > On Wed, Nov 5, 2025 at 8:44 PM Adnan Hemani <[email protected]> wrote:
> > > >
> > > > > Eric's suggestion also makes sense to me from the perspective of how Events behave today - and it may actually be a lot easier to implement, including adding a third event for when the transaction is committed.
> > > > >
> > > > > I guess the question really comes down to: will it be confusing for users to see "AfterCommittedTableEvent" (with a parameter of "PENDING") even though the transaction was rolled back? I can kind of see both sides here... would like to hear the rest of the community's thoughts on this.
> > > > >
> > > > > Best,
> > > > > Adnan Hemani
> > > > >
> > > > > On Wed, Nov 5, 2025 at 3:27 PM Eric Maynard <[email protected]> wrote:
> > > > >
> > > > > > Option (2) is definitely more in line with the original design for events -- we could add a third event when the transaction is committed, and then the onus is on either the listener impl or the consumer of the listener impl's output (whatever it may be) to stitch together the events based on the transaction ID or something else.
> > > > > >
> > > > > > --EM
> > > > > >
> > > > > > On Wed, Nov 5, 2025 at 2:54 PM Dmitri Bourlatchkov <[email protected]> wrote:
> > > > > >
> > > > > > > Hi Adnan,
> > > > > > >
> > > > > > > I agree that this is a bug. Here's the GH issue:
> > > > > > > https://github.com/apache/polaris/issues/2990
> > > > > > >
> > > > > > > I personally do _not_ think the onus for fixing this is on you :) The code pattern that led to this issue has existed in Apache Polaris code since day 1... but if you want to give it a try, please feel free.
> > > > > > >
> > > > > > > Option 1 from Yufei's email sounds like the right general approach to fixing this problem.
> > > > > > > I also agree that the actual implementation may be tricky. It may be worth discussing an outline of a fix in this thread before starting a PR, just to avoid rework in case there's no consensus on fix details. I'm specifically concerned about the APIs used in this case and how Iceberg catalog transactions map to atomic Persistence operations (cf. [1]).
> > > > > > >
> > > > > > > [1] https://lists.apache.org/thread/rf5orxs815zs4h64p4rwp03q3pbgxb5r
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Dmitri.
> > > > > > >
> > > > > > > On Wed, Nov 5, 2025 at 4:57 PM Adnan Hemani <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi Dmitri,
> > > > > > > >
> > > > > > > > Yes, this is indeed a bug and we should create a GitHub issue for this to track fixing it. Ideally, we should be doing something similar to Option 1 of what Yufei suggested - but how this will be done may be harder than it looks. I can investigate this, and I encourage others to look into it as well, if this is of interest.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Adnan Hemani
> > > > > > > >
> > > > > > > > On Wed, Nov 5, 2025 at 8:29 AM Yufei Gu <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi Dmitri,
> > > > > > > > >
> > > > > > > > > Good catch! This is a subtle but important issue. Thanks for raising it. I think we could handle it in a few ways:
> > > > > > > > >
> > > > > > > > > 1. Hold the event emission until the whole multi-table commit succeeds, buffering per-table events until persistence confirms success.
> > > > > > > > > 2. Include a transaction ID and status (e.g., pending, committed, aborted) in emitted events so consumers can filter accordingly. This will add burden to downstreams. I think we could figure out a way to filter them out while persisting, so that the real consumers won't see the events with aborted status.
> > > > > > > > >
> > > > > > > > > I'm not sure which way is better at this moment; we need to take a deep look to evaluate both.
> > > > > > > > >
> > > > > > > > > Yufei
> > > > > > > > >
> > > > > > > > > On Wed, Nov 5, 2025 at 8:06 AM Dmitri Bourlatchkov <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Hi All,
> > > > > > > > > >
> > > > > > > > > > I'd like to highlight an aspect of the current Events behaviour with respect to multi-table commits that Christopher and I discovered today. The issue has existed since the first Events implementation, AFAIK; it just did not come into focus until now.
> > > > > > > > > >
> > > > > > > > > > Consider a multi-table commit request [1].
> > > > > > > > > > IcebergCatalogHandler will run each individual table through a separate commit operation on the base catalog [2]. The base catalog will issue events for each table separately [3][4]. However, the overall commit to Polaris Persistence may still fail, e.g. due to concurrent updates [5].
> > > > > > > > > >
> > > > > > > > > > Now, we can have a situation when both before/after events are delivered for a table, but the actual change that triggered the events is _not_ persisted, therefore does not exist in the current state of the Polaris catalog.
> > > > > > > > > >
> > > > > > > > > > Thoughts?
> > > > > > > > > >
> > > > > > > > > > [1] https://github.com/apache/polaris/blob/f934443114251f85d18c9a51ed61fc49a500a61a/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java#L973
> > > > > > > > > >
> > > > > > > > > > [2] https://github.com/apache/polaris/blob/f934443114251f85d18c9a51ed61fc49a500a61a/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java#L1051
> > > > > > > > > >
> > > > > > > > > > [3] https://github.com/apache/polaris/blob/f934443114251f85d18c9a51ed61fc49a500a61a/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java#L1405
> > > > > > > > > >
> > > > > > > > > > [4] https://github.com/apache/polaris/blob/f934443114251f85d18c9a51ed61fc49a500a61a/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java#L1558
> > > > > > > > > >
> > > > > > > > > > [5] https://github.com/apache/polaris/blob/f934443114251f85d18c9a51ed61fc49a500a61a/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java#L1058
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Dmitri.
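
P.S. To make the fix discussion a bit more concrete: below is a very rough, purely illustrative sketch of Yufei's option 1 (quoted above), i.e. buffering per-table events and releasing them to listeners only once the Persistence commit has succeeded. None of these names are real Polaris APIs; please treat it as a strawman for the docs/PR discussion, not a concrete proposal.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Consumer;

    // Hypothetical helper, for illustration only -- not an existing Polaris class.
    // Collects per-table events during a multi-table commit and publishes them
    // only if the overall Persistence commit succeeds; otherwise drops them.
    final class BufferedEventEmitter<E> {
      private final List<E> pending = new ArrayList<>();
      private final Consumer<E> listener;

      BufferedEventEmitter(Consumer<E> listener) {
        this.listener = listener;
      }

      // called where per-table "after" events are produced today
      void stage(E event) {
        pending.add(event);
      }

      // called once the multi-table commit is durably applied
      void publishAll() {
        pending.forEach(listener);
        pending.clear();
      }

      // called when the commit fails (e.g. a concurrent update) -- nothing is delivered
      void discardAll() {
        pending.clear();
      }
    }

Whether buffering like this, or a transaction id plus a final "commit" event, is the better route is exactly the open question in this thread.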
