Re: Discussion Regarding Events Instrumentation - GH PR #1904

Robert Stupp Thu, 17 Jul 2025 03:02:41 -0700

Actually, thinking a bit more about this, requiring that event
payloads have strict and direct dependencies on Iceberg or Polaris
types can be quite a disadvantage mid/long-term.
Having to adjust all event listener implementations just because
something in some unrelated type changed feels odd. While it might be
manageable with "just" Iceberg + Polaris types, I could foresee that
there'll be more. Mean, people are thinking about Delta,
unstructured/semi-structured data and more things. Having dependencies
to all these in a central/core API(!) could easily become messy.


Thoughts?

On Thu, Jul 17, 2025 at 9:26 AM Robert Stupp <sn...@snazy.de> wrote:
>
> Events, once they're "in the code base" are pretty much a "public
> facing API", which I think justifies careful handling and
> re-iteration, especially given the IRC work.
>
> With respect to the implementation, I have a few concerns regarding
> security and behavioral consistency.
>
> Secrets are published via events (e.g. from "rotateCredentials()").
> That should IMHO never happen.
>
> There's no way to let an event consumer associate an "after event"
> with a "before event". I propose to enable event listeners to do this
> in their respective implementation specific way, if they have to.
>
> The payload of the events in the PR is inconsistent. Some events
> provide the changes, others provide only the updated state, yet others
> provide the previous and updated state.
>
> The PR explicitly states that not all events will be useful. Which
> events are useful and which are not?
>
> Regarding "Iceberg table/view commited events": we already have the
> requirements and updates in the API request. That information would be
> very useful for events: Creating a "diff" from two full metadata
> objects is a) difficult and b) not necessary (with the changes).
>
> The amount of code in the PR feels to be a bit too much. For each
> event there are two Java records and two functions:
> * a record types for "before the fact"
> * a record types for "after the fact"
> * a function for "before the fact"
> * a function for "after the fact"
>
> I think we can get away with a much simpler code footprint (idea of
> such an interface below):
> One "before the fact" function in the event listener interface, which
> returns an "in flight" state. The latter has functions for "after the
> fact", which allows implementation to correlate an "after" with a
> "before".
> It also enables implementations to just go with "before" or just with
> "after" events or combine both. And only events that really need
> additional information for "after the fact" need to have a separate
> holder type.
>
> Thoughts?
>
>
>
> public interface EventListener {
>     /**
>      * "Dummy event foo" about to happen, with parameters x + y,
>      * no additional information collected for "after the fact".
>      */
>     InFlightVoid beforeEventFoo(String x, int y);
>
>     /**
>      * About to commit to an Iceberg table.
>      */
>     InFlight<IcebergTableCommitted> beforeIcebergTableCommit(
>             TableIdentifier tableIdentifier,
>             UpdateTableRequest updateTableRequest);
>
>     // `P` just for type safety across the before*() and after*() invocation.
>     interface InFlight<P> {
>         /** Called "after the fact". */
>         void success(P payload);
>         /** Potentially called "after the fact" in case of a failure. */
>         void failure(Throwable throwable);
>         /** Potentially called "after the fact" in case of a failure. */
>         void failure(String reason);
>     }
>
>     /** Special case of `EventId`  */
>     interface InFlightVoid extends InFlight<Void> {
>         default void success() {
>             success(null);
>         }
>     }
> }
>
> public record IcebergTableCommitted(
>         TableMetadata originalState,
>         TableMetadata newState) {}
>
>
> On Wed, Jun 25, 2025 at 10:50 PM Eric Maynard <eric.w.mayn...@gmail.com> 
> wrote:
> >
> > > Then we need to go back to square zero and ask ourselves what these
> > events are useful for.
> >
> > To start with, I would point to the design doc from January
> > <https://docs.google.com/document/d/1sJiFKeMlPVlqRUj8Rv4YufMMrfCq_3ZFtDuNv_8eOQ0/edit?tab=t.0#heading=h.8d519gwzsle2>
> > and
> > this ML thread from December
> > <https://lists.apache.org/thread/03yz5wolkvy8l7rbcwjnqdq1bl8p065v>.
> > Ideally, the question you pose should have been resolved at that time. The
> > event system by itself is nice to have (we use it) as that linked design
> > argues.
> >
> > As things stand, we already have these events being fired. What we're now
> > talking about is a new listener implementation that takes these events and
> > persists them somewhere so you can have a record of which events were fired
> > at what time. If you accept that the events have some utility, persisting
> > them somewhere seems like a pretty reasonable thing to do -- though we can
> > get into more details about the implementation of that persistence.
> >
> > --EM
> >
> > On Wed, Jun 25, 2025 at 12:29 PM Alex Dutra <alex.du...@dremio.com.invalid>
> > wrote:
> >
> > > Hi,
> > >
> > > >  - AfterTableCommittedEvent tells you how many times the commit 
> > > > succeeded
> > > >   and the server attempted to return a response to the client.
> > >
> > > Not quite: in the current design, it tells you the number of times the
> > > commit succeeded, minus the number of times the commit succeeded, but
> > > the persistence of the after-event itself failed. Granted, the
> > > difference shouldn't be huge, but we cannot just assume that the
> > > number we get *is* the number of successful commits.
> > >
> > > > Wanting to know information about these events (how many times a client
> > > > attempted to commit to a table, or how many times the server attempted 
> > > > to
> > > > return a response about a table commit to a client) is a valid use case.
> > > We
> > > > use these events today to get metrics about the kinds of commits being
> > > made.
> > >
> > > Isn't that a better use case for telemetry? You can easily correlate
> > > spans, metrics & logs generated for each API call to get a clear
> > > picture of who attempted to do what. I'm not sure we need to persist
> > > events just to enable this use case.
> > >
> > > > The fact that these events don't tell you with perfect accuracy when
> > > > changes are made to the metastore does not by itself mean there is never
> > > > any value in persisting them. In fact, they were never meant to do so in
> > > > the first place.
> > >
> > > Then we need to go back to square zero and ask ourselves what these
> > > events are useful for. I had 3 use cases in mind: optimization, audit,
> > > and catalog federation. Only the first use case (optimization) imho
> > > could get away with weaker delivery guarantees. The last two require
> > > "exactly once" semantics.
> > >
> > > For comparison, Nessie events were added "after the fact" and are
> > > persisted separately from the commit, pretty much like what we have
> > > currently in Polaris. This is why they offer "at most once" guarantees
> > > and are only used for optimization purposes. (The advantage of Nesse
> > > though, is that it has a commit log, and thus can also fulfil audit
> > > use cases.)
> > >
> > > Thanks,
> > > Alex
> > >
> > > On Tue, Jun 24, 2025 at 9:52 PM Eric Maynard <eric.w.mayn...@gmail.com>
> > > wrote:
> > > >
> > > > I think we may be making too many assumptions about the semantics of an
> > > > event being emitted and letting that paralyze us. If you're looking for
> > > an
> > > > event to be emitted *iff* a table change is committed to the metastore,
> > > the
> > > > current system simply does not support that use case:
> > > >
> > > >    - BeforeTableCommittedEvent tells you how many times a client made an
> > > >    API call that triggers a table commit and the call passed auth, etc.
> > > >
> > > >
> > > >    - AfterTableCommittedEvent tells you how many times the commit
> > > succeeded
> > > >    and the server attempted to return a response to the client.
> > > >
> > > >
> > > > Wanting to know information about these events (how many times a client
> > > > attempted to commit to a table, or how many times the server attempted 
> > > > to
> > > > return a response about a table commit to a client) is a valid use case.
> > > We
> > > > use these events today to get metrics about the kinds of commits being
> > > made.
> > > >
> > > > The fact that these events don't tell you with perfect accuracy when
> > > > changes are made to the metastore does not by itself mean there is never
> > > > any value in persisting them. In fact, they were never meant to do so in
> > > > the first place.
> > > >
> > > > We could certainly add a third event in the future --
> > > OnTableCommittedEvent
> > > > -- which has the kind of transactional semantics you're describing. Some
> > > > EventListeners, those that use the same DB as the metastore, could
> > > > transactionally handle this event and make changes to the metastore like
> > > > you describe. This would require significant code changes. Other
> > > > EventListeners, such as one that sends events into a message queue,
> > > > inherently won't be able to support that transactionality even if this
> > > > third event is added. I don't see this as a reason to block persisting
> > > the
> > > > existing events.
> > > >
> > > > --EM
> > > >
> > > > On Tue, Jun 24, 2025 at 1:27 AM Alex Dutra 
> > > > <alex.du...@dremio.com.invalid
> > > >
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > To be honest, the decorator approach is nice to have, but I could live
> > > > > without it.
> > > > >
> > > > > What worries me the most right now is the design decision around
> > > > > "after" events. That a "before" event could exist without the
> > > > > corresponding "after" one, or even without the corresponding update to
> > > > > the metastore, is of concern.
> > > > >
> > > > > Thanks,
> > > > > Alex
> > > > >
> > > > >
> > > > > On Mon, Jun 23, 2025 at 10:35 PM Adnan Hemani
> > > > > <adnan.hem...@snowflake.com.invalid> wrote:
> > > > > >
> > > > > > >
> > > > > > > I'm not sure the pros and cons about these two concerns were
> > > discussed.
> > > > > > > Those are general decisions that IMO require a bigger audience
> > > than a
> > > > > > > single code-level comment in one PR.
> > > > > >
> > > > > >
> > > > > > I understand where you're coming from. To keep things moving, I'm
> > > > > proposing
> > > > > > a one-week time limit on this discussion—unless there’s still active
> > > > > > engagement at that point. The goal is to avoid letting the
> > > conversation
> > > > > > drag on and continue blocking the PR if no new perspectives are 
> > > > > > being
> > > > > > shared beyond those who have already contributed. I’ll consider this
> > > time
> > > > > > limit in effect unless there’s explicit and valid pushback.
> > > > > >
> > > > > >  I do also prefer delegation over adding "all concerns to a single
> > > > > > > class". It simplifies testing and separates concerns, making
> > > > > > > failure-handling much easier, prevents leaking event-specifics
> > > into the
> > > > > > > actual implementation and also prevents new service
> > > methods/functions
> > > > > > > from not being decorated.
> > > > > >
> > > > > >
> > > > > > Got it, I think there are now two votes for this (you and Alex). In
> > > this
> > > > > > case, I can start work on this and modify the PR in the upcoming
> > > days.
> > > > > >
> > > > > > Not having any error handling leads to unnecessary ambiguity. Nobody
> > > can
> > > > > > > certainly reason about the cause why some "after event" is not
> > > there.
> > > > > > > There are legit reasons for and against having that. But again,
> > > it's a
> > > > > > > general and "far reaching" decision.
> > > > > >
> > > > > >
> > > > > > I've considered this perspective as well, but I believe that Events
> > > are
> > > > > > intentionally designed without error handling. If someone uses 
> > > > > > Events
> > > > > for a
> > > > > > valid case where throwing errors is appropriate - like adding extra
> > > > > > validation to an API call - introducing error handling could break
> > > that
> > > > > > pattern. So the real question is: is there a compelling reason to
> > > change
> > > > > > the current behavior and add error handling to Events? The
> > > responsibility
> > > > > > to prove that this change is clearly beneficial lies with those
> > > proposing
> > > > > > it.
> > > > > >
> > > > > > Best,
> > > > > > Adnan Hemani
> > > > > >
> > > > > > On Mon, Jun 23, 2025 at 3:33 AM Robert Stupp <sn...@snazy.de> wrote:
> > > > > >
> > > > > > > I'm not sure the pros and cons about these two concerns were
> > > discussed.
> > > > > > > Those are general decisions that IMO require a bigger audience
> > > than a
> > > > > > > single code-level comment in one PR.
> > > > > > >
> > > > > > > I do also prefer delegation over adding "all concerns to a single
> > > > > > > class". It simplifies testing and separates concerns, making
> > > > > > > failure-handling much easier, prevents leaking event-specifics
> > > into the
> > > > > > > actual implementation and also prevents new service
> > > methods/functions
> > > > > > > from not being decorated.
> > > > > > >
> > > > > > > Not having any error handling leads to unnecessary ambiguity.
> > > Nobody
> > > > > can
> > > > > > > certainly reason about the cause why some "after event" is not
> > > there.
> > > > > > > There are legit reasons for and against having that. But again,
> > > it's a
> > > > > > > general and "far reaching" decision.
> > > > > > >
> > > > > > >
> > > > > > > On 21.06.25 00:13, Adnan Hemani wrote:
> > > > > > > > Hi Robert,
> > > > > > > >
> > > > > > > > I’ve already responded to those concerns <
> > > > > > > https://github.com/apache/polaris/pull/1904#discussion_r2153824586>
> > > on
> > > > > > > the PR itself, prior to your comment asking for this email thread.
> > > > > Please
> > > > > > > take a look and feel free to respond either on the PR or this 
> > > > > > > email
> > > > > thread,
> > > > > > > now that we’ve already gotten it going.
> > > > > > > >
> > > > > > > > To be clear, I’m not opposed to taking feedback on either of
> > > these
> > > > > > > topics - but reiterating already responded-to threads with no new
> > > > > > > information/opinions/suggestions does not help drive the 
> > > > > > > discussion
> > > > > forward
> > > > > > > IMO.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Adnan Hemani
> > > > > > > >
> > > > > > > >
> > > > > > > >> On Jun 20, 2025, at 7:29 AM, Robert Stupp <sn...@snazy.de>
> > > wrote:
> > > > > > > >>
> > > > > > > >> Thanks for bringing this up.
> > > > > > > >>
> > > > > > > >> I referenced the concerns that were mentioned on the PR about
> > > > > > > >> * the approach not using a `@Decorator`, mixing concerns.
> > > > > > > >> * exception/failure handling.
> > > > > > > >>
> > > > > > > >> For me, these are important topics that need to be considered
> > > in the
> > > > > > > PR. It would be good to respect those concerns before we start
> > > > > reviewing
> > > > > > > this big-ish change, as there are possibly options to shrink it
> > > > > (immutables
> > > > > > > as an option?) and also consider exception/failure handling.
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> On 19.06.25 04:48, Adnan Hemani wrote:
> > > > > > > >>> Hi all,
> > > > > > > >>>
> > > > > > > >>> As per the comment on GitHub PR #1904 <
> > > > > > >
> > > > >
> > > https://www.google.com/url?q=https://github.com/apache/polaris/pull/1904&source=gmail-imap&ust=1751034561000000&usg=AOvVaw0kTwOl-IaXvgc8vVItGrHI
> > > > > >,
> > > > > > > I am opening this mailing thread to discuss the PR implementation.
> > > > > This PR
> > > > > > > is only adding instrumentation for additional events in
> > > > > > > `PolarisServiceImpl.java`. There is no additional business logic 
> > > > > > > as
> > > > > part of
> > > > > > > this PR.
> > > > > > > >>>
> > > > > > > >>> There was one noted (and responded) remark about execution
> > > flow and
> > > > > > > one remark regarding cosmetic changes to the code, which is
> > > awaiting
> > > > > follow
> > > > > > > up by adutra@.
> > > > > > > >>>
> > > > > > > >>> I will let snazy@ take the floor from here as he requested
> > > this
> > > > > mail
> > > > > > > thread without being specific on his concerns on the PR comment.
> > > > > > > >>>
> > > > > > > >>> Best,
> > > > > > > >>> Adnan Hemani
> > > > > > > >
> > > > > > >
> > > > >
> > >

Re: Discussion Regarding Events Instrumentation - GH PR #1904

Reply via email to