Re: Discussion Regarding Events Instrumentation - GH PR #1904

Eric Maynard Wed, 25 Jun 2025 13:50:06 -0700

> Then we need to go back to square zero and ask ourselves what these
events are useful for.


To start with, I would point to the design doc from January
<https://docs.google.com/document/d/1sJiFKeMlPVlqRUj8Rv4YufMMrfCq_3ZFtDuNv_8eOQ0/edit?tab=t.0#heading=h.8d519gwzsle2>
and
this ML thread from December
<https://lists.apache.org/thread/03yz5wolkvy8l7rbcwjnqdq1bl8p065v>.
Ideally, the question you pose should have been resolved at that time. The
event system by itself is nice to have (we use it) as that linked design
argues.

As things stand, we already have these events being fired. What we're now
talking about is a new listener implementation that takes these events and
persists them somewhere so you can have a record of which events were fired
at what time. If you accept that the events have some utility, persisting
them somewhere seems like a pretty reasonable thing to do -- though we can
get into more details about the implementation of that persistence.

--EM

On Wed, Jun 25, 2025 at 12:29 PM Alex Dutra <alex.du...@dremio.com.invalid>
wrote:

> Hi,
>
> >  - AfterTableCommittedEvent tells you how many times the commit succeeded
> >   and the server attempted to return a response to the client.
>
> Not quite: in the current design, it tells you the number of times the
> commit succeeded, minus the number of times the commit succeeded, but
> the persistence of the after-event itself failed. Granted, the
> difference shouldn't be huge, but we cannot just assume that the
> number we get *is* the number of successful commits.
>
> > Wanting to know information about these events (how many times a client
> > attempted to commit to a table, or how many times the server attempted to
> > return a response about a table commit to a client) is a valid use case.
> We
> > use these events today to get metrics about the kinds of commits being
> made.
>
> Isn't that a better use case for telemetry? You can easily correlate
> spans, metrics & logs generated for each API call to get a clear
> picture of who attempted to do what. I'm not sure we need to persist
> events just to enable this use case.
>
> > The fact that these events don't tell you with perfect accuracy when
> > changes are made to the metastore does not by itself mean there is never
> > any value in persisting them. In fact, they were never meant to do so in
> > the first place.
>
> Then we need to go back to square zero and ask ourselves what these
> events are useful for. I had 3 use cases in mind: optimization, audit,
> and catalog federation. Only the first use case (optimization) imho
> could get away with weaker delivery guarantees. The last two require
> "exactly once" semantics.
>
> For comparison, Nessie events were added "after the fact" and are
> persisted separately from the commit, pretty much like what we have
> currently in Polaris. This is why they offer "at most once" guarantees
> and are only used for optimization purposes. (The advantage of Nesse
> though, is that it has a commit log, and thus can also fulfil audit
> use cases.)
>
> Thanks,
> Alex
>
> On Tue, Jun 24, 2025 at 9:52 PM Eric Maynard <eric.w.mayn...@gmail.com>
> wrote:
> >
> > I think we may be making too many assumptions about the semantics of an
> > event being emitted and letting that paralyze us. If you're looking for
> an
> > event to be emitted *iff* a table change is committed to the metastore,
> the
> > current system simply does not support that use case:
> >
> >    - BeforeTableCommittedEvent tells you how many times a client made an
> >    API call that triggers a table commit and the call passed auth, etc.
> >
> >
> >    - AfterTableCommittedEvent tells you how many times the commit
> succeeded
> >    and the server attempted to return a response to the client.
> >
> >
> > Wanting to know information about these events (how many times a client
> > attempted to commit to a table, or how many times the server attempted to
> > return a response about a table commit to a client) is a valid use case.
> We
> > use these events today to get metrics about the kinds of commits being
> made.
> >
> > The fact that these events don't tell you with perfect accuracy when
> > changes are made to the metastore does not by itself mean there is never
> > any value in persisting them. In fact, they were never meant to do so in
> > the first place.
> >
> > We could certainly add a third event in the future --
> OnTableCommittedEvent
> > -- which has the kind of transactional semantics you're describing. Some
> > EventListeners, those that use the same DB as the metastore, could
> > transactionally handle this event and make changes to the metastore like
> > you describe. This would require significant code changes. Other
> > EventListeners, such as one that sends events into a message queue,
> > inherently won't be able to support that transactionality even if this
> > third event is added. I don't see this as a reason to block persisting
> the
> > existing events.
> >
> > --EM
> >
> > On Tue, Jun 24, 2025 at 1:27 AM Alex Dutra <alex.du...@dremio.com.invalid
> >
> > wrote:
> >
> > > Hi,
> > >
> > > To be honest, the decorator approach is nice to have, but I could live
> > > without it.
> > >
> > > What worries me the most right now is the design decision around
> > > "after" events. That a "before" event could exist without the
> > > corresponding "after" one, or even without the corresponding update to
> > > the metastore, is of concern.
> > >
> > > Thanks,
> > > Alex
> > >
> > >
> > > On Mon, Jun 23, 2025 at 10:35 PM Adnan Hemani
> > > <adnan.hem...@snowflake.com.invalid> wrote:
> > > >
> > > > >
> > > > > I'm not sure the pros and cons about these two concerns were
> discussed.
> > > > > Those are general decisions that IMO require a bigger audience
> than a
> > > > > single code-level comment in one PR.
> > > >
> > > >
> > > > I understand where you're coming from. To keep things moving, I'm
> > > proposing
> > > > a one-week time limit on this discussion—unless there’s still active
> > > > engagement at that point. The goal is to avoid letting the
> conversation
> > > > drag on and continue blocking the PR if no new perspectives are being
> > > > shared beyond those who have already contributed. I’ll consider this
> time
> > > > limit in effect unless there’s explicit and valid pushback.
> > > >
> > > >  I do also prefer delegation over adding "all concerns to a single
> > > > > class". It simplifies testing and separates concerns, making
> > > > > failure-handling much easier, prevents leaking event-specifics
> into the
> > > > > actual implementation and also prevents new service
> methods/functions
> > > > > from not being decorated.
> > > >
> > > >
> > > > Got it, I think there are now two votes for this (you and Alex). In
> this
> > > > case, I can start work on this and modify the PR in the upcoming
> days.
> > > >
> > > > Not having any error handling leads to unnecessary ambiguity. Nobody
> can
> > > > > certainly reason about the cause why some "after event" is not
> there.
> > > > > There are legit reasons for and against having that. But again,
> it's a
> > > > > general and "far reaching" decision.
> > > >
> > > >
> > > > I've considered this perspective as well, but I believe that Events
> are
> > > > intentionally designed without error handling. If someone uses Events
> > > for a
> > > > valid case where throwing errors is appropriate - like adding extra
> > > > validation to an API call - introducing error handling could break
> that
> > > > pattern. So the real question is: is there a compelling reason to
> change
> > > > the current behavior and add error handling to Events? The
> responsibility
> > > > to prove that this change is clearly beneficial lies with those
> proposing
> > > > it.
> > > >
> > > > Best,
> > > > Adnan Hemani
> > > >
> > > > On Mon, Jun 23, 2025 at 3:33 AM Robert Stupp <sn...@snazy.de> wrote:
> > > >
> > > > > I'm not sure the pros and cons about these two concerns were
> discussed.
> > > > > Those are general decisions that IMO require a bigger audience
> than a
> > > > > single code-level comment in one PR.
> > > > >
> > > > > I do also prefer delegation over adding "all concerns to a single
> > > > > class". It simplifies testing and separates concerns, making
> > > > > failure-handling much easier, prevents leaking event-specifics
> into the
> > > > > actual implementation and also prevents new service
> methods/functions
> > > > > from not being decorated.
> > > > >
> > > > > Not having any error handling leads to unnecessary ambiguity.
> Nobody
> > > can
> > > > > certainly reason about the cause why some "after event" is not
> there.
> > > > > There are legit reasons for and against having that. But again,
> it's a
> > > > > general and "far reaching" decision.
> > > > >
> > > > >
> > > > > On 21.06.25 00:13, Adnan Hemani wrote:
> > > > > > Hi Robert,
> > > > > >
> > > > > > I’ve already responded to those concerns <
> > > > > https://github.com/apache/polaris/pull/1904#discussion_r2153824586>
> on
> > > > > the PR itself, prior to your comment asking for this email thread.
> > > Please
> > > > > take a look and feel free to respond either on the PR or this email
> > > thread,
> > > > > now that we’ve already gotten it going.
> > > > > >
> > > > > > To be clear, I’m not opposed to taking feedback on either of
> these
> > > > > topics - but reiterating already responded-to threads with no new
> > > > > information/opinions/suggestions does not help drive the discussion
> > > forward
> > > > > IMO.
> > > > > >
> > > > > > Best,
> > > > > > Adnan Hemani
> > > > > >
> > > > > >
> > > > > >> On Jun 20, 2025, at 7:29 AM, Robert Stupp <sn...@snazy.de>
> wrote:
> > > > > >>
> > > > > >> Thanks for bringing this up.
> > > > > >>
> > > > > >> I referenced the concerns that were mentioned on the PR about
> > > > > >> * the approach not using a `@Decorator`, mixing concerns.
> > > > > >> * exception/failure handling.
> > > > > >>
> > > > > >> For me, these are important topics that need to be considered
> in the
> > > > > PR. It would be good to respect those concerns before we start
> > > reviewing
> > > > > this big-ish change, as there are possibly options to shrink it
> > > (immutables
> > > > > as an option?) and also consider exception/failure handling.
> > > > > >>
> > > > > >>
> > > > > >> On 19.06.25 04:48, Adnan Hemani wrote:
> > > > > >>> Hi all,
> > > > > >>>
> > > > > >>> As per the comment on GitHub PR #1904 <
> > > > >
> > >
> https://www.google.com/url?q=https://github.com/apache/polaris/pull/1904&source=gmail-imap&ust=1751034561000000&usg=AOvVaw0kTwOl-IaXvgc8vVItGrHI
> > > >,
> > > > > I am opening this mailing thread to discuss the PR implementation.
> > > This PR
> > > > > is only adding instrumentation for additional events in
> > > > > `PolarisServiceImpl.java`. There is no additional business logic as
> > > part of
> > > > > this PR.
> > > > > >>>
> > > > > >>> There was one noted (and responded) remark about execution
> flow and
> > > > > one remark regarding cosmetic changes to the code, which is
> awaiting
> > > follow
> > > > > up by adutra@.
> > > > > >>>
> > > > > >>> I will let snazy@ take the floor from here as he requested
> this
> > > mail
> > > > > thread without being specific on his concerns on the PR comment.
> > > > > >>>
> > > > > >>> Best,
> > > > > >>> Adnan Hemani
> > > > > >
> > > > >
> > >
>

Re: Discussion Regarding Events Instrumentation - GH PR #1904

Reply via email to