Re: Discussion Regarding Events Instrumentation - GH PR #1904

Adnan Hemani Thu, 17 Jul 2025 16:14:54 -0700

Hi Robert,

Thanks for sharing your suggestions on this topic! I’ll respond in-line:



>> Secrets are published via events (e.g. from "rotateCredentials()").
>> That should IMHO never happen.

I see your comment on the PR about this - this is my oversight, it should only 
be returning Principal. Apologies about that, I will change it on the next 
revision of the PR.

>> There's no way to let an event consumer associate an "after event"
>> with a "before event". I propose to enable event listeners to do this
>> in their respective implementation specific way, if they have to.

I agree on this as well - the Polaris/Iceberg APIs should not have to handle 
logic to bind these events together for the event listener. One thing I’d like 
to point out is, I’ve introduced being able to access the `requestId` from the 
`PolarisCallContext` in a different PR. I further propose that event listener 
implementations can use that field to do whatever tracking and binding of 
events they’d like.

>> The payload of the events in the PR is inconsistent. Some events
>> provide the changes, others provide only the updated state, yet others
>> provide the previous and updated state.

Can you comment on the PR (or reply here) some of the events that provide both 
previous and updated states? My intention was that in a “before” event, you 
should be able to get what set of resources and modifying operations done upon 
these resources (if it is not clear from the API itself). “After” events should 
only show updated states.

>> The PR explicitly states that not all events will be useful. Which
>> events are useful and which are not?

I believe this is up to which event listener implementation(s) the user selects.

>> Regarding "Iceberg table/view commited events": we already have the
>> requirements and updates in the API request. That information would be
>> very useful for events: Creating a "diff" from two full metadata
>> objects is a) difficult and b) not necessary (with the changes).
>> 
>> The amount of code in the PR feels to be a bit too much. For each
>> event there are two Java records and two functions:
>> * a record types for "before the fact"
>> * a record types for "after the fact"
>> * a function for "before the fact"
>> * a function for "after the fact"
>> 
>> I think we can get away with a much simpler code footprint (idea of
>> such an interface below):
>> One "before the fact" function in the event listener interface, which
>> returns an "in flight" state. The latter has functions for "after the
>> fact", which allows implementation to correlate an "after" with a
>> "before".
>> It also enables implementations to just go with "before" or just with
>> "after" events or combine both. And only events that really need
>> additional information for "after the fact" need to have a separate
>> holder type.
>> 
>> Thoughts?


I understand this concern of too much code, it has most definitely been a 
marathon to write this PR :) I’m not sure how I feel about this proposal for a 
couple of reasons off the top of my head:
* Many events (i.e. most “update”/“insert" types of events) have different sets 
of information that we should emit on “before” and “after” events. So if we do 
go towards this proposal, we’ll have a split implementation of events between 
what we have today and the pattern proposed above. IS this what’s best for 
“Events” as a feature?
* Say users don’t want to have the same “after” event listener for all event 
types (e.g. they may choose that some events are relevant and I should 
alert/report on - and others that I don’t need to). This design would require 
multiple interface implementations for `InFlight`. Does that then actually help 
significantly in reducing the amount of code?

I’m still receptive to this idea - but I don’t think I see a good way to modify 
this idea and maintain a good user experience for anyone who wants to create a 
new event listener implementation. Any ideas?

> Actually, thinking a bit more about this, requiring that event
> payloads have strict and direct dependencies on Iceberg or Polaris
> types can be quite a disadvantage mid/long-term.
> Having to adjust all event listener implementations just because
> something in some unrelated type changed feels odd. While it might be
> manageable with "just" Iceberg + Polaris types, I could foresee that
> there'll be more. Mean, people are thinking about Delta,
> unstructured/semi-structured data and more things. Having dependencies
> to all these in a central/core API(!) could easily become messy.
> 
> Thoughts?

This is a good thought. I think there’s multiple ways to work with this:
* We could propose that all usages of non-Polaris/Iceberg POJOs are wrapped in 
a Polaris POJO before being used, in general, as this issue you’ve pointed out 
above will likely not be limited to just the Events functionality. In this way, 
we can ensure that any changes to non-Polaris/Iceberg data structures are 
solved in one place with the Polaris wrapper. 
* But say, we’d really like to solve this today for just events (along with the 
concerns of too much code above) - the straightforward way would be to remove 
all Event types and make a single generic event type that that takes in a map 
of information. It would be up to Event Listener implementations to parse the 
event in the way it sees fit. But, in my opinion, that reduces the usefulness 
and power of the events functionality significantly - so I cannot advocate for 
this option at this time.


Best,
Adnan Hemani


> On Jul 17, 2025, at 3:01 AM, Robert Stupp <[email protected]> wrote:
> 
> Actually, thinking a bit more about this, requiring that event
> payloads have strict and direct dependencies on Iceberg or Polaris
> types can be quite a disadvantage mid/long-term.
> Having to adjust all event listener implementations just because
> something in some unrelated type changed feels odd. While it might be
> manageable with "just" Iceberg + Polaris types, I could foresee that
> there'll be more. Mean, people are thinking about Delta,
> unstructured/semi-structured data and more things. Having dependencies
> to all these in a central/core API(!) could easily become messy.
> 
> Thoughts?
> 
> On Thu, Jul 17, 2025 at 9:26 AM Robert Stupp <[email protected]> wrote:
>> 
>> Events, once they're "in the code base" are pretty much a "public
>> facing API", which I think justifies careful handling and
>> re-iteration, especially given the IRC work.
>> 
>> With respect to the implementation, I have a few concerns regarding
>> security and behavioral consistency.
>> 
>> Secrets are published via events (e.g. from "rotateCredentials()").
>> That should IMHO never happen.
>> 
>> There's no way to let an event consumer associate an "after event"
>> with a "before event". I propose to enable event listeners to do this
>> in their respective implementation specific way, if they have to.
>> 
>> The payload of the events in the PR is inconsistent. Some events
>> provide the changes, others provide only the updated state, yet others
>> provide the previous and updated state.
>> 
>> The PR explicitly states that not all events will be useful. Which
>> events are useful and which are not?
>> 
>> Regarding "Iceberg table/view commited events": we already have the
>> requirements and updates in the API request. That information would be
>> very useful for events: Creating a "diff" from two full metadata
>> objects is a) difficult and b) not necessary (with the changes).
>> 
>> The amount of code in the PR feels to be a bit too much. For each
>> event there are two Java records and two functions:
>> * a record types for "before the fact"
>> * a record types for "after the fact"
>> * a function for "before the fact"
>> * a function for "after the fact"
>> 
>> I think we can get away with a much simpler code footprint (idea of
>> such an interface below):
>> One "before the fact" function in the event listener interface, which
>> returns an "in flight" state. The latter has functions for "after the
>> fact", which allows implementation to correlate an "after" with a
>> "before".
>> It also enables implementations to just go with "before" or just with
>> "after" events or combine both. And only events that really need
>> additional information for "after the fact" need to have a separate
>> holder type.
>> 
>> Thoughts?
>> 
>> 
>> 
>> public interface EventListener {
>>    /**
>>     * "Dummy event foo" about to happen, with parameters x + y,
>>     * no additional information collected for "after the fact".
>>     */
>>    InFlightVoid beforeEventFoo(String x, int y);
>> 
>>    /**
>>     * About to commit to an Iceberg table.
>>     */
>>    InFlight<IcebergTableCommitted> beforeIcebergTableCommit(
>>            TableIdentifier tableIdentifier,
>>            UpdateTableRequest updateTableRequest);
>> 
>>    // `P` just for type safety across the before*() and after*() invocation.
>>    interface InFlight<P> {
>>        /** Called "after the fact". */
>>        void success(P payload);
>>        /** Potentially called "after the fact" in case of a failure. */
>>        void failure(Throwable throwable);
>>        /** Potentially called "after the fact" in case of a failure. */
>>        void failure(String reason);
>>    }
>> 
>>    /** Special case of `EventId`  */
>>    interface InFlightVoid extends InFlight<Void> {
>>        default void success() {
>>            success(null);
>>        }
>>    }
>> }
>> 
>> public record IcebergTableCommitted(
>>        TableMetadata originalState,
>>        TableMetadata newState) {}
>> 
>> 
>> On Wed, Jun 25, 2025 at 10:50 PM Eric Maynard <[email protected]> 
>> wrote:
>>> 
>>>> Then we need to go back to square zero and ask ourselves what these
>>> events are useful for.
>>> 
>>> To start with, I would point to the design doc from January
>>> <https://www.google.com/url?q=https://docs.google.com/document/d/1sJiFKeMlPVlqRUj8Rv4YufMMrfCq_3ZFtDuNv_8eOQ0/edit?tab%3Dt.0%23heading%3Dh.8d519gwzsle2&source=gmail-imap&ust=1753351337000000&usg=AOvVaw25Wqr_2uuYMLupXozFFSxA>
>>> and
>>> this ML thread from December
>>> <https://www.google.com/url?q=https://lists.apache.org/thread/03yz5wolkvy8l7rbcwjnqdq1bl8p065v&source=gmail-imap&ust=1753351337000000&usg=AOvVaw0J87Lsp7W0XjFRNh3IVYHi>.
>>> Ideally, the question you pose should have been resolved at that time. The
>>> event system by itself is nice to have (we use it) as that linked design
>>> argues.
>>> 
>>> As things stand, we already have these events being fired. What we're now
>>> talking about is a new listener implementation that takes these events and
>>> persists them somewhere so you can have a record of which events were fired
>>> at what time. If you accept that the events have some utility, persisting
>>> them somewhere seems like a pretty reasonable thing to do -- though we can
>>> get into more details about the implementation of that persistence.
>>> 
>>> --EM
>>> 
>>> On Wed, Jun 25, 2025 at 12:29 PM Alex Dutra <[email protected]>
>>> wrote:
>>> 
>>>> Hi,
>>>> 
>>>>> - AfterTableCommittedEvent tells you how many times the commit succeeded
>>>>>  and the server attempted to return a response to the client.
>>>> 
>>>> Not quite: in the current design, it tells you the number of times the
>>>> commit succeeded, minus the number of times the commit succeeded, but
>>>> the persistence of the after-event itself failed. Granted, the
>>>> difference shouldn't be huge, but we cannot just assume that the
>>>> number we get *is* the number of successful commits.
>>>> 
>>>>> Wanting to know information about these events (how many times a client
>>>>> attempted to commit to a table, or how many times the server attempted to
>>>>> return a response about a table commit to a client) is a valid use case.
>>>> We
>>>>> use these events today to get metrics about the kinds of commits being
>>>> made.
>>>> 
>>>> Isn't that a better use case for telemetry? You can easily correlate
>>>> spans, metrics & logs generated for each API call to get a clear
>>>> picture of who attempted to do what. I'm not sure we need to persist
>>>> events just to enable this use case.
>>>> 
>>>>> The fact that these events don't tell you with perfect accuracy when
>>>>> changes are made to the metastore does not by itself mean there is never
>>>>> any value in persisting them. In fact, they were never meant to do so in
>>>>> the first place.
>>>> 
>>>> Then we need to go back to square zero and ask ourselves what these
>>>> events are useful for. I had 3 use cases in mind: optimization, audit,
>>>> and catalog federation. Only the first use case (optimization) imho
>>>> could get away with weaker delivery guarantees. The last two require
>>>> "exactly once" semantics.
>>>> 
>>>> For comparison, Nessie events were added "after the fact" and are
>>>> persisted separately from the commit, pretty much like what we have
>>>> currently in Polaris. This is why they offer "at most once" guarantees
>>>> and are only used for optimization purposes. (The advantage of Nesse
>>>> though, is that it has a commit log, and thus can also fulfil audit
>>>> use cases.)
>>>> 
>>>> Thanks,
>>>> Alex
>>>> 
>>>> On Tue, Jun 24, 2025 at 9:52 PM Eric Maynard <[email protected]>
>>>> wrote:
>>>>> 
>>>>> I think we may be making too many assumptions about the semantics of an
>>>>> event being emitted and letting that paralyze us. If you're looking for
>>>> an
>>>>> event to be emitted *iff* a table change is committed to the metastore,
>>>> the
>>>>> current system simply does not support that use case:
>>>>> 
>>>>>   - BeforeTableCommittedEvent tells you how many times a client made an
>>>>>   API call that triggers a table commit and the call passed auth, etc.
>>>>> 
>>>>> 
>>>>>   - AfterTableCommittedEvent tells you how many times the commit
>>>> succeeded
>>>>>   and the server attempted to return a response to the client.
>>>>> 
>>>>> 
>>>>> Wanting to know information about these events (how many times a client
>>>>> attempted to commit to a table, or how many times the server attempted to
>>>>> return a response about a table commit to a client) is a valid use case.
>>>> We
>>>>> use these events today to get metrics about the kinds of commits being
>>>> made.
>>>>> 
>>>>> The fact that these events don't tell you with perfect accuracy when
>>>>> changes are made to the metastore does not by itself mean there is never
>>>>> any value in persisting them. In fact, they were never meant to do so in
>>>>> the first place.
>>>>> 
>>>>> We could certainly add a third event in the future --
>>>> OnTableCommittedEvent
>>>>> -- which has the kind of transactional semantics you're describing. Some
>>>>> EventListeners, those that use the same DB as the metastore, could
>>>>> transactionally handle this event and make changes to the metastore like
>>>>> you describe. This would require significant code changes. Other
>>>>> EventListeners, such as one that sends events into a message queue,
>>>>> inherently won't be able to support that transactionality even if this
>>>>> third event is added. I don't see this as a reason to block persisting
>>>> the
>>>>> existing events.
>>>>> 
>>>>> --EM
>>>>> 
>>>>> On Tue, Jun 24, 2025 at 1:27 AM Alex Dutra <[email protected]
>>>>> 
>>>>> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> To be honest, the decorator approach is nice to have, but I could live
>>>>>> without it.
>>>>>> 
>>>>>> What worries me the most right now is the design decision around
>>>>>> "after" events. That a "before" event could exist without the
>>>>>> corresponding "after" one, or even without the corresponding update to
>>>>>> the metastore, is of concern.
>>>>>> 
>>>>>> Thanks,
>>>>>> Alex
>>>>>> 
>>>>>> 
>>>>>> On Mon, Jun 23, 2025 at 10:35 PM Adnan Hemani
>>>>>> <[email protected]> wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>> I'm not sure the pros and cons about these two concerns were
>>>> discussed.
>>>>>>>> Those are general decisions that IMO require a bigger audience
>>>> than a
>>>>>>>> single code-level comment in one PR.
>>>>>>> 
>>>>>>> 
>>>>>>> I understand where you're coming from. To keep things moving, I'm
>>>>>> proposing
>>>>>>> a one-week time limit on this discussion—unless there’s still active
>>>>>>> engagement at that point. The goal is to avoid letting the
>>>> conversation
>>>>>>> drag on and continue blocking the PR if no new perspectives are being
>>>>>>> shared beyond those who have already contributed. I’ll consider this
>>>> time
>>>>>>> limit in effect unless there’s explicit and valid pushback.
>>>>>>> 
>>>>>>> I do also prefer delegation over adding "all concerns to a single
>>>>>>>> class". It simplifies testing and separates concerns, making
>>>>>>>> failure-handling much easier, prevents leaking event-specifics
>>>> into the
>>>>>>>> actual implementation and also prevents new service
>>>> methods/functions
>>>>>>>> from not being decorated.
>>>>>>> 
>>>>>>> 
>>>>>>> Got it, I think there are now two votes for this (you and Alex). In
>>>> this
>>>>>>> case, I can start work on this and modify the PR in the upcoming
>>>> days.
>>>>>>> 
>>>>>>> Not having any error handling leads to unnecessary ambiguity. Nobody
>>>> can
>>>>>>>> certainly reason about the cause why some "after event" is not
>>>> there.
>>>>>>>> There are legit reasons for and against having that. But again,
>>>> it's a
>>>>>>>> general and "far reaching" decision.
>>>>>>> 
>>>>>>> 
>>>>>>> I've considered this perspective as well, but I believe that Events
>>>> are
>>>>>>> intentionally designed without error handling. If someone uses Events
>>>>>> for a
>>>>>>> valid case where throwing errors is appropriate - like adding extra
>>>>>>> validation to an API call - introducing error handling could break
>>>> that
>>>>>>> pattern. So the real question is: is there a compelling reason to
>>>> change
>>>>>>> the current behavior and add error handling to Events? The
>>>> responsibility
>>>>>>> to prove that this change is clearly beneficial lies with those
>>>> proposing
>>>>>>> it.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Adnan Hemani
>>>>>>> 
>>>>>>> On Mon, Jun 23, 2025 at 3:33 AM Robert Stupp <[email protected]> wrote:
>>>>>>> 
>>>>>>>> I'm not sure the pros and cons about these two concerns were
>>>> discussed.
>>>>>>>> Those are general decisions that IMO require a bigger audience
>>>> than a
>>>>>>>> single code-level comment in one PR.
>>>>>>>> 
>>>>>>>> I do also prefer delegation over adding "all concerns to a single
>>>>>>>> class". It simplifies testing and separates concerns, making
>>>>>>>> failure-handling much easier, prevents leaking event-specifics
>>>> into the
>>>>>>>> actual implementation and also prevents new service
>>>> methods/functions
>>>>>>>> from not being decorated.
>>>>>>>> 
>>>>>>>> Not having any error handling leads to unnecessary ambiguity.
>>>> Nobody
>>>>>> can
>>>>>>>> certainly reason about the cause why some "after event" is not
>>>> there.
>>>>>>>> There are legit reasons for and against having that. But again,
>>>> it's a
>>>>>>>> general and "far reaching" decision.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 21.06.25 00:13, Adnan Hemani wrote:
>>>>>>>>> Hi Robert,
>>>>>>>>> 
>>>>>>>>> I’ve already responded to those concerns <
>>>>>>>> https://www.google.com/url?q=https://github.com/apache/polaris/pull/1904%23discussion_r2153824586&source=gmail-imap&ust=1753351337000000&usg=AOvVaw3ikEMzciJdtZ_frZNq_VFj>
>>>> on
>>>>>>>> the PR itself, prior to your comment asking for this email thread.
>>>>>> Please
>>>>>>>> take a look and feel free to respond either on the PR or this email
>>>>>> thread,
>>>>>>>> now that we’ve already gotten it going.
>>>>>>>>> 
>>>>>>>>> To be clear, I’m not opposed to taking feedback on either of
>>>> these
>>>>>>>> topics - but reiterating already responded-to threads with no new
>>>>>>>> information/opinions/suggestions does not help drive the discussion
>>>>>> forward
>>>>>>>> IMO.
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Adnan Hemani
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Jun 20, 2025, at 7:29 AM, Robert Stupp <[email protected]>
>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Thanks for bringing this up.
>>>>>>>>>> 
>>>>>>>>>> I referenced the concerns that were mentioned on the PR about
>>>>>>>>>> * the approach not using a `@Decorator`, mixing concerns.
>>>>>>>>>> * exception/failure handling.
>>>>>>>>>> 
>>>>>>>>>> For me, these are important topics that need to be considered
>>>> in the
>>>>>>>> PR. It would be good to respect those concerns before we start
>>>>>> reviewing
>>>>>>>> this big-ish change, as there are possibly options to shrink it
>>>>>> (immutables
>>>>>>>> as an option?) and also consider exception/failure handling.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On 19.06.25 04:48, Adnan Hemani wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>> 
>>>>>>>>>>> As per the comment on GitHub PR #1904 <
>>>>>>>> 
>>>>>> 
>>>> https://www.google.com/url?q=https://www.google.com/url?q%3Dhttps://github.com/apache/polaris/pull/1904%26source%3Dgmail-imap%26ust%3D1751034561000000%26usg%3DAOvVaw0kTwOl-IaXvgc8vVItGrHI&source=gmail-imap&ust=1753351337000000&usg=AOvVaw042HJVstp6y0hPF16gF0TC
>>>>>>> ,
>>>>>>>> I am opening this mailing thread to discuss the PR implementation.
>>>>>> This PR
>>>>>>>> is only adding instrumentation for additional events in
>>>>>>>> `PolarisServiceImpl.java`. There is no additional business logic as
>>>>>> part of
>>>>>>>> this PR.
>>>>>>>>>>> 
>>>>>>>>>>> There was one noted (and responded) remark about execution
>>>> flow and
>>>>>>>> one remark regarding cosmetic changes to the code, which is
>>>> awaiting
>>>>>> follow
>>>>>>>> up by adutra@.
>>>>>>>>>>> 
>>>>>>>>>>> I will let snazy@ take the floor from here as he requested
>>>> this
>>>>>> mail
>>>>>>>> thread without being specific on his concerns on the PR comment.
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Adnan Hemani
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>

Re: Discussion Regarding Events Instrumentation - GH PR #1904

Reply via email to