Re: Polaris Community Sync on Events

Robert Stupp Fri, 20 Jun 2025 12:01:07 -0700

Let me first second your point about the frustration with long-standing proposals - I completely feel that pain. I also think it is frustrating for reviewers when concerns are not addressed. But it is also worth noting that reaching consensus takes time. Getting something into an OSS project can take a very long time. People have many things on their plate, not only the "review of my particular effort". This is very different from "single entity owned" and "closed source" projects. It is important to keep that in mind.

As a project, we should strive for solutions that our users can safely use without having to understand complex details. Adding more knobs that users must understand before they can use Polaris makes adopting Polaris too difficult. The most successful projects and products do not have any mandatory configuration options that require reading and understanding (lots of) documentation to get started, because things are self-explanatory and easy to use.


Nobody objects to the effort of having events in Polaris.

But there are strong and serious objections, not just from me, around the technical approach. These objections have not been addressed.

On top of that, there is the tight relation to the Iceberg proposal, which has serious implications for the persistence (writes and queries) of these events: the size of table/view metadata and updates to it. The Iceberg proposal, as it stands, requires these per-event attributes, which carry a huge payload. You raised that concern as well. There is still no answer to, or consensus on, that.

Auditing is a mechanism to (later) inspect what happened, who did what, etc., which means that auditing has strong consistency and ordering requirements. These requirements generally cannot be met with the guarantees mentioned in the Iceberg proposal, and in particular not with the proposed implementation.

Some questions I find very useful to think about before starting with a technical approach:

* What happens if the process crashes?
* What happens during network hiccups?
* What happens to the server/disk/database/network/service if there are (too) many concurrent requests?
* What happens during and after a stop-the-world (STW) GC phase?
* How do things behave in a horizontally scaled setup?
* How do things behave in a geo-distributed setup?
* How do things recover from x/y/z?

There are more questions, but those are the important detailed ones. They are not "isolated" questions; rather, they relate to each other. And those are just the lower-level ones, not even the higher-level ones like UX, use cases, SLAs, etc.

Building technically challenging things is often fun and probably the reason why we write code. Our users only want things to be "boring" in the sense of "it just works". Our job is to make the complex and complicated things boring.


Robert

PS: I can go into the nitty-gritty details, but I think it's worth considering the above first.



On 19.06.25 04:39, Adnan Hemani wrote:

First of all, sorry all for the misformatting in my previous email - it seems my 
mail client wasn’t playing well with the Apache mail server.

Adding that there's been consensus in the meeting to start with a pure Java 
interface and go from there.

I’m not sure what this means - can you expand on this? I can't agree there’s a 
consensus on this unless we’re all clear as to what this actually means.

I'm not sure that the statement "Ordering guarantees are **only** possible ... event 
creation time" (emphasis mine) is correct.

This is in context of the implementation that was shared on the PR. In that 
context, there is no guarantee on ordering - and I don’t agree that there is a 
good reason why this is a hard requirement when it is still possible to show 
results in an ordered manner to users.

During the meeting I mentioned that I strongly believe that it is not a good 
idea to let users (as Apache Polaris we do NOT have customers) figure out 
constraints and limitations and issues on their own.

Your objection is noted; but event listeners are inherently configurable by 
users - and this level of “power-user features” is and always should be 
configurable by the user themselves. As part of my implementation, I’ve 
provided very conservative default values - but users are free to modify as 
they require. I don’t think there is anything wrong with providing 
reasonable defaults and then letting users decide based on their knowledge of 
their usage patterns.

Neither the Iceberg proposal nor PR 1844 are suitable for the auditing use case, because 
auditing is a _very_ different beast and implies way more requirements than "just 
store some data w/o any consistency guarantees".

Happy to discuss what other requirements are not being fulfilled here for 
auditing use cases. The way I look at it, this implementation provides the most 
reasonable, resilient implementation without creating a crazy amount of 
infrastructure for an out-of-the-box experience. An implementation involving 
Kafka/message broker to fire-and-forget events (and rely on Kafka to sink the 
events properly) would, of course, be much more resilient - but no longer works 
out of the box for Apache Polaris. And to underscore the above point - the 
choice of which event listener to use is again on the user. They may choose to 
use one of the implementations out of the box - or write one of their own - or 
not enable events/auditing at all. It should always, ultimately, be the 
user's choice.

There are still concerns mentioned in the Iceberg events proposal, with a huge impact to this 
effort. So I strongly believe that both efforts are very tightly coupled and not orthogonal efforts 
that could be handled independently. For example, I do not see how the "metadata payload size 
concern" is mitigated in PR 1844. We had the same discussion around "caching Iceberg 
metadata in Polaris".

This is something I’ve brought up in the Iceberg proposal as well and would 
like to see changed; but ultimately, whatever the outcome, it does not force a 
drastic change to this PR. We have a variety of workarounds on 
this particular topic (such as not storing the larger metadata payloads 
altogether and showing a null object to the user, or truncating the larger 
fields, etc.). I don’t consider a decision on this required for an MVP 
change - but if you feel that it is, please bring up all topics that 
fit this criterion and I am happy to discuss and address them as required.
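To make the "truncating the larger fields" workaround concrete, here is a minimal Java sketch. It assumes the large attribute (e.g. serialized table metadata) is available as a string and that a byte limit is configurable; `PayloadLimiter`, `limit`, `maxBytes` and `dropLargePayloads` are hypothetical names, not part of Polaris or PR 1844.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch (not Polaris code): cap a large event attribute before persisting it. */
public final class PayloadLimiter {
    private PayloadLimiter() {}

    /**
     * Returns the payload unchanged if it fits within maxBytes (UTF-8 encoded),
     * null if dropLargePayloads is set, or otherwise a truncated copy that
     * never splits a multi-byte UTF-8 character.
     */
    public static String limit(String payload, int maxBytes, boolean dropLargePayloads) {
        if (payload == null) {
            return null;
        }
        byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
        if (bytes.length <= maxBytes) {
            return payload;
        }
        if (dropLargePayloads) {
            return null; // the "show a null object to the user" option
        }
        // bytes.length > maxBytes, so bytes[end] is a valid index here.
        // Continuation bytes look like 10xxxxxx; walk back to the start of a
        // character so the cut lands on a character boundary.
        int end = maxBytes;
        while (end > 0 && (bytes[end] & 0xC0) == 0x80) {
            end--;
        }
        return new String(bytes, 0, end, StandardCharsets.UTF_8);
    }
}
```

Dropping versus truncating is the same trade-off mentioned above: a null is unambiguous, while a truncated value keeps a prefix but may no longer parse as valid metadata.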

During the meeting some people raised concerns about the "buffering" that is strictly 
speaking not a necessity, but also introduced in 1844. That introduces additional consistency 
issues, additional risk of losing events and additional ordering issues. That is a very different 
problem than "just storing some data".

I believe it was mostly Alex who brought up this concern; as per a different 
mailing thread, we had already discussed this: 
https://lists.apache.org/thread/fqfsy03855rv3mwscol3qnxnf4xcnc3v. I still don’t 
agree that we should write back to persistence on a per-call basis, including 
for read-generated events. But if that’s what it will take to get this PR 
merged, I’m happy to remove the buffer implementations and write back to 
persistence on every event generated. What I will not take responsibility for, 
in that case, is potential user complaints about latency and DB load once they 
enable writing back to persistence (on either write-only or read-write 
event generation). Please let me know how you and Alex would like to proceed.
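For readers less familiar with the trade-off being discussed, here is a minimal Java sketch contrasting write-through and buffered persistence. All names are hypothetical and this is not the buffer implementation in PR 1844; a real buffer would typically also flush on a timer and on shutdown, which is omitted here. What it shows: the buffered writer makes fewer store calls, but events still sitting in the buffer are lost if the process crashes before `flush()` runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch (not Polaris code): write-through vs. buffered event persistence. */
public final class EventWriters {
    private EventWriters() {}

    /** Stand-in for a persisted event; not the Polaris event type. */
    public record Event(long createdAtMillis, String type) {}

    /** Write-through: one store call per event, nothing held in memory. */
    public static final class WriteThrough {
        private final Consumer<List<Event>> store;

        public WriteThrough(Consumer<List<Event>> store) {
            this.store = store;
        }

        public void accept(Event event) {
            store.accept(List.of(event)); // one round trip to the database per event
        }
    }

    /** Buffered: events accumulate in memory and are stored in batches. */
    public static final class Buffered {
        private final Consumer<List<Event>> store;
        private final int maxBufferSize;
        private final List<Event> buffer = new ArrayList<>();

        public Buffered(Consumer<List<Event>> store, int maxBufferSize) {
            if (maxBufferSize < 1) {
                throw new IllegalArgumentException("maxBufferSize must be >= 1");
            }
            this.store = store;
            this.maxBufferSize = maxBufferSize;
        }

        public synchronized void accept(Event event) {
            buffer.add(event);
            if (buffer.size() >= maxBufferSize) {
                flush();
            }
        }

        /** Writes all buffered events as one batch; anything not yet flushed is lost on a crash. */
        public synchronized void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            store.accept(List.copyOf(buffer));
            buffer.clear();
        }

        /** Number of events currently held only in memory. */
        public synchronized int pending() {
            return buffer.size();
        }
    }
}
```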

To add context, the above mailing list thread has been open for over a 
month detailing all of this and did not receive any of these concerns at any 
point. It is immensely frustrating that contributors follow all the 
recommended processes, yet still face the possibility of wasted effort 
at the 11th hour.

Best,
Adnan Hemani

On Jun 18, 2025, at 3:40 AM, Robert Stupp <sn...@snazy.de> wrote:

Adding that there's been consensus in the meeting to start with a pure Java 
interface and go from there.

I'm not sure that the statement "Ordering guarantees are **only** possible ... event 
creation time" (emphasis mine) is correct.

During the meeting I mentioned that I strongly believe that it is not a good 
idea to let users (as Apache Polaris we do NOT have customers) figure out 
constraints and limitations and issues on their own.

Neither the Iceberg proposal nor PR 1844 are suitable for the auditing use case, because 
auditing is a _very_ different beast and implies way more requirements than "just 
store some data w/o any consistency guarantees".

There are still concerns mentioned in the Iceberg events proposal, with a huge impact to this 
effort. So I strongly believe that both efforts are very tightly coupled and not orthogonal efforts 
that could be handled independently. For example, I do not see how the "metadata payload size 
concern" is mitigated in PR 1844. We had the same discussion around "caching Iceberg 
metadata in Polaris".

During the meeting some people raised concerns about the "buffering" that is strictly 
speaking not a necessity, but also introduced in 1844. That introduces additional consistency 
issues, additional risk of losing events and additional ordering issues. That is a very different 
problem than "just storing some data".


On 17.06.25 21:16, Adnan Hemani wrote:

Hi everyone,

In lieu of a recording of today’s Community Sync on Events, I am posting some
notes regarding what was discussed:

What is the relationship between the Iceberg Events API and Polaris Events, which are proposed
in https://github.com/apache/polaris/pull/1844?
Persisting Polaris events is a prerequisite of the Iceberg Events API - but
events are not strictly tied to it. Users could find value in being able to persist
Polaris events without using the Iceberg Events API.

What query patterns are we expecting?
We are working on the assumption that the Iceberg Events API will be a primary
consumer of Polaris events and that it is almost finalized. The proposed
data schema for events is designed to work efficiently with the current state
of the Iceberg Events API.

What's the intended use case?
This will go out in a different email later today under the original proposal
thread to ensure all context is in the same email thread.

If auditing is a potential use case, then what guarantees are we able to
provide?
Ordering guarantees are only possible in that the event creation time is listed
with the Polaris Event. When querying Polaris Events from the database, we can
always sort events based on this timestamp.
Durability guarantees can be found in some implementations - but it is up to
the customer to choose an implementation and decide how to configure it.
All of these configuration options are present in the PR as it stands today.
A potential Kafka implementation may help with these concerns - but it lacks an
end-to-end customer experience within Polaris and may be pushing the concerns
forward to Kafka rather than solving them. It is unclear how this may work with
the Iceberg Events API in the future.
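As an illustration of sorting persisted events by creation time, here is a minimal Java sketch, assuming each stored event carries a creation timestamp and a unique id (`StoredEvent`, `createdAtMillis` and `eventId` are hypothetical names). The id tie-breaker only makes the order repeatable when two events share a timestamp; it does not recover the order in which they actually happened, and timestamps taken on different nodes can be skewed. That is the gap between "sortable" and "ordered" that this thread is arguing about.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch (not Polaris code): ordering persisted events by creation time. */
public final class EventOrdering {
    private EventOrdering() {}

    /** Stand-in for a stored event: creation timestamp plus a unique id. */
    public record StoredEvent(long createdAtMillis, String eventId, String type) {}

    /**
     * Sort by creation time, then by id, so events sharing a timestamp come back
     * in the same order on every query. The tie-break is arbitrary: it does not
     * say which of those events actually happened first.
     */
    public static final Comparator<StoredEvent> BY_CREATION_TIME =
            Comparator.comparingLong(StoredEvent::createdAtMillis)
                    .thenComparing(StoredEvent::eventId);

    /** Returns a sorted copy, leaving the input untouched. */
    public static List<StoredEvent> sorted(List<StoredEvent> events) {
        List<StoredEvent> copy = new ArrayList<>(events);
        copy.sort(BY_CREATION_TIME);
        return copy;
    }
}
```

When events are read back from the database, the same ordering would normally be pushed into the query rather than applied in memory.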

Can the PR be broken up further?
Yes, it is possible - but it is unclear what parts are not necessary at this time.
Community to review and make suggestions on the PR.

Next Steps/Action Items:
Community: to review PR as it stands and provide high-level
recommendations/suggestions
Adnan Hemani: Send email regarding intended use cases.
Adnan Hemani: To respond to all reviews on PRs.

Please do respond to this email with anything I may have missed out on! Thanks
to everyone who was able to make it to this morning’s sync and for everyone’s
contributions :)

Best,
Adnan Hemani

On Jun 13, 2025, at 4:43 PM, Adnan Hemani <adnan.hem...@snowflake.com> wrote:

Hi all,

As we were not able to discuss this at the previous community sync, I’m setting up a quick sync 
early next week to discuss Events in Persistence (PR: 
https://github.com/apache/polaris/pull/1844).
Everyone is welcome to join and discuss next steps here. Thanks!

Best,
ADNAN HEMANI

Polaris Community Sync on Events
Tuesday, June 17 · 9:00 – 9:30am
Time zone: America/Los_Angeles
Google Meet joining info
Video call link: 
https://meet.google.com/ear-kiij-sur
Or dial: (US) +1 402-410-2280 PIN: 350 919 847#
More phone numbers: 
https://tel.meet/ear-kiij-sur?pin=5036846369686
