Hi all,

Thanks for the feedback. I've spent the last day or so trying to figure out
what this suggestion (persisting events in the same transaction) means in
terms of a potential implementation. Here's what I've got:
* Not all potentially auditable actions go through the persistence layer
(e.g. actions on federated catalogs). What is the right way to introduce
"transactional" event persistence for those? (For actions that do write to
the persistence, a rough sketch of the in-transaction approach follows
after this list.)
* To answer Alex's question: "read-based events" are auditable actions that
may not write anything to the persistence at all. Think of LoadTable, or
even something as simple as ListCatalogs. Do we want to introduce a table
write (to the Events table) there as part of reading the entities table? It
should not be news that DB writes are typically more expensive in terms of
latency than reads. Is that latency increase acceptable?
* (side note - pointing out a ramification) Adding both a before-event and
an after-event as part of a single DB transaction isn't really useful - we
should reduce this to a single event emitted after the operation.
* (side note - pointing out a ramification) We can no longer separate event
logging from the general Polaris MetaStore transaction. Is the additional
load on the database concerning for any reason (the size the events table
may grow to, additional traffic to the DB, etc.)?
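
To make the trade-off concrete, here is a minimal sketch of what "persisting
events in the same transaction" could look like. Everything below
(EventRecord, EntityDao, PolarisEventDao, the plain-JDBC wiring) is a
hypothetical placeholder rather than existing Polaris code; the only point
is that the event insert shares the transaction with the entity write, so
both commit or roll back together:

    import java.sql.Connection;
    import java.sql.SQLException;
    import javax.sql.DataSource;

    // Hypothetical sketch only - these types do not exist in Polaris today.
    public final class TransactionalEventSketch {

      // Minimal stand-in for an audit event row.
      public record EventRecord(String requestId, String apiName, String principal, long timestampMs) {}

      public interface EntityDao {
        void writeEntityChange(Connection conn, String entityJson) throws SQLException;
      }

      public interface PolarisEventDao {
        void insertEvent(Connection conn, EventRecord event) throws SQLException;
      }

      public static void commitWithEvent(
          DataSource dataSource,
          EntityDao entityDao,
          PolarisEventDao eventDao,
          String entityJson,
          EventRecord event) throws SQLException {
        try (Connection conn = dataSource.getConnection()) {
          conn.setAutoCommit(false);
          try {
            // The "real" metastore write and the event row share one transaction.
            entityDao.writeEntityChange(conn, entityJson);
            eventDao.insertEvent(conn, event);
            conn.commit();
          } catch (SQLException e) {
            // Neither the entity change nor the event survives a failure.
            conn.rollback();
            throw e;
          }
        }
      }
    }

For read-only APIs (the LoadTable/ListCatalogs case above), the same pattern
forces us to open a write transaction where today only a read happens - which
is exactly the latency and load question raised in the second bullet.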

There is also an implementation concern around how to properly audit calls
to `runResolvePass()`, which is used to resolve entities. APIs such as
`LoadTable` rely heavily on this function for retrieving information from
the persistence. But this method is not limited to a handful of APIs - it
is called from most of them. So what is the right way to plumb through
which API is driving the persistence call in this case? One illustrative
option is sketched below.
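
As one purely illustrative option (the `CallContext` class and its
propagation below are hypothetical, not existing Polaris code), the
originating API name could be carried in a request-scoped holder set once at
the API entry point, so `runResolvePass()` and the persistence layer do not
need an extra parameter threaded through every call site:

    // Hypothetical sketch: a per-request holder recording which API is driving
    // the current resolve/persistence calls, set at the API boundary and read
    // wherever an event is emitted.
    public final class CallContext {

      private static final ThreadLocal<String> CURRENT_API = new ThreadLocal<>();

      private CallContext() {}

      // Called at the API entry point, e.g. CallContext.enter("LoadTable").
      public static void enter(String apiName) {
        CURRENT_API.set(apiName);
      }

      // Read by runResolvePass() / the event emitter; "unknown" if never set.
      public static String currentApi() {
        String api = CURRENT_API.get();
        return api != null ? api : "unknown";
      }

      // Must be called in a finally block to avoid leaking state across requests.
      public static void exit() {
        CURRENT_API.remove();
      }
    }

In a Quarkus deployment, a @RequestScoped CDI bean would likely be more
idiomatic than a raw ThreadLocal (and safer if work hops threads), but the
shape of the question is the same: some explicit context has to be
established at the API boundary.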

Best,
Adnan Hemani

On Tue, Jun 24, 2025 at 1:22 AM Robert Stupp <sn...@snazy.de> wrote:

> Apache Polaris has users and not customers.
>
> Generally speaking, pushing commercial interests to an Apache project
> would be very concerning.
>
> Anything that goes into the production code base has to be considered as
> production code and not an example. Asking users to implement something
> different to be able to use a functionality for which an implementation
> already exists does not seem right.
>
> Your claim that there were no "reasonable suggestions" is not correct.
> The option for a much simpler approach was mentioned (by Alex IIRC) a
> while ago, but until now not considered. The option to go with just an
> interface without an implementation was proposed as well, but also not
> considered.
>
> The questions I mentioned in a previous reply were meant to think about
> the approaches in the PR to figure out issues about what can go wrong,
> adding technical statements about practically observed facts in my
> previous reply. I do not know what else I could say about these general
> issues.
>
>
> On 24.06.25 01:33, Adnan Hemani wrote:
> > Thanks for these concrete concerns. The way I see it is that the
> > file-buffer implementation is again, a customer-choice they can make. If
> > they are using ephemeral storage or more complicated persistent volumes,
> > they still have the ability to write their own event
> > listener implementation and use that instead. This sample implementation
> > does not lock users into using this approach.
> >
> > But on a related note, you raised many concerns in your previous email
> > regarding event durability for an auditing use case. If we are not able to
> > use a "commit-log" or "file-buffer" architecture due to the concerns in
> > this email, what is your proposed way to solve all of these concerns from
> > an out-of-the-box auditing functionality perspective? As Alex suggested on
> > the PR, going to persistence on every event is technically an option - but
> > I would assume there are heavy latency and/or reliability concerns there,
> > especially for read-based events. If there is not a heavy concern with
> > inserting event logs within the same transaction as a read/write query to
> > the persistence, then please clearly state that and I would be glad to
> > implement that instead. Additionally, what types of "necessary guard rails"
> > are you looking for when it comes to an on-heap buffer?
> >
> > "Tak[ing] a step back and rethink" is not really action-oriented and
> > doesn't help move the conversation forward - providing reasonable
> > suggestions that help resolve these concerns is what truly helps. Please
> > work with me to propose something that you wouldn't have concerns on.
> >
> > Best,
> > Adnan Hemani
> >
> > On Mon, Jun 23, 2025 at 3:15 AM Robert Stupp <sn...@snazy.de> wrote:
> >
> >> No need to apologize, but I think there are some quite important aspects
> >> that have to be considered. Going into every explicit and implicit
> >> detail is way too much for a single email, as it requires explaining
> >> operating system and file system behaviors.
> >>
> >> Since you asked for the "nitty gritty details":
> >>
> >> The theoretical assumption that any "file write()" is immediately
> >> persistent and can always be immediately and consistently read is just
> >> wrong. Files do get truncated, files can have garbage content ...
> >> nothing of that is handled. The fact that containers use ephemeral
> >> storage means that nothing can be read back. Using persistent volumes is
> >> another beast. And then it goes quite into file system performance and
> >> configuration territory, plus the availability and performance aspects
> >> of PVs. Other aspects like load-shedding, handling of file-system and
> >> disk failures are not handled at all.
> >>
> >> With the approach proposed in the PR, users have to understand all that.
> >>
> >> Your proposal wrt to big payload to "just return 'null'" would violate
> >> the current Iceberg events proposal - the fields for 'metadata' and
> >> 'updates' are mandatory. Truncation is also not an option as it would
> >> falsify/corrupt data.
> >>
> >> It's been mentioned on the PR to "take a step back and rethink". Let's
> >> please consider the feedback on the PR.
> >>
> >> Polaris today does _not_ require anything to be persisted locally - and
> >> having the clear separation of storage and compute is good. I am
> >> strongly against adding anything that is a (file-system based)
> >> commit-log. I am also strongly against adding an on-heap buffer w/o
> >> adding the necessary guard rails.
> >>
> >> I hope the above helps not causing unnecessarily long review round trips.
> >>
> >>
> >> On 21.06.25 01:32, Adnan Hemani wrote:
> >>>> I also think that it is frustrating for reviewers when concerns are not
> >>>> addressed.
> >>> and
> >>>> But there are strong and serious objections, not just from me, around
> >>>> the technical approach. These objections have not been addressed.
> >>> I generally agree to these statements regarding frustration from
> >>> unresolved concerns/objections, as I also maintain other Open Source
> >>> projects - but I’d like to see any comments on any mailing thread or
> >>> PRs regarding this proposal where this is the case. I’ll apologize in
> >>> advance if I missed anything - but I believe, all objections had been
> >>> duly responded to and/or resolved in a prompt manner.
> >>>
> >>>> As a project we should strive for solutions that our users can safely
> >>>> use without having to understand complex details. Adding more knobs
> >>>> that users must understand before they can use Polaris makes the
> >>>> adoption of Polaris too difficult. The most successful projects and
> >>>> products do not have any mandatory configuration options that require
> >>>> reading and understanding (lots of) documentation to get started,
> >>>> solely because things are self-explaining and easy to use.
> >>> I agree with you here - /mandatory/ configuration options should
> >>> /not/ be required for this feature; they always increase the user’s
> >>> mental model complexity and should be used as sparingly as possible.
> >>> However, I am not introducing any /mandatory/ configuration options
> >>> here; configuring the one line of which Event Listener implementation
> >>> the user wants already exists. All configuration options regarding the
> >>> Event Listener implementations have conservative default values (as I
> >>> stated in the previous email) that only “power-users” will want to
> >>> tinker with. Providing additional /optional /configurations for users
> >>> rarely increase the general user’s mental model complexity as general
> >>> users will not need/use these options.
> >>>
> >>>> On top: there is the tight relation to the Iceberg proposal having
> >>>> serious implications to the persistence (writes and queries) of these
> >>>> events: the size of table/view metadata and updates to those, the
> >>>> presence of these per-event attributes having a huge payload is
> >>>> required by the Iceberg proposal as it stands. You raised that
> >>>> concern as well. There is still no answer to or consensus on that yet.
> >>> As I stated in my previous email, I believe we have identified
> >>> workarounds for this - and I don’t see that these workarounds are
> >>> entirely unreasonable. To be clear, let me list some of these
> >>> workarounds out in more details here and we can debate why none of
> >>> these workarounds ultimately can work for Polaris:
> >>> * Don’t store fields within payloads above a certain size, such as the
> >>> major concern of `TableMetadata` objects. Show back `null` to users
> >>> who query for this particular event.
> >>> * Truncate the specific fields within larger objects, such as the
> >>> `snapshot` field within `TableMetadata` objects if the overall object
> >>> size exceeds a certain size.
> >>>
> >>>> Auditing is a mechanism to (later) inspect what happened, who did
> >>>> what, etc etc., which means that auditing has strong consistency and
> >>>> ordering requirements. These requirements can, in general, not be met
> >>>> with the guarantees mentioned in the Iceberg proposal, and in
> >>>> particular not with the proposed implementation.
> >>> I’d like to peel this back a bit: what do you mean by "ordering
> >>> requirements"? As I mentioned in my previous email, ordering
> >>> requirements for showing events back to customers is still possible -
> >>> but not ordering guarantees for event ingestion. And I don’t see
> >>> ordering guarantees on event ingestion as a requirement and have still
> >>> not gotten any good reasoning why this should be the case.
> >>>
> >>> For consistency, I believe we have done the best we can with the
> >>> proposed implementation for an open-source deployment. I would
> >>> actually say it is almost impossible to guarantee SLAs or consistency
> >>> for any sort of open-source deployment where the open-source project
> >>> does not control all infrastructure underneath or around the
> >>> open-source project. It is exactly for this reason why many users go
> >>> towards managed deployments of open source projects where other
> >>> companies are paid for upkeep to all infrastructure and are on the
> >>> hook for providing such guarantees.
> >>>
> >>> But, I do not think this then means that there cannot be a
> >>> best-effort, out-of-the-box experience that an open-source project
> >>> gives as an optional feature. If you feel that there is some design
> >>> decision that I’ve made that has severe ramifications for the
> >>> customer, let’s definitely discuss those specifically rather than
> >>> general commentary that does not move this proposal forward.
> >>>
> >>>> Some questions I find very useful to think about before starting with
> >>>> a technical approach:
> >>>> * What happens if the process crashes?
> >>>> * What happens during network hiccups?
> >>>> * What happens to the server/disk/database/network/service if there are
> >>>> (too) many concurrent requests?
> >>>> * What happens during and after a STW GC phase?
> >>>> * How do things behave in a horizontally scaled setup?
> >>>> * How do things behave in a geo-distributed setup?
> >>>> * How do things recover from x/y/z?
> >>>> There are more questions, but those are important detailed ones.
> >>>> Those are not "isolated" questions, but rather relate to each other.
> >>>> And those are just the lower level ones, not even the higher level
> >>>> ones like UX, use cases, SLAs, etc.
> >>> Thanks for the specific questions here - I’m glad to answer all of
> >>> them in the context of the proposed PR that’s been introduced and
> >>> specifically with the file-based buffer implementation. The in-memory
> >>> buffer implementation obviously has large drawbacks for consistency
> >>> but is there to serve users who absolutely will not accept any writing
> >>> to the disk but would still like best-effort events.
> >>>> * What happens if the process crashes?
> >>> Events remain in the file buffers and will be flushed to persistence
> >>> once Polaris and/or the Persistence recover.
> >>>> * What happens during network hiccups?
> >>>> * What happens to the server/disk/database/network/service if there are
> >>>> (too) many concurrent requests?
> >>> File buffers will not delete the buffered events until they have been
> >>> successfully flushed to persistence. No events will be lost and will
> >>> continue to be retried. In the case of incoming events being
> >>> generated, they are simply written onto disk and the optional customer
> >>> configurations to dump to persistence after a certain amount of events
> >>> will help bail users out in case too many events are generated too
> >>> quickly.
> >>>> * What happens during and after a STW GC phase?
> >>> In the current implementation, the same thing that would happen to the
> >>> service as a whole - everything pauses. The threads running as part of
> >>> the file-buffer event listener will also be paused and will resume
> >>> when the JVM allows it to alongside the rest of the service.
> >>>> * How do things behave in a horizontally scaled setup?
> >>>> * How do things behave in a geo-distributed setup?
> >>> Each buffer event listener operates only on its own Polaris instance.
> >>> Each buffer event listener is, as a result, also responsible for only
> >>> its own set of buffers.  As long as it can connect to the persistence
> >>> instance that it is supposed to communicate (which is a base
> >>> requirement for the entire Polaris service in general), there are no
> >>> issues in managing a horizontally-scaled or geo-distributed set of
> >>> buffers. However, this does give me one good idea - what do we do in
> >>> case this particular Polaris instance shuts down? I believe we can add
> >>> a shut-down cleanup for these buffers. Please make a comment on the PR
> >>> regarding this and I can begin investigation on this.
> >>>
> >>>> PS: I can go into the nitty gritty details, but I think it's worth to
> >>>> consider the above first.
> >>> I believe I’ve responded to all the above questions/concerns. Please
> >>> do go deeper into the details - I’d prefer as little conversation
> >>> latency as possible, so please do list all concerns as thoroughly as
> >>> you can. Going through fractional concerns little-by-little will only
> >>> make our time to resolve concerns unnecessarily longer.
> >>>
> >>> Best,
> >>> Adnan Hemani
> >>>
> >>>> On Jun 20, 2025, at 6:01 AM, Robert Stupp <sn...@snazy.de> wrote:
> >>>>
> >>>> Let me first second your point on frustration about long standing
> >>>> proposals - I completely feel that pain. I also think that it is
> >>>> frustrating for reviewers when concerns are not addressed. But it is
> >>>> also worth noting that getting to a consensus takes time. Getting
> >>>> something into an OSS project can take a very long time. People have
> >>>> many things on their plate, not only the "review of my particular
> >>>> effort". This is very different from "single entity owned" and
> >>>> "closed source" projects. It is important to keep that in mind.
> >>>>
> >>>> As a project we should strive for solutions that our users can safely
> >>>> use without having to understand complex details. Adding more knobs
> >>>> that users must understand before they can use Polaris makes the
> >>>> adoption of Polaris too difficult. The most successful projects and
> >>>> products do not have any mandatory configuration options that require
> >>>> reading and understanding (lots of) documentation to get started,
> >>>> solely because things are self-explaining and easy to use.
> >>>>
> >>>> Nobody objects the effort on having events in Polaris.
> >>>>
> >>>> But there are strong and serious objections, not just from me, around
> >>>> the technical approach. These objections have not been addressed.
> >>>>
> >>>> On top: there is the tight relation to the Iceberg proposal having
> >>>> serious implications to the persistence (writes and queries) of these
> >>>> events: the size of table/view metadata and updates to those, the
> >>>> presence of these per-event attributes having a huge payload is
> >>>> required by the Iceberg proposal as it stands. You raised that
> >>>> concern as well. There is still no answer to or consensus on that yet.
> >>>>
> >>>> Auditing is a mechanism to (later) inspect what happened, who did
> >>>> what, etc etc., which means that auditing has strong consistency and
> >>>> ordering requirements. These requirements can, in general, not be met
> >>>> with the guarantees mentioned in the Iceberg proposal, and in
> >>>> particular not with the proposed implementation.
> >>>>
> >>>> Some questions I find very useful to think about before starting with
> >>>> a technical approach:
> >>>> * What happens if the process crashes?
> >>>> * What happens during network hiccups?
> >>>> * What happens to the server/disk/database/network/service if there are
> >>>> (too) many concurrent requests?
> >>>> * What happens during and after a STW GC phase?
> >>>> * How do things behave in a horizontally scaled setup?
> >>>> * How do things behave in a geo-distributed setup?
> >>>> * How do things recover from x/y/z?
> >>>> There are more questions, but those are important detailed ones.
> >>>> Those are not "isolated" questions, but rather relate to each other.
> >>>> And those are just the lower level ones, not even the higher level
> >>>> ones like UX, use cases, SLAs, etc.
> >>>>
> >>>> Building technically challenging things is often fun and probably the
> >>>> reason why we build code. Our users only want things to be "boring"
> >>>> in the sense of "it just works". Our job is it to make the complex
> >>>> and complicated things boring.
> >>>>
> >>>> Robert
> >>>>
> >>>> PS: I can go into the nitty gritty details, but I think it's worth to
> >>>> consider the above first.
> >>>>
> >>>>
> >>>> On 19.06.25 04:39, Adnan Hemani wrote:
> >>>>> First of all, sorry all for the misformating in my previous email -
> >>>>> seems my mail client wasn’t playing well with the Apache mail server.
> >>>>>
> >>>>>> Adding that there's been consensus in the meeting to start with a
> >>>>>> pure Java interface and go from there.
> >>>>> I’m not sure what this means - can you expand on this? I can't agree
> >>>>> there’s a consensus on this unless we’re all clear as to what this
> >>>>> actually means.
> >>>>>
> >>>>>> I'm not sure that the statement "Ordering guarantees are **only**
> >>>>>> possible ... event creation time" (emphasis mine) is correct.
> >>>>> This is in context of the implementation that was shared on the PR.
> >>>>> In that context, there is no guarantee on ordering - and I don’t
> >>>>> agree that there is a good reason why this is a hard requirement
> >>>>> when it is still possible to show results in an ordered manner to users.
> >>>>>> During the meeting I mentioned that I strongly believe that it is
> >>>>>> not a good idea to let users (as Apache Polaris we do NOT have
> >>>>>> customers) figure out constraints and limitations and issues on
> >>>>>> their own.
> >>>>> Your objection is noted; but event listeners are inherently
> >>>>> configurable by users - and this level of “power-user features” is
> >>>>> and always should be configurable by the user themselves. As part of
> >>>>> my implementation, I’ve provided very conservative default values -
> >>>>> but users are free to modify as they require. I don’t think there is
> >>>>> anything wrong against providing reasonable defaults and then
> >>>>> letting users decide based on their knowledge of their usage patterns.
> >>>>>
> >>>>>> Neither the Iceberg proposal nor PR 1844 are suitable for the
> >>>>>> auditing use case, because auditing is a _very_ different beast and
> >>>>>> implies way more requirements than "just store some data w/o any
> >>>>>> consistency guarantees".
> >>>>> Happy to discuss what other requirements are not being fulfilled
> >>>>> here for auditing use cases. The way I look at it, this
> >>>>> implementation provides the most reasonable, resilient
> >>>>> implementation without creating a crazy amount of infrastructure for
> >>>>> an out-of-the-box experience. An implementation involving
> >>>>> Kafka/message broker to fire-and-forget events (and rely on Kafka to
> >>>>> sink the events properly) would, of course, be much more resilient -
> >>>>> but no longer works out of the box for Apache Polaris. And to
> >>>>> underscore the above point - the choice of which event listener to
> >>>>> use is again on the user. They may choose to use one of the
> >>>>> implementations out of the box - or write one of their own - or not
> >>>>> to enable events/auditing altogether. It should ultimately, always
> >>>>> be, the user's choice.
> >>>>>
> >>>>>> There are still concerns mentioned in the Iceberg events proposal,
> >>>>>> with a huge impact to this effort. So I strongly believe that both
> >>>>>> efforts are very tightly coupled and not orthogonal efforts that
> >>>>>> could be handled independently. For example, I do not see how the
> >>>>>> "metadata payload size concern" is mitigated in PR 1844. We had the
> >>>>>> same discussion around "caching Iceberg metadata in Polaris".
> >>>>> This is something I’ve brought up in the Iceberg proposal as well
> >>>>> and would like to see changed as well; but ultimately, this is not a
> >>>>> change that forces a drastic change regardless of the outcome. We
> >>>>> have a variety of workarounds on this particular topic (such as not
> >>>>> storing the larger metadata payloads altogether and showing a null
> >>>>> object to the user - or truncating the larger fields, etc.) I don’t
> >>>>> find making a decision on this as  required for an MVP change - but
> >>>>> if you feel that this is the case, please bring all topics that fit
> >>>>> this criteria and I am happy to discuss and service them as required.
> >>>>>
> >>>>>> During the meeting some people raised concerns about the
> >>>>>> "buffering" that is strictly speaking not a necessity, but also
> >>>>>> introduced in 1844. That introduces additional consistency issues,
> >>>>>> additional risk of losing events and additional ordering issues.
> >>>>>> That is a very different problem than "just storing some data".
> >>>>> I believe this was mostly Alex that brought up this concern; as per
> >>>>> a different mailing thread, we had discussed this
> >>>>> already: https://lists.apache.org/thread/fqfsy03855rv3mwscol3qnxnf4xcnc3v
> >>>>>
> >>>>> I still don’t agree that we can write back to the persistence on
> >>>>> read-generated events as well on a per-call basis. But if that’s
> >>>>> what it will take to get this PR merged, I’m happy to remove the
> >>>>> buffer implementations and write back to persistence on every event
> >>>>> generated. What I will not take responsibility for, in that case, is
> >>>>> potential user complaints on latency and DB load once they enable
> >>>>> writing back to the persistence (on either write-only or read-write
> >>>>> event generation). Please let me know how you and Alex would like to
> >>>>> proceed.
> >>>>>
> >>>>> To also add context, the above Mailing List thread has been open for
> >>>>> over a month detailing all of this and did not receive these
> >>>>> concerns at any time regarding this. It is immensely frustrating
> >>>>> that contributors follow all the processes recommended - yet still
> >>>>> end up with the possibility of wasted efforts at the 11th hour.
> >>>>>
> >>>>> Best,
> >>>>> Adnan Hemani
> >>>>>
> >>>>>> On Jun 18, 2025, at 3:40 AM, Robert Stupp <sn...@snazy.de> wrote:
> >>>>>>
> >>>>>> Adding that there's been consensus in the meeting to start with a
> >>>>>> pure Java interface and go from there.
> >>>>>>
> >>>>>> I'm not sure that the statement "Ordering guarantees are **only**
> >>>>>> possible ... event creation time" (emphasis mine) is correct.
> >>>>>>
> >>>>>> During the meeting I mentioned that I strongly believe that it is
> >>>>>> not a good idea to let users (as Apache Polaris we do NOT have
> >>>>>> customers) figure out constraints and limitations and issues on
> >>>>>> their own.
> >>>>>>
> >>>>>> Neither the Iceberg proposal nor PR 1844 are suitable for the
> >>>>>> auditing use case, because auditing is a _very_ different beast and
> >>>>>> implies way more requirements than "just store some data w/o any
> >>>>>> consistency guarantees".
> >>>>>>
> >>>>>> There are still concerns mentioned in the Iceberg events proposal,
> >>>>>> with a huge impact to this effort. So I strongly believe that both
> >>>>>> efforts are very tightly coupled and not orthogonal efforts that
> >>>>>> could be handled independently. For example, I do not see how the
> >>>>>> "metadata payload size concern" is mitigated in PR 1844. We had the
> >>>>>> same discussion around "caching Iceberg metadata in Polaris".
> >>>>>>
> >>>>>> During the meeting some people raised concerns about the
> >>>>>> "buffering" that is strictly speaking not a necessity, but also
> >>>>>> introduced in 1844. That introduces additional consistency issues,
> >>>>>> additional risk of losing events and additional ordering issues.
> >>>>>> That is a very different problem than "just storing some data".
> >>>>>>
> >>>>>>
> >>>>>> On 17.06.25 21:16, Adnan Hemani wrote:
> >>>>>>> Hi everyone,
> >>>>>>>
> >>>>>>> In lieu of a recording of today’s Community Sync on Events, I am
> >>>>>>> posting some notes regarding what was discussed:
> >>>>>>> What is the relationship between Iceberg Events API and Polaris
> >>>>>>> Events, which are proposed
> >>>>>>> in https://github.com/apache/polaris/pull/1844?
> >>>>>>> Persisting Polaris events are a pre-requisite of the Iceberg
> >>>>>>> Events API - but are not strictly tied to this. Users could find
> >>>>>>> value in being able to persist the Polaris Events without using
> >>>>>>> the Iceberg Events API.
> >>>>>>> What Query Patterns are we expecting?
> >>>>>>> Going based on the assumption that the Iceberg Events API will be
> >>>>>>> a primary consumer of the Polaris Events and that it is almost
> >>>>>>> finalized. The proposed data schema for events is designed to work
> >>>>>>> efficiently with the current state of the Iceberg Events API.
> >>>>>>> What’s the Intended Use-Case?
> >>>>>>> This will go out in a different email later today under the
> >>>>>>> original proposal thread to ensure all context is in the same
> >>>>>>> email thread.
> >>>>>>> If auditing is a potential use-case, then what guarantees are we
> >>>>>>> able to provide?
> >>>>>>> Ordering guarantees are only possible in that the event creation
> >>>>>>> time is listed with the Polaris Event. When querying Polaris
> >>>>>>> Events from the database, we can always sort events based on this
> >>>>>>> timestamp.
> >>>>>>> Durability guarantees can be found in some implementations - but
> >>>>>>> this is up to the customer to choose which implementation they
> >>>>>>> choose and how they’d like to configure that implementation. All
> >>>>>>> of these configurations are present in the PR as it stands today.
> >>>>>>> A potential Kafka implementation may help with these concerns -
> >>>>>>> but lacks an end-to-end customer experience within Polaris and may
> >>>>>>> be pushing the concerns forward to Kafka rather than solving them.
> >>>>>>> Unsure how this may work with Iceberg Events API in the future.
> >>>>>>> Can the PR be broken up further?
> >>>>>>> Yes, it is possible - but unclear what parts are not necessary at
> >>>>>>> this time. Community to review and make suggestions on the PR.
> >>>>>>>
> >>>>>>> Next Steps/Action Items:
> >>>>>>> Community: to review PR as it stands and provide high-level
> >>>>>>> recommendations/suggestions
> >>>>>>> Adnan Hemani: Send email regarding intended use cases.
> >>>>>>> Adnan Hemani: To respond to all reviews on PRs.
> >>>>>>>
> >>>>>>> Please do respond to this email with anything I may have missed
> >>>>>>> out on! Thanks to everyone who was able to make it to this
> >>>>>>> morning’s sync and for everyone’s contributions :)
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Adnan Hemani
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Jun 13, 2025, at 4:43 PM, Adnan Hemani
> >>>>>>>> <adnan.hem...@snowflake.com> wrote:
> >>>>>>>>
> >>>>>>>> Hi all,
> >>>>>>>>
> >>>>>>>> As we were not able to discuss at the previous community sync,
> >>>>>>>> I’m setting a quick sync early next week to discuss Events in
> >>>>>>>> Persistence
> >>>>>>>> (PR: https://github.com/apache/polaris/pull/1844).
> >>>>>>>> Everyone is welcome to join and discuss on next steps here. Thanks!
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> ADNAN HEMANI
> >>>>>>>>
> >>>>>>>> Polaris Community Sync on Events
> >>>>>>>> Tuesday, June 17 · 9:00 – 9:30am
> >>>>>>>> Time zone: America/Los_Angeles
> >>>>>>>> Google Meet joining info
> >>>>>>>> Video call link: https://meet.google.com/ear-kiij-sur
> >>>>>>>> Or dial: (US) +1 402-410-2280 PIN: 350 919 847#
> >>>>>>>> More phone numbers: https://tel.meet/ear-kiij-sur?pin=5036846369686
>
