I was going to make the same suggestion :) For deployments where the
complexity of an extra buffer doesn’t make sense, a blocking persistence
call will make things easier for the service owner. If read events are
added, a high-throughput service will probably need the buffering. Though
I would be surprised if those events end up being served to callers of the
/events API. For read events, I think something like OpenLineage event
reporting is a better option. Most callers will want to know about
modifications, so filtering through all the reads to find the valuable
changes is a headache I would guess most service owners would rather avoid.
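
As a rough sketch of the "design by interface" idea quoted below: one
interface with an always-write implementation and a buffered one. All names
here (`EventSink`, `BlockingEventSink`, `BufferedEventSink`) are hypothetical
and not part of Polaris, the persistence store is stood in for by a plain
`Consumer`, and the buffer flushes on a size threshold rather than the
time-based flush described in the proposal, purely to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical abstraction for illustration; not a Polaris interface. */
interface EventSink {
    void record(String event);
}

/** Always-write variant: the event is handed to the store before record() returns. */
final class BlockingEventSink implements EventSink {
    private final Consumer<List<String>> store; // stand-in for a persistence write

    BlockingEventSink(Consumer<List<String>> store) {
        this.store = store;
    }

    @Override
    public void record(String event) {
        store.accept(List.of(event));
    }
}

/** Buffered variant: events accumulate in memory and are written in batches. */
final class BufferedEventSink implements EventSink {
    private final Consumer<List<String>> store;
    private final int maxBuffered; // assumed size threshold; the proposal uses a timer
    private final List<String> buffer = new ArrayList<>();

    BufferedEventSink(Consumer<List<String>> store, int maxBuffered) {
        this.store = store;
        this.maxBuffered = maxBuffered;
    }

    @Override
    public synchronized void record(String event) {
        buffer.add(event);
        if (buffer.size() >= maxBuffered) {
            flush();
        }
    }

    /** Writes any pending events as one batch; does nothing when the buffer is empty. */
    public synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        store.accept(new ArrayList<>(buffer)); // copy so the store never sees later mutations
        buffer.clear();
    }
}
```

With this shape, a deployment would pick one implementation through
configuration, and the code that emits events would only depend on
`EventSink`.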

Mike

On Thu, May 22, 2025 at 12:27 AM Adnan Hemani
<adnan.hem...@snowflake.com.invalid> wrote:

> I haven’t thought of this in depth honestly - but I could see this being
> the case.
>
> -Adnan
>
> > On May 21, 2025, at 4:39 PM, Eric Maynard <eric.w.mayn...@gmail.com> wrote:
> >
> > -devlist
> >
> > If we design by interface properly, it should be relatively easy to offer
> > both a disk buffering and an always-write implementation right?
> >
> > On Thu, May 22, 2025 at 12:12 AM Adnan Hemani
> > <adnan.hem...@snowflake.com.invalid> wrote:
> >
> >> Hi all,
> >>
> >> Thanks for sharing these thoughts. I’m also not completely sure about how
> >> much we should care about how much slower things will be if we just make a
> >> trip to the persistence on every write action. However, I’m building this
> >> feature with the intention of being able to also support read event types
> >> in the near future, if this is something that the customer is interested in
> >> enabling using the `CustomOperation` type that is defined in the Events API
> >> spec. Of course, this would need to be configured by the administrator, as
> >> maintenance of the persistence is their responsibility.
> >>
> >> Given that the Iceberg Events API spec has not yet merged and can still
> >> see some changes, I’m planning to begin work on the disk buffering now and
> >> wait for the Events API to finalize before working on the API side of the
> >> end-to-end implementation.
> >>
> >> Best,
> >> Adnan Hemani
> >>
> >>> On May 20, 2025, at 10:30 PM, Michael Collado <collado.m...@gmail.com> wrote:
> >>>
> >>> That’s super interesting. Glad this is being worked on. Personally, I don’t
> >>> know that the latency for writing events to a persistent storage is really
> >>> all that concerning. Looking at the enum of supported operations, only
> >>> write operations seem to trigger the event. It’s not like every read
> >>> request issues a new event. Given that the request latency here is
> >>> dominated by cloud storage calls, do we really care about one extra call to
> >>> Postgres? Personally, I’d skip the extra complexity of a buffer of any kind
> >>> and just write straight to the persistence store.
> >>>
> >>> Mike
> >>>
> >>> On Tue, May 20, 2025 at 9:31 AM Yufei Gu <flyrain...@gmail.com> wrote:
> >>>
> >>>> Looks awesome. Thanks for taking the lead! It makes sense to use a
> >>>> JDBC-backed persistence layer, shared or separate. The optional retention
> >>>> period is a nice safeguard.
> >>>> I don’t see any blockers on my side. If no one raises major concerns this
> >>>> week, please go ahead and start the implementation. Exciting to see this
> >>>> coming together!
> >>>>
> >>>> Yufei
> >>>>
> >>>>
> >>>> On Tue, May 13, 2025 at 6:37 PM Adnan Hemani
> >>>> <adnan.hem...@snowflake.com.invalid> wrote:
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I am raising a proposal to implement the proposed Iceberg REST
> >>>>> specification for the Events API (doc
> >>>>> <https://docs.google.com/document/d/1WtIsNGVX75-_MsQIOJhXLAWg6IbplV4-DkLllQEiFT8/edit?pli=1&tab=t.0>,
> >>>>> GH
> >>>>> <https://github.com/apache/iceberg/pull/12584/files>).
> >>>>> It is my understanding that this proposal is close and that we will be
> >>>>> required to implement something very close to the current proposal in the
> >>>>> near future.
> >>>>>
> >>>>> If Polaris is to implement this API, it will likely need to be through a
> >>>>> Persistence instance that the Polaris server can query instantly, as this
> >>>>> API will not be asynchronous. Please note, this proposal is not to comment
> >>>>> on what events we may emit today or in the future - the scope of this
> >>>>> proposal is solely to discuss how we plan to implement the proposed Events
> >>>>> API.
> >>>>>
> >>>>> Changes to be made:
> >>>>>
> >>>>> Implement Event storage through the Polaris Persistence layer
> >>>>>
> >>>>> We will store events in a persistence instance of user’s choice - whether
> >>>>> they would like the events to be part of the same persistence instance as
> >>>>> their Polaris metadata or if they would like for a separate persistence
> >>>>> instance. Users will provide the persistence instance by configuring a JDBC
> >>>>> string on Polaris startup, similarly to the JDBC string we receive
> >>>>> currently from users for the Polaris metadata.
> >>>>>
> >>>>> For concerns regarding scaling of events in the Polaris persistence layer,
> >>>>> we can also implement a recommended, optional parameter for an events
> >>>>> retention period after which Polaris will asynchronously delete records
> >>>>> older than that time period.
> >>>>>
> >>>>> How to Implement Writes to the Polaris Persistence layer
> >>>>>
> >>>>> The way to implement the above proposal would be through implementation of
> >>>>> the `PolarisEventListener`
> >>>>> <https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/events/PolarisEventListener.java>
> >>>>> abstract class. In this implementation, I believe it should not be
> >>>>> controversial to state that we cannot block on events to be flushed to
> >>>>> persistence due to latency concerns - and as a result, we have two options:
> >>>>> 1) a simple in-memory buffer or 2) a file-based buffer. Both buffers would
> >>>>> flush after a certain amount of time after the first non-flushed event is
> >>>>> written. While option 2 offers a better event durability guarantee in case
> >>>>> of disaster recovery, it will come at the cost of additional latency to
> >>>>> write to the filesystem. If there are no security concerns regarding
> >>>>> writing to the filesystem, I believe this is the recommended way to
> >>>>> implement - the additional latency to write to filesystem should not add
> >>>>> unreasonable overhead given the right implementation with open filewriters.
> >>>>> If writing to the filesystem is not recommended, I’m not sure there is any
> >>>>> other way to achieve guaranteed event durability. In both options we can
> >>>>> only achieve eventual consistency - to get strong consistency, we will need
> >>>>> to implement a way to block the API call until we flush the events to
> >>>>> persistence, which I cannot recommend at this time due to latency concerns.
> >>>>>
> >>>>> Please reply to this thread if there are any questions and/or concerns on
> >>>>> this proposal. If there are no major concerns within a week, then I will
> >>>>> begin implementation.
> >>>>>
> >>>>> Best,
> >>>>> Adnan Hemani
> >>
> >>
>
>
