Hi Adnan,

This is really interesting and important work! Implementing event storage
through a persistent backend makes total sense to me.

For the event buffer, batching events to reduce access to persistence is a
great idea. However, I am wondering whether durability across service
restarts is a strong need for catalog events. File-based buffers are slower
and introduce extra file-management complexity; usually an in-memory buffer
with a size/time-based flush is good enough. Or, stepping back, as Michael
mentioned, how bad would it be if we did a write to persistent storage for
every event produced?

To start with, I would also recommend skipping the extra complexity. If we
really want the optimization, could we start with an in-memory buffer of
configurable size, so users can adjust it based on their tolerance? If a
user sets the size to 0, we would simply write to persistence for every
event, and they would get durability at the cost of higher traffic to the
persistence store. We can introduce a more sophisticated buffering approach
in the future based on usage.
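To make the idea concrete, here is a minimal sketch of what I have in mind.
`EventBuffer` and `persistBatch` are hypothetical names, not existing Polaris
APIs; a real implementation would also flush on a timer and plug into
`PolarisEventListener`, which I have left out for brevity:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a size-bounded in-memory event buffer.
// maxSize == 0 degenerates to one persistence write per event,
// trading write traffic for per-event durability.
public class EventBuffer {
    private final int maxSize;
    private final List<String> pending = new ArrayList<>();
    private final Consumer<List<String>> persistBatch; // stand-in for the JDBC write

    public EventBuffer(int maxSize, Consumer<List<String>> persistBatch) {
        this.maxSize = maxSize;
        this.persistBatch = persistBatch;
    }

    public synchronized void add(String event) {
        if (maxSize == 0) {
            // Write-through: every event hits persistence immediately.
            persistBatch.accept(List.of(event));
            return;
        }
        pending.add(event);
        if (pending.size() >= maxSize) {
            flush();
        }
    }

    // A production buffer would also call this from a scheduled timer.
    public synchronized void flush() {
        if (!pending.isEmpty()) {
            persistBatch.accept(new ArrayList<>(pending));
            pending.clear();
        }
    }
}
```

With this shape, the size-0 configuration gives the durability Michael
described, and any positive size gives batching, without a second code path.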

Best Regards,
Yun

On Tue, May 20, 2025 at 10:30 PM Michael Collado <collado.m...@gmail.com>
wrote:

> That’s super interesting. Glad this is being worked on. Personally, I don’t
> know that the latency for writing events to a persistent storage is really
> all that concerning. Looking at the enum of supported operations, only
> write operations seem to trigger the event. It’s not like every read
> request issues a new event. Given that the request latency here is
> dominated by cloud storage calls, do we really care about one extra call to
> Postgres? Personally, I’d skip the extra complexity of a buffer of any kind
> and just write straight to the persistence store.
>
> Mike
>
> On Tue, May 20, 2025 at 9:31 AM Yufei Gu <flyrain...@gmail.com> wrote:
>
> > Looks awesome. Thanks for taking the lead! It makes sense to use a
> > JDBC-backed persistence layer, shared or separate. The optional retention
> > period is a nice safeguard.
> > I don’t see any blockers on my side. If no one raises major concerns this
> > week, please go ahead and start the implementation. Exciting to see this
> > coming together!
> >
> > Yufei
> >
> >
> > On Tue, May 13, 2025 at 6:37 PM Adnan Hemani
> > <adnan.hem...@snowflake.com.invalid> wrote:
> >
> > > Hi all,
> > >
> > > I am raising a proposal to implement the proposed Iceberg REST
> > > specification for the Events API (doc
> > > <https://docs.google.com/document/d/1WtIsNGVX75-_MsQIOJhXLAWg6IbplV4-DkLllQEiFT8/edit?pli=1&tab=t.0>,
> > > GH <https://github.com/apache/iceberg/pull/12584/files>). It is my
> > > understanding that this proposal is close and that we will be required
> > > to implement something very close to the current proposal in the near
> > > future.
> > >
> > > If Polaris is to implement this API, it will likely need to be through
> > > a Persistence instance that the Polaris server can query instantly, as
> > > this API will not be asynchronous. Please note, this proposal is not to
> > > comment on what events we may emit today or in the future - the scope
> > > of this proposal is solely to discuss how we plan to implement the
> > > proposed Events API.
> > >
> > > Changes to be made:
> > >
> > > Implement Event storage through the Polaris Persistence layer
> > >
> > > We will store events in a persistence instance of the user’s choice -
> > > whether they would like the events to be part of the same persistence
> > > instance as their Polaris metadata or in a separate persistence
> > > instance. Users will provide the persistence instance by configuring a
> > > JDBC string on Polaris startup, similarly to the JDBC string we
> > > currently receive from users for the Polaris metadata.
> > >
> > > For concerns regarding scaling of events in the Polaris persistence
> > > layer, we can also implement a recommended, optional parameter for an
> > > events retention period after which Polaris will asynchronously delete
> > > records older than that time period.
> > >
> > > How to Implement Writes to the Polaris Persistence layer
> > >
> > > The way to implement the above proposal would be through implementation
> > > of the `PolarisEventListener`
> > > <https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/events/PolarisEventListener.java>
> > > abstract class. In this implementation, I believe it should not be
> > > controversial to state that we cannot block on events to be flushed to
> > > persistence due to latency concerns - and as a result, we have two
> > > options: 1) a simple in-memory buffer or 2) a file-based buffer. Both
> > > buffers would flush after a certain amount of time after the first
> > > non-flushed event is written. While option 2 offers a better event
> > > durability guarantee in case of disaster recovery, it will come at the
> > > cost of additional latency to write to the filesystem. If there are no
> > > security concerns regarding writing to the filesystem, I believe this
> > > is the recommended way to implement - the additional latency to write
> > > to the filesystem should not add unreasonable overhead given the right
> > > implementation with open filewriters. If writing to the filesystem is
> > > not recommended, I’m not sure there is any other way to achieve
> > > guaranteed event durability. In both options we can only achieve
> > > eventual consistency - to get strong consistency, we will need to
> > > implement a way to block the API call until we flush the events to
> > > persistence, which I cannot recommend at this time due to latency
> > > concerns.
> > > Please reply to this thread if there are any questions and/or concerns
> > > on this proposal. If there are no major concerns within a week, then I
> > > will begin implementation.
> > >
> > > Best,
> > > Adnan Hemani
> > >
> > >
> >
>