That’s super interesting. Glad this is being worked on. Personally, I’m not
convinced that the latency of writing events to persistent storage is all
that concerning. Looking at the enum of supported operations, only write
operations seem to trigger events; it’s not like every read request issues
a new event. Given that request latency here is dominated by cloud storage
calls, do we really care about one extra call to Postgres? I’d skip the
extra complexity of any kind of buffer and just write straight to the
persistence store.
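Below is a minimal sketch of the unbuffered approach described above. It is written in Java because Polaris is a Java project.

Everything in it is hypothetical. `OperationType`, `CatalogEvent`, `EventStore`, and `SynchronousEventWriter` are illustrative names, not Polaris or Iceberg APIs, and the real operation enum comes from the Events API proposal. A JDBC-backed `EventStore` would issue one INSERT per event. An in-memory store stands in for it here so the sketch runs on its own.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical operation types; the real list comes from the Events API proposal.
enum OperationType { CREATE_TABLE, UPDATE_TABLE, DROP_TABLE, LOAD_TABLE }

// Hypothetical event shape, reduced to what this sketch needs.
record CatalogEvent(OperationType op, String table, Instant timestamp) {}

// Hypothetical storage abstraction; a JDBC version would INSERT one row per event.
interface EventStore {
  void insert(CatalogEvent event);
}

// In-memory stand-in so the sketch runs without a database.
class InMemoryEventStore implements EventStore {
  final List<CatalogEvent> rows = new ArrayList<>();

  @Override
  public void insert(CatalogEvent event) {
    rows.add(event);
  }
}

// Persists write-type events synchronously, with no buffer in between.
class SynchronousEventWriter {
  private static final Set<OperationType> WRITE_OPS =
      EnumSet.of(OperationType.CREATE_TABLE, OperationType.UPDATE_TABLE, OperationType.DROP_TABLE);

  private final EventStore store;

  SynchronousEventWriter(EventStore store) {
    this.store = store;
  }

  // Returns true if the event was written before returning to the caller.
  boolean onEvent(CatalogEvent event) {
    if (!WRITE_OPS.contains(event.op())) {
      return false; // reads do not produce events, so there is nothing to persist
    }
    store.insert(event); // the request waits for this one extra round trip
    return true;
  }
}
```

Reads such as the made-up `LOAD_TABLE` return without touching storage. Only write requests pay for the extra round trip.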

Mike

On Tue, May 20, 2025 at 9:31 AM Yufei Gu <flyrain...@gmail.com> wrote:

> Looks awesome. Thanks for taking the lead! It makes sense to use a
> JDBC-backed persistence layer, shared or separate. The optional retention
> period is a nice safeguard.
> I don’t see any blockers on my side. If no one raises major concerns this
> week, please go ahead and start the implementation. Exciting to see this
> coming together!
>
> Yufei
>
>
> On Tue, May 13, 2025 at 6:37 PM Adnan Hemani
> <adnan.hem...@snowflake.com.invalid> wrote:
>
> > Hi all,
> >
> > I am proposing that Polaris implement the proposed Iceberg REST
> > specification for the Events API (doc:
> > <https://docs.google.com/document/d/1WtIsNGVX75-_MsQIOJhXLAWg6IbplV4-DkLllQEiFT8/edit?pli=1&tab=t.0>,
> > GH: <https://github.com/apache/iceberg/pull/12584/files>). It is my
> > understanding that this proposal is close to being finalized and that we
> > will need to implement something very close to the current proposal in
> > the near future.
> >
> > If Polaris is to implement this API, it will likely need to do so through
> > a persistence instance that the Polaris server can query directly, since
> > this API will not be asynchronous. Please note that this proposal does not
> > address which events we may emit today or in the future; its scope is
> > solely how we plan to implement the proposed Events API.
> >
> > Changes to be made:
> >
> > Implement Event storage through the Polaris Persistence layer
> >
> > We will store events in a persistence instance of the user’s choice:
> > either the same persistence instance that holds their Polaris metadata or
> > a separate one. Users will provide the persistence instance by configuring
> > a JDBC connection string at Polaris startup, just as they currently
> > provide one for the Polaris metadata.
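Below is a sketch of how the events JDBC URL might be resolved at startup. It assumes, though the proposal does not say so, that Polaris falls back to the metadata database when no separate events URL is configured. Both property keys and the `EventPersistenceConfig` class are hypothetical.

```java
import java.util.Map;
import java.util.Optional;

// Decides which JDBC URL the event store should use. Property names are hypothetical.
final class EventPersistenceConfig {
  static final String METADATA_JDBC_URL = "polaris.persistence.jdbc.url";     // hypothetical key
  static final String EVENTS_JDBC_URL = "polaris.events.persistence.jdbc.url"; // hypothetical key

  private EventPersistenceConfig() {}

  // Uses a dedicated events URL if one is configured; otherwise shares the metadata database.
  static String resolveEventsJdbcUrl(Map<String, String> props) {
    return Optional.ofNullable(props.get(EVENTS_JDBC_URL))
        .filter(url -> !url.isBlank())
        .or(() -> Optional.ofNullable(props.get(METADATA_JDBC_URL)))
        .orElseThrow(() -> new IllegalStateException("no JDBC URL configured for events"));
  }
}
```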
> >
> > To address concerns about the volume of events growing in the Polaris
> > persistence layer, we can also implement an optional (but recommended)
> > events retention period; Polaris will asynchronously delete event records
> > older than that period.
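Below is a sketch of the asynchronous retention cleanup. It assumes a hypothetical `deleteOlderThan` callback, which a JDBC store would implement with a parameterized DELETE. The table and column names in `DELETE_SQL` are placeholders, not an existing Polaris schema. The cleanup runs on its own scheduler thread, so catalog requests never wait on it.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Periodically purges events older than the configured retention period (hypothetical class).
final class EventRetentionJob implements AutoCloseable {
  // Placeholder SQL a JDBC-backed deleteOlderThan might bind the cutoff to.
  static final String DELETE_SQL = "DELETE FROM events WHERE event_timestamp < ?";

  private final Duration retention;
  private final Clock clock;
  private final Function<Instant, Integer> deleteOlderThan; // returns the number of rows deleted
  private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

  EventRetentionJob(Duration retention, Clock clock, Function<Instant, Integer> deleteOlderThan) {
    this.retention = retention;
    this.clock = clock;
    this.deleteOlderThan = deleteOlderThan;
  }

  // Events strictly older than this instant are eligible for deletion.
  Instant cutoff() {
    return clock.instant().minus(retention);
  }

  // Runs one purge pass synchronously; exposed so it can be tested without the scheduler.
  int purgeOnce() {
    return deleteOlderThan.apply(cutoff());
  }

  // Schedules purges off the request path.
  void start(Duration interval) {
    long millis = interval.toMillis();
    scheduler.scheduleWithFixedDelay(() -> {
      try {
        purgeOnce();
      } catch (RuntimeException e) {
        // An uncaught exception would cancel all later runs, so swallow it and retry next tick.
      }
    }, millis, millis, TimeUnit.MILLISECONDS);
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

With a 30-day retention and a clock fixed at 2025-05-31T00:00:00Z, `cutoff()` returns 2025-05-01T00:00:00Z. Only events strictly older than that instant are deleted.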
> >
> > How to Implement Writes to the Polaris Persistence layer
> >
> > The above proposal would be implemented by extending the
> > `PolarisEventListener`
> > <https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/events/PolarisEventListener.java>
> > abstract class. I believe it is uncontroversial that this implementation
> > cannot block on flushing events to persistence, due to latency concerns,
> > which leaves us two options: 1) a simple in-memory buffer or 2) a
> > file-based buffer. Both buffers would flush a set amount of time after the
> > first unflushed event is written. Option 2 offers a stronger event
> > durability guarantee for disaster recovery, at the cost of the additional
> > latency of writing to the filesystem. If there are no security concerns
> > about writing to the filesystem, I believe option 2 is the recommended
> > approach: with the right implementation (e.g., keeping file writers
> > open), the extra filesystem write should not add unreasonable overhead.
> > If writing to the filesystem is not acceptable, I’m not sure there is any
> > other way to guarantee event durability.
> >
> > Either option only achieves eventual consistency. Strong consistency
> > would require blocking the API call until the events are flushed to
> > persistence, which I cannot recommend at this time due to latency
> > concerns.
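Below is a sketch of option 1, the in-memory buffer. It assumes a hypothetical `sink` that would perform a batched JDBC insert. `TimedEventBuffer` is an illustrative name, not a Polaris class.

As described above, the flush timer starts when the first unflushed event arrives. Anything still buffered when the process dies is lost, and a batch whose sink call throws is dropped in this sketch. That is the durability gap option 2 is meant to close. Like either option, it only gives eventual consistency: a reader querying the Events API right after a commit may not see the event yet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Option 1 as a sketch: flush a fixed delay after the first unflushed event (hypothetical class).
final class TimedEventBuffer<E> implements AutoCloseable {
  private final long flushDelayMillis;
  private final Consumer<List<E>> sink; // stands in for a batched JDBC insert
  private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
  private List<E> pending = new ArrayList<>();
  private boolean flushScheduled = false;

  TimedEventBuffer(long flushDelayMillis, Consumer<List<E>> sink) {
    this.flushDelayMillis = flushDelayMillis;
    this.sink = sink;
  }

  // Called on the request path; never waits for persistence. Not usable after close().
  synchronized void add(E event) {
    pending.add(event);
    if (!flushScheduled) {
      // The timer starts with the first unflushed event, not the most recent one.
      flushScheduled = true;
      scheduler.schedule(this::flush, flushDelayMillis, TimeUnit.MILLISECONDS);
    }
  }

  // Swaps out the pending batch under the lock, then writes it outside the lock.
  void flush() {
    List<E> batch;
    synchronized (this) {
      batch = pending;
      pending = new ArrayList<>();
      flushScheduled = false;
    }
    if (!batch.isEmpty()) {
      sink.accept(batch); // if this throws, the batch is dropped in this sketch
    }
  }

  @Override
  public void close() {
    // After shutdown(), an already-scheduled one-shot flush still runs (JDK default policy).
    scheduler.shutdown();
    try {
      scheduler.awaitTermination(flushDelayMillis + 5_000, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    flush(); // drain anything the scheduled flush did not pick up
  }
}
```

Option 2 would keep the same shape but would append each event to an already-open file writer in `add` before buffering it. That trades a local write per event for crash durability.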
> >
> > Please reply to this thread if you have any questions or concerns about
> > this proposal. If there are no major concerns within a week, I will
> > begin implementation.
> >
> > Best,
> > Adnan Hemani
> >
> >
>
