Hi all,

I am raising a proposal to implement the proposed Iceberg REST specification for the Events API (doc <https://docs.google.com/document/d/1WtIsNGVX75-_MsQIOJhXLAWg6IbplV4-DkLllQEiFT8/edit?pli=1&tab=t.0>, GH <https://github.com/apache/iceberg/pull/12584/files>). It is my understanding that the Iceberg proposal is close to being finalized and that we will need to implement something very close to its current form in the near future.

If Polaris is to implement this API, it will likely need to be backed by a Persistence instance that the Polaris server can query directly at request time, since this API will not be asynchronous. Please note that this proposal does not comment on which events we emit today or may emit in the future; its scope is solely how we plan to implement the proposed Events API.

Changes to be made:

Implement Event storage through the Polaris Persistence layer

We will store events in a persistence instance of the user’s choice: either the same persistence instance that holds their Polaris metadata or a separate one. Users will provide the persistence instance by configuring a JDBC string on Polaris startup, similar to the JDBC string we currently receive from users for the Polaris metadata. To address concerns about the scaling of events in the Polaris persistence layer, we can also implement an optional (but recommended) events retention period parameter, after which Polaris will asynchronously delete records older than that period.

How to Implement Writes to the Polaris Persistence layer

The way to implement the above proposal would be through an implementation of the `PolarisEventListener` <https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/events/PolarisEventListener.java> abstract class. In this implementation, I believe it should not be controversial to state that we cannot block on events being flushed to persistence, due to latency concerns. As a result, we have two options:

1) a simple in-memory buffer
2) a file-based buffer

Both buffers would flush a fixed amount of time after the first non-flushed event is written. While option 2 offers a better event durability guarantee in case of disaster recovery, it comes at the cost of additional latency to write to the filesystem.
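
To make the flush timing concrete, here is a minimal sketch of option 1 (the in-memory buffer). Everything in it is an assumption for illustration only: the class name `BufferedEventSink`, the use of `String` as a stand-in for a serialized event, and the `persister` callback standing in for a batched write to the events persistence instance. It does not use the actual `PolarisEventListener` method signatures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch of an in-memory event buffer; not part of Polaris. */
final class BufferedEventSink implements AutoCloseable {
    private final Object lock = new Object();
    private final long flushDelayMillis;
    // Stand-in for a batched insert into the events persistence instance.
    private final Consumer<List<String>> persister;
    // Single daemon thread so a pending timer never keeps the JVM alive.
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "event-flusher");
            t.setDaemon(true);
            return t;
        });
    private List<String> pending = new ArrayList<>();

    BufferedEventSink(long flushDelayMillis, Consumer<List<String>> persister) {
        this.flushDelayMillis = flushDelayMillis;
        this.persister = persister;
    }

    /** Request path: appends in memory only and never waits on persistence. */
    void onEvent(String event) {
        synchronized (lock) {
            boolean firstUnflushed = pending.isEmpty();
            pending.add(event);
            if (firstUnflushed) {
                // Start the flush timer when the first non-flushed event arrives.
                scheduler.schedule(this::flush, flushDelayMillis, TimeUnit.MILLISECONDS);
            }
        }
    }

    /** Swaps out the pending batch under the lock, then persists it outside the lock. */
    void flush() {
        List<String> batch;
        synchronized (lock) {
            if (pending.isEmpty()) {
                return; // nothing to do, e.g. a timer fired after an explicit flush
            }
            batch = pending;
            pending = new ArrayList<>();
        }
        persister.accept(batch);
    }

    /** Stops accepting new timers and drains whatever is still buffered. */
    @Override
    public void close() {
        scheduler.shutdown();
        flush();
    }
}
```

With this shape, events that are still in `pending` when the process dies are lost, which is the durability gap of option 1. A file-based buffer (option 2) could keep the same flush timing but append each event to a local file before `onEvent` returns.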

If there are no security concerns regarding writing to the filesystem, I believe option 2 is the recommended way to implement this: with the right implementation (keeping file writers open), the additional latency of writing to the filesystem should not add unreasonable overhead. If writing to the filesystem is not recommended, I’m not sure there is any other way to achieve guaranteed event durability.

With both options we can only achieve eventual consistency. To get strong consistency, we would need to block the API call until the events are flushed to persistence, which I cannot recommend at this time due to latency concerns.

Please reply to this thread if you have any questions or concerns about this proposal. If there are no major concerns within a week, I will begin implementation.

Best,
Adnan Hemani