Hi all,

I would like to propose that Polaris implement the proposed Iceberg REST 
specification for the Events API (doc 
<https://docs.google.com/document/d/1WtIsNGVX75-_MsQIOJhXLAWg6IbplV4-DkLllQEiFT8/edit?pli=1&tab=t.0>,
 GH <https://github.com/apache/iceberg/pull/12584/files>). It is my 
understanding that the Iceberg proposal is close to being finalized and that 
we will be required to implement something very close to its current form in 
the near future.

If Polaris is to implement this API, it will likely need to do so through a 
Persistence instance that the Polaris server can query synchronously, as this 
API will not be asynchronous. Please note that this proposal does not address 
which events we emit today or may emit in the future. Its scope is solely how 
we plan to implement the proposed Events API.

Changes to be made:

Implement Event storage through the Polaris Persistence layer

We will store events in a persistence instance of the user’s choice: either 
the same persistence instance that holds their Polaris metadata or a separate 
one. Users will specify the persistence instance by configuring a JDBC string 
on Polaris startup, similar to the JDBC string we currently receive from users 
for the Polaris metadata.

To address concerns about the volume of events growing in the Polaris 
persistence layer, we can also add an optional (but recommended) parameter for 
an events retention period. Polaris would asynchronously delete records older 
than that period.
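
To make the retention idea concrete, here is a minimal sketch of what the asynchronous cleanup could look like. Everything in it is an assumption for illustration: `EventStore`, `deleteOlderThan`, and `EventRetentionTask` are hypothetical names and do not exist in Polaris today, and a JDBC-backed store would presumably issue a single DELETE with a timestamp predicate.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical abstraction over the events table; not an existing Polaris interface. */
interface EventStore {
  /** Deletes events with a timestamp before the cutoff and returns how many were removed. */
  int deleteOlderThan(Instant cutoff);
}

/** Hypothetical periodic task that enforces the configured retention period. */
class EventRetentionTask implements Runnable {
  private final EventStore store;
  private final Duration retention;
  private final Clock clock;

  EventRetentionTask(EventStore store, Duration retention, Clock clock) {
    this.store = store;
    this.retention = retention;
    this.clock = clock;
  }

  /** Events older than "now minus retention" are eligible for deletion. */
  Instant cutoff() {
    return clock.instant().minus(retention);
  }

  @Override
  public void run() {
    try {
      store.deleteOlderThan(cutoff());
    } catch (RuntimeException e) {
      // Swallow the failure so the scheduler keeps running; the next run retries.
    }
  }

  /** Runs the task off the request path, on a background daemon thread. */
  static ScheduledExecutorService schedule(EventRetentionTask task, Duration interval) {
    ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor(
            r -> {
              Thread t = new Thread(r, "event-retention");
              t.setDaemon(true);
              return t;
            });
    executor.scheduleWithFixedDelay(task, 0, interval.toMillis(), TimeUnit.MILLISECONDS);
    return executor;
  }
}
```

Injecting a `Clock` keeps the cutoff calculation testable, and running the deletion on its own thread means a slow cleanup never delays an API call.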

How to Implement Writes to the Polaris Persistence layer

The way to implement the above proposal would be through implementation of the 
`PolarisEventListener` 
<https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/events/PolarisEventListener.java>
 abstract class. I believe it should not be controversial to state that we 
cannot block API calls while events are flushed to persistence, due to latency 
concerns. As a result, we have two options: 1) a simple in-memory buffer or 2) 
a file-based buffer. Both buffers would flush a set amount of time after the 
first unflushed event is written. While option 2 offers a better event 
durability guarantee in a failure or disaster-recovery scenario, it comes at 
the cost of additional latency to write to the filesystem. If there are no 
security concerns about writing to the filesystem, I recommend option 2: the 
additional latency should not add unreasonable overhead as long as the 
implementation keeps its file writers open. If writing to the filesystem is not 
acceptable, I’m not sure there is any other way to achieve guaranteed event 
durability. With either option we can only achieve eventual consistency. To get 
strong consistency, we would need to block the API call until the events are 
flushed to persistence, which I cannot recommend at this time due to latency 
concerns.
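
As a rough illustration of option 1, the sketch below buffers events in memory and starts a flush timer when the first unflushed event arrives. It is only a sketch under stated assumptions: `InMemoryEventBuffer` and its `sink` callback are hypothetical names, and in a real implementation this logic would sit behind a `PolarisEventListener` subclass, with the sink performing a batched write to the events persistence instance.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch: buffers events in memory and hands them to a sink in batches. */
class InMemoryEventBuffer<E> implements AutoCloseable {
  private final Object lock = new Object();
  private final long flushDelayMillis;
  private final Consumer<List<E>> sink; // assumption: e.g. a batched insert into the events table
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(
          r -> {
            Thread t = new Thread(r, "event-flush");
            t.setDaemon(true);
            return t;
          });
  private List<E> pending = new ArrayList<>();

  InMemoryEventBuffer(Duration flushDelay, Consumer<List<E>> sink) {
    this.flushDelayMillis = flushDelay.toMillis();
    this.sink = sink;
  }

  /** Called on the request path: appends to memory only and never waits on persistence. */
  void add(E event) {
    synchronized (lock) {
      boolean firstUnflushed = pending.isEmpty();
      pending.add(event);
      if (firstUnflushed) {
        // The timer starts when the first unflushed event is written.
        scheduler.schedule(this::flush, flushDelayMillis, TimeUnit.MILLISECONDS);
      }
    }
  }

  /** Swaps out the pending batch under the lock, then writes it outside the lock. */
  void flush() {
    List<E> batch;
    synchronized (lock) {
      if (pending.isEmpty()) {
        return; // nothing to do, e.g. an earlier flush already drained the buffer
      }
      batch = pending;
      pending = new ArrayList<>();
    }
    sink.accept(batch); // a failure here loses the batch; real code would retry or log
  }

  @Override
  public void close() {
    scheduler.shutdown();
    flush(); // best-effort final flush; events are still lost on a hard crash
  }
}
```

The request thread only pays for a list append, which is the latency argument for buffering. The trade-off is the one described above: anything still in `pending` is lost if the process dies, which is what a file-based buffer would mitigate.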

Please reply to this thread if there are any questions and/or concerns on this 
proposal. If there are no major concerns within a week, then I will begin 
implementation.

Best,
Adnan Hemani
