I was going to make the same suggestion :) For deployments where the complexity of an extra buffer doesn’t make sense, a blocking persistence call will make things easier for the service owner. If read events are added, a high-throughput service will probably need the buffering. Though I would be surprised if those events end up being served to callers of the /events API. For read events, I think something like OpenLineage event reporting is a better option. Most callers will want to know about modifications, so filtering through all the reads to find the valuable changes is a headache I would guess most service owners would rather avoid.
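To make the "design by interface" idea concrete, here is a rough Java sketch of how the two modes could sit behind one interface. Every name in it (`EventStore`, `EventSink`, `BlockingEventSink`, `BufferedEventSink`) is hypothetical and not part of Polaris or of `PolarisEventListener`. The buffer is in memory and flushes on batch size only, to keep the sketch short; the proposal describes a time-based flush and a file-based option.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the JDBC-backed events persistence.
interface EventStore {
    void persist(List<String> events);
}

// Hypothetical interface the event listener would call for each event.
interface EventSink {
    void accept(String event);
    void flush();
}

// Always-write mode: one blocking persistence call per event.
final class BlockingEventSink implements EventSink {
    private final EventStore store;

    BlockingEventSink(EventStore store) {
        this.store = store;
    }

    @Override
    public void accept(String event) {
        store.persist(List.of(event));
    }

    @Override
    public void flush() {
        // Nothing is buffered, so there is nothing to flush.
    }
}

// Buffered mode: events accumulate in memory and are persisted in batches.
// Assumption: flushing is triggered by batch size here, not by a timer.
final class BufferedEventSink implements EventSink {
    private final EventStore store;
    private final int maxBatch;
    private final List<String> buffer = new ArrayList<>();

    BufferedEventSink(EventStore store, int maxBatch) {
        this.store = store;
        this.maxBatch = maxBatch;
    }

    @Override
    public synchronized void accept(String event) {
        buffer.add(event);
        if (buffer.size() >= maxBatch) {
            flush();
        }
    }

    @Override
    public synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // Copy before clearing so the store never sees a mutated list.
        store.persist(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

With something like this, the administrator picks the implementation through configuration, and the listener code does not care which one it gets. Events still sitting in the in-memory buffer are lost on a crash, which is the durability gap the file-based option is meant to close.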
Mike

On Thu, May 22, 2025 at 12:27 AM Adnan Hemani <adnan.hem...@snowflake.com.invalid> wrote:

> I haven’t thought of this in depth honestly - but I could see this being the case.
>
> -Adnan
>
> > On May 21, 2025, at 4:39 PM, Eric Maynard <eric.w.mayn...@gmail.com> wrote:
> >
> > -devlist
> >
> > If we design by interface properly, it should be relatively easy to offer both a disk buffering and an always-write implementation right?
> >
> > On Thu, May 22, 2025 at 12:12 AM Adnan Hemani <adnan.hem...@snowflake.com.invalid> wrote:
> >
> >> Hi all,
> >>
> >> Thanks for sharing these thoughts. I’m also not completely sure about how much we should care about how much slower things will be if we just make a trip to the persistence on every write action. However, I’m building this feature with the intention of being able to also support read event types in the near future, if this is something that the customer is interested in enabling using the `CustomOperation` type that is defined in the Events API spec. Of course, this would need to be configured by the administrator, as maintenance of the persistence is their responsibility.
> >>
> >> Given that the Iceberg Events API spec has not yet merged and can still see some changes, I’m planning to begin work on the disk buffering now and wait for the Events API to finalize before working on the API side of the end-to-end implementation.
> >>
> >> Best,
> >> Adnan Hemani
> >>
> >>> On May 20, 2025, at 10:30 PM, Michael Collado <collado.m...@gmail.com> wrote:
> >>>
> >>> That’s super interesting. Glad this is being worked on. Personally, I don’t know that the latency for writing events to a persistent storage is really all that concerning. Looking at the enum of supported operations, only write operations seem to trigger the event. It’s not like every read request issues a new event. Given that the request latency here is dominated by cloud storage calls, do we really care about one extra call to Postgres? Personally, I’d skip the extra complexity of a buffer of any kind and just write straight to the persistence store.
> >>>
> >>> Mike
> >>>
> >>> On Tue, May 20, 2025 at 9:31 AM Yufei Gu <flyrain...@gmail.com> wrote:
> >>>
> >>>> Looks awesome. Thanks for taking the lead! It makes sense to use a JDBC-backed persistence layer, shared or separate. The optional retention period is a nice safeguard.
> >>>> I don’t see any blockers on my side. If no one raises major concerns this week, please go ahead and start the implementation. Exciting to see this coming together!
> >>>>
> >>>> Yufei
> >>>>
> >>>> On Tue, May 13, 2025 at 6:37 PM Adnan Hemani <adnan.hem...@snowflake.com.invalid> wrote:
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I am raising a proposal to implement the proposed Iceberg REST specification for the Events API (doc <https://www.google.com/url?q=https://www.google.com/url?q%3Dhttps://docs.google.com/document/d/1WtIsNGVX75-_MsQIOJhXLAWg6IbplV4-DkLllQEiFT8/edit?pli%253D1%2526tab%253Dt.0%26source%3Dgmail-imap%26ust%3D1748410327000000%26usg%3DAOvVaw0UNQYKbXoQ2YHVM7J0kB3l&source=gmail-imap&ust=1748475613000000&usg=AOvVaw38VwFquueiVwCdlo_xR2rB>, GH <https://www.google.com/url?q=https://www.google.com/url?q%3Dhttps://github.com/apache/iceberg/pull/12584/files%26source%3Dgmail-imap%26ust%3D1748410327000000%26usg%3DAOvVaw1AvwLK402voAm_j6zy25Mn&source=gmail-imap&ust=1748475613000000&usg=AOvVaw2IwxUir8jCegcKC47zv1Si>). It is my understanding that this proposal is close and that we will be required to implement something very close to the current proposal in the near future.
> >>>>>
> >>>>> If Polaris is to implement this API, it will likely need to be through a Persistence instance that the Polaris server can query instantly, as this API will not be asynchronous. Please note, this proposal is not to comment on what events we may emit today or in the future - the scope of this proposal is solely to discuss how we plan to implement the proposed Events API.
> >>>>>
> >>>>> Changes to be made:
> >>>>>
> >>>>> Implement Event storage through the Polaris Persistence layer
> >>>>>
> >>>>> We will store events in a persistence instance of user’s choice - whether they would like the events to be part of the same persistence instance as their Polaris metadata or if they would like for a separate persistence instance. Users will provide the persistence instance by configuring a JDBC string on Polaris startup, similarly to the JDBC string we receive currently from users for the Polaris metadata.
> >>>>>
> >>>>> For concerns regarding scaling of events in the Polaris persistence layer, we can also implement a recommended, optional parameter for an events retention period after which Polaris will asynchronously delete records older than that time period.
> >>>>>
> >>>>> How to Implement Writes to the Polaris Persistence layer
> >>>>>
> >>>>> The way to implement the above proposal would be through implementation of the `PolarisEventListener` <https://www.google.com/url?q=https://www.google.com/url?q%3Dhttps://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/events/PolarisEventListener.java%26source%3Dgmail-imap%26ust%3D1748410327000000%26usg%3DAOvVaw0Z-SY-d50YHPNK38KxhHVk&source=gmail-imap&ust=1748475613000000&usg=AOvVaw0AVoXWSZgL3qK8GPbwYG9G> abstract class. In this implementation, I believe it should not be controversial to state that we cannot block on events to be flushed to persistence due to latency concerns - and as a result, we have two options: 1) a simple in-memory buffer or 2) a file-based buffer. Both buffers would flush after a certain amount of time after the first non-flushed event is written. While option 2 offers a better event durability guarantee in case of disaster recovery, it will come at the cost of additional latency to write to the filesystem. If there are no security concerns regarding writing to the filesystem, I believe this is the recommended way to implement - the additional latency to write to filesystem should not add unreasonable overhead given the right implementation with open filewriters. If writing to the filesystem is not recommended, I’m not sure there is any other way to achieve guaranteed event durability. In both options we can only achieve eventual consistency - to get strong consistency, we will need to implement a way to block the API call until we flush the events to persistence, which I cannot recommend at this time due to latency concerns.
> >>>>>
> >>>>> Please reply to this thread if there are any questions and/or concerns on this proposal. If there are no major concerns within a week, then I will begin implementation.
> >>>>>
> >>>>> Best,
> >>>>> Adnan Hemani