Persisting only the update event is likely insufficient. What we really care about here is the snapshot summary, which could appear as part of the update event payload. However, it isn't clear at the moment whether event persistence should always include the full payload: if payloads become large, the event table could grow very quickly. Snapshot summaries and events also serve different purposes. Snapshot summaries behave more like table metrics. For example, when users want to understand trends such as total table storage size or row count over time, they care about the metrics rather than the events themselves. With that in mind, one approach is to persist the snapshot summary as a type of metric, while letting each event type decide independently whether to persist its payload. This would likely require making the metrics persistence model more general so that it can support different kinds of metrics.
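As a rough illustration of persisting the snapshot summary as one kind of metric in a more general metrics model, here is a hypothetical Python sketch. `MetricKind`, `MetricRecord`, and `snapshot_summary_to_metric` are invented names, not existing Polaris classes. The summary keys (`total-records`, `total-files-size`, `total-data-files`, `added-records`, `deleted-records`) are standard Iceberg snapshot summary properties. The idea is that a single record type with a `kind` discriminator lets scan reports, commit reports, and snapshot summaries share one persistence path.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class MetricKind(Enum):
    # Hypothetical kinds; a generic model would let new kinds be added here.
    SCAN_REPORT = "scan_report"
    COMMIT_REPORT = "commit_report"
    SNAPSHOT_SUMMARY = "snapshot_summary"


# Standard Iceberg snapshot summary keys that are useful for trend queries.
TREND_KEYS = ("total-records", "total-files-size", "total-data-files",
              "added-records", "deleted-records")


@dataclass(frozen=True)
class MetricRecord:
    """Hypothetical generic row in a metrics table."""
    kind: MetricKind
    table: str
    timestamp_ms: int
    snapshot_id: Optional[int]
    values: dict[str, int] = field(default_factory=dict)


def snapshot_summary_to_metric(table: str, snapshot_id: int, timestamp_ms: int,
                               summary: dict[str, str]) -> MetricRecord:
    """Keep only the numeric trend fields of an Iceberg snapshot summary.

    Summary values are strings in Iceberg table metadata, so they are parsed
    here; missing or non-numeric entries are skipped instead of failing.
    """
    values = {}
    for key in TREND_KEYS:
        raw = summary.get(key)
        if raw is not None and raw.isdigit():
            values[key] = int(raw)
    return MetricRecord(MetricKind.SNAPSHOT_SUMMARY, table, timestamp_ms,
                        snapshot_id, values)


# Usage with sample values.
metric = snapshot_summary_to_metric(
    "db.events", snapshot_id=42, timestamp_ms=1772900000000,
    summary={"operation": "append", "added-records": "1000",
             "total-records": "52000", "total-files-size": "734003200"})
print(metric.values)
# {'total-records': 52000, 'total-files-size': 734003200, 'added-records': 1000}
```

Non-numeric entries such as `operation` are dropped, so the stored metric stays small even if the full event payload is not persisted.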
Yufei

On Sat, Mar 7, 2026 at 6:49 PM Anand Kumar Sankaran via dev <[email protected]> wrote:

> Hi Yufei,
>
> If I understand you correctly, persisting the table_update event and querying it is sufficient? I need the REST API for querying events at least.
>
> From: Yufei Gu <[email protected]>
> Date: Tuesday, March 3, 2026 at 11:58 AM
> To: [email protected] <[email protected]>
> Subject: Re: Polaris Telemetry and Audit Trail
>
> > I want to be able to query commit metrics to track files added / removed per commit, along with record counts. The ingestion pipeline that writes this data is owned by us and we are guaranteed to write this information for each write.
>
> I think the snapshot summary would be more reliable for tracking them. The problem with commit report metrics is that they are not necessarily committed: a commit can easily fail after the commit metrics report has been sent to Polaris. The snapshot summary doesn't have this issue, because a valid snapshot is a successful commit. To get the summaries, users only need to load the table. One minor problem with snapshot summaries is that the snapshot itself can expire over time. To mitigate this, either users can persist the summaries by leveraging the table update event, or Polaris can provide a solution to host historical snapshot summaries.
>
> Yufei
>
> On Tue, Mar 3, 2026 at 9:47 AM Dmitri Bourlatchkov <[email protected]> wrote:
>
> > Hi Anand,
> >
> > Your proposal for the new REST API to query scan/commit metrics makes sense to me.
> >
> > My only request is to start a dedicated "[DISCUSS]" thread for it, just for the sake of clarity, since this thread is already pretty long and all the related PRs previously mentioned here have been merged. WDYT?
> >
> > Whether to create an accompanying Google Doc for the new REST API is up to you.
> >
> > From my personal POV, a new email thread plus a GH PR with the API spec would be a good start.
> >
> > Thanks,
> > Dmitri.
> >
> > On Mon, Mar 2, 2026 at 1:26 PM Anand Kumar Sankaran via dev <[email protected]> wrote:
> >
> > > About the REST API, based on my use cases:
> > >
> > > 1. I want to be able to query commit metrics to track files added / removed per commit, along with record counts. The ingestion pipeline that writes this data is owned by us and we are guaranteed to write this information for each write.
> > > 2. I want to be able to query scan metrics for reads. I understand clients do not fulfill this requirement.
> > > 3. I want to be able to query the events table (events are persisted); this may supersede #2, I am not sure yet.
> > >
> > > All this information is in the JDBC-based persistence model and is persisted in the metastore. I currently don’t have a need to query Prometheus or OpenTelemetry. I do publish some events to Prometheus, and they are forwarded to our dashboards elsewhere.
> > >
> > > About the CLI utilities, I meant the admin user utilities. In one of the earliest drafts of my proposal, Prashant mentioned that the metrics tables can grow indefinitely and that a similar problem exists with the events table. We discussed that old records in both the metrics and events tables could be cleaned up via a CLI utility.
> > >
> > > I see that Yufei has covered the discussion about datasources.
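The CLI cleanup utility described above boils down to deleting rows older than a retention cutoff from both the metrics and events tables. A minimal sketch, assuming a timestamp column on each table, using Python's built-in `sqlite3` module as a stand-in for the JDBC-backed store. The table names `metrics_records` and `events`, the `created_at_ms` column, and the `purge_older_than` helper are all hypothetical; the real Polaris schema and Admin CLI may differ.

```python
from __future__ import annotations

import sqlite3

# Hypothetical table names; the real Polaris schema may differ.
PURGEABLE_TABLES = ("metrics_records", "events")


def purge_older_than(conn: sqlite3.Connection, cutoff_ms: int,
                     tables: tuple[str, ...] = PURGEABLE_TABLES) -> dict[str, int]:
    """Delete rows whose created_at_ms is older than cutoff_ms.

    Table names cannot be bound as SQL parameters, so each one is checked
    against an allowlist before it is interpolated into the statement.
    Returns the number of deleted rows per table.
    """
    deleted: dict[str, int] = {}
    with conn:  # commits if every DELETE succeeds, rolls back otherwise
        for table in tables:
            if table not in PURGEABLE_TABLES:
                raise ValueError(f"table is not purgeable: {table}")
            cur = conn.execute(
                f"DELETE FROM {table} WHERE created_at_ms < ?", (cutoff_ms,))
            deleted[table] = cur.rowcount
    return deleted


# Usage with an in-memory database and sample rows.
conn = sqlite3.connect(":memory:")
for name in PURGEABLE_TABLES:
    conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, created_at_ms INTEGER)")
    conn.executemany(f"INSERT INTO {name} (created_at_ms) VALUES (?)",
                     [(100,), (200,), (300,)])
print(purge_older_than(conn, cutoff_ms=250))
# {'metrics_records': 2, 'events': 2}
```

Running both deletes in one transaction keeps the two tables consistent with each other; a production job would more likely delete in bounded batches to avoid long-held locks on a large table.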
> > >
> > > -
> > > Anand
> > >
> > > From: Yufei Gu <[email protected]>
> > > Date: Friday, February 27, 2026 at 9:54 PM
> > > To: [email protected] <[email protected]>
> > > Subject: Re: Polaris Telemetry and Audit Trail
> > >
> > > As I mentioned in https://github.com/apache/polaris/issues/3890, supporting multiple data sources is not a trivial change. I would strongly recommend starting with a design document to carefully evaluate the architectural implications and long-term impact.
> > >
> > > A REST endpoint to query metrics seems reasonable given the current JDBC-based persistence model. That said, we may also consider alternative storage models. For example, if we later adopt a time-series system such as Prometheus to store metrics, the query model and access patterns would be fundamentally different. Designing the REST API without considering these potential evolutions may limit flexibility. I'd suggest starting with the use case.
> > >
> > > Yufei
> > >
> > > On Fri, Feb 27, 2026 at 3:42 PM Dmitri Bourlatchkov <[email protected]> wrote:
> > >
> > > > Hi Anand,
> > > >
> > > > Sharing my view... subject to discussion:
> > > >
> > > > 1. Adding a non-IRC REST API to Polaris is perfectly fine.
> > > >
> > > > Figuring out specific endpoint URIs and payloads might require a few round trips, so opening a separate thread for that might be best.
> > > >
> > > > Contributors commonly create Google Docs for new API proposals too (they are fairly easy to update as the email discussion progresses).
> > > >
> > > > There was a suggestion to try Markdown (with PRs) for proposals [1]... feel free to give it a try if you are comfortable with that.
> > > >
> > > > 2. Could you clarify whether you mean end user utilities or admin user utilities? In the latter case, those might be more suitable for the Admin CLI (Java) than the Python CLI, IMHO.
> > > >
> > > > Why would these utilities be common with events? IMHO, event use cases are distinct from scan/commit metrics.
> > > >
> > > > 3. I'd prefer separating metrics persistence from MetaStore persistence at the code level, so that they could be mixed and matched independently. The separate datasource question will become a non-issue with that approach, I guess.
> > > >
> > > > The rationale for separating scan metrics and metastore persistence is that "cascading deletes" between them are hardly ever required. Furthermore, the data and query patterns are very different, so different technologies might be beneficial in each case.
> > > >
> > > > [1] https://lists.apache.org/thread/yto2wp982t43h1mqjwnslswhws5z47cy
> > > >
> > > > Cheers,
> > > > Dmitri.
> > > >
> > > > On Fri, Feb 27, 2026 at 6:19 PM Anand Kumar Sankaran via dev <[email protected]> wrote:
> > > >
> > > > > Thanks all. This PR is merged now.
> > > > >
> > > > > Here are the follow-up features / work needed. These were all part of the merged PR at some point and were removed to reduce scope.
> > > > >
> > > > > Please let me know what you think.
> > > > >
> > > > > 1. A REST API to paginate through table metrics. This will be a non-IRC-standard addition.
> > > > > 2. Utilities for managing old records, which should be common with events. There was some discussion that these belong in the CLI.
> > > > > 3. Separate datasource (metrics, events, even other tables?).
> > > > >
> > > > > Anything else?
> > > > >
> > > > > -
> > > > > Anand
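Dmitri's point 3 (keeping metrics persistence separate from metastore persistence) and Anand's item 1 (a REST API to paginate through table metrics) fit together: the REST layer only needs a narrow metrics-store interface that returns one page at a time plus an opaque continuation token, and any backend (JDBC today, possibly a time-series system later, as Yufei notes) could implement it. The following is a hypothetical Python sketch of that idea. `MetricsStore`, `list_metrics`, `MetricRow`, `Page`, and the token format are illustrative assumptions, not Polaris APIs.

```python
from __future__ import annotations

import base64
import json
from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass(frozen=True)
class MetricRow:
    table: str
    timestamp_ms: int
    record_id: int
    payload: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Page:
    rows: list[MetricRow]
    next_page_token: Optional[str]  # None when there are no more rows


class MetricsStore(Protocol):
    """Hypothetical metrics persistence SPI, independent of metastore persistence."""

    def list_metrics(self, table: str, page_size: int,
                     page_token: Optional[str] = None) -> Page: ...


def encode_token(timestamp_ms: int, record_id: int) -> str:
    # Opaque to clients: the (timestamp, id) keyset of the last row returned.
    raw = json.dumps([timestamp_ms, record_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_token(token: str) -> tuple[int, int]:
    timestamp_ms, record_id = json.loads(base64.urlsafe_b64decode(token))
    return timestamp_ms, record_id


class InMemoryMetricsStore:
    """Toy backend; a JDBC or time-series backend could satisfy the same Protocol."""

    def __init__(self, rows: list[MetricRow]) -> None:
        # Order by the keyset once so that page boundaries are deterministic.
        self._rows = sorted(rows, key=lambda r: (r.timestamp_ms, r.record_id))

    def list_metrics(self, table: str, page_size: int,
                     page_token: Optional[str] = None) -> Page:
        if page_size < 1:
            raise ValueError("page_size must be positive")
        after = decode_token(page_token) if page_token else None
        matching = [r for r in self._rows
                    if r.table == table
                    and (after is None or (r.timestamp_ms, r.record_id) > after)]
        rows = matching[:page_size]
        if len(matching) > page_size:
            last = rows[-1]
            return Page(rows, encode_token(last.timestamp_ms, last.record_id))
        return Page(rows, None)


# Usage: page through five sample rows, two at a time.
store: MetricsStore = InMemoryMetricsStore(
    [MetricRow("db.t", ts, i) for i, ts in enumerate([30, 10, 20, 40, 50])])
token = None
while True:
    page = store.list_metrics("db.t", page_size=2, page_token=token)
    print([r.timestamp_ms for r in page.rows])
    token = page.next_page_token
    if token is None:
        break
# [10, 20]
# [30, 40]
# [50]
```

Keyset paging on `(timestamp_ms, record_id)` is used here instead of offsets so that a retention job deleting old rows between two requests does not shift page boundaries and cause rows to be skipped, which can happen with OFFSET-based paging.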
