Hi Yufei,

Can you please take a look at the proposal? A few folks have reviewed it already. Since there are already comments on it, I would like to continue with the Markdown approach. Once you are done reviewing, I will be happy to summarize it as a Google Doc if necessary.
As discussed in the other thread, even if I don't rely on table metrics and rely only on the load_table event, I still need a REST API to retrieve the persisted data, because the load_table event is persisted in the events table in the metastore (I wrote a custom event listener for persisting events).

-
Anand

From: Yufei Gu <[email protected]>
Date: Tuesday, March 3, 2026 at 4:10 PM
To: [email protected] <[email protected]>
Subject: Re: Proposal for REST endpoints for table metrics and events

Thanks for the proposal, Anand. I will take a look.

I generally prefer Google Docs for design docs, especially for early designs, which usually need fast iterations. But I'm OK with Markdown.

BTW, we haven't concluded the dev thread on the proposal. I think we will need a conclusion on that, so that people won't be confused.

Yufei

On Tue, Mar 3, 2026 at 4:03 PM Dmitri Bourlatchkov <[email protected]> wrote:

> Hi Anand,
>
> Thanks for starting this proposal. I see that the linked GH PR gathered
> some review comments. Let's continue evolving it!
>
> From my personal POV, reviewing the .md doc PR is pretty easy and
> generally resembles regular code reviews in GH, which is a good thing, as
> most people here are familiar with that.
>
> For those interested, the ASF mailing lists archive this review at [1].
> That trail is a bit verbose because it shows very long diffs, but it's
> still readable.
>
> In terms of process, I suppose once the PR is approved, we will merge it
> and then proceed with implementation PRs. WDYT?
>
> [1]
> https://lists.apache.org/thread/1s28pv4qsmjc6m3lypzmfkgrdom82fjd
>
> Cheers,
> Dmitri.
>
> On Tue, Mar 3, 2026 at 3:29 PM Anand Kumar Sankaran via dev <
> [email protected]> wrote:
>
> > Hi Yufei and Dmitri,
> >
> > Here is a proposal for the REST endpoints for metrics and events.
> >
> > https://github.com/apache/polaris/pull/3924/changes
> >
> > I did not see any precedent for raising a PR for proposals, so I am trying
> > this. Please let me know what you think.
> >
> > -
> > Anand
> >
> > From: Anand Kumar Sankaran <[email protected]>
> > Date: Monday, March 2, 2026 at 10:25 AM
> > To: [email protected] <[email protected]>
> > Subject: Re: Polaris Telemetry and Audit Trail
> >
> > About the REST API, based on my use cases:
> >
> > 1. I want to be able to query commit metrics to track files added / removed
> >    per commit, along with record counts. The ingestion pipeline that writes
> >    this data is owned by us, and we are guaranteed to write this information
> >    for each write.
> > 2. I want to be able to query scan metrics for reads. I understand that
> >    clients do not fulfill this requirement.
> > 3. I want to be able to query the events table (events are persisted). This
> >    may supersede #2; I am not sure yet.
> >
> > All this information is in the JDBC-based persistence model and is
> > persisted in the metastore. I currently don't have a need to query
> > Prometheus or OpenTelemetry. I do publish some events to Prometheus, and
> > they are forwarded to our dashboards elsewhere.
> >
> > About the CLI utilities, I meant the admin user utilities.
> > In one of the earliest drafts of my proposal, Prashant mentioned that the
> > metrics tables can grow indefinitely and that a similar problem exists with
> > the events table as well. We discussed that cleaning up old records from
> > both the metrics tables and the events table can be done via a CLI utility.
> >
> > I see that Yufei has covered the discussion about datasources.
> >
> > -
> > Anand
> >
> > From: Yufei Gu <[email protected]>
> > Date: Friday, February 27, 2026 at 9:54 PM
> > To: [email protected] <[email protected]>
> > Subject: Re: Polaris Telemetry and Audit Trail
> >
> > As I mentioned in https://github.com/apache/polaris/issues/3890, supporting
> > multiple data sources is not a trivial change. I would strongly recommend
> > starting with a design document to carefully evaluate the architectural
> > implications and long-term impact.
> >
> > A REST endpoint to query metrics seems reasonable given the current
> > JDBC-based persistence model. That said, we may also consider alternative
> > storage models. For example, if we later adopt a time series system such as
> > Prometheus to store metrics, the query model and access patterns would be
> > fundamentally different. Designing the REST API without considering these
> > potential evolutions may limit flexibility. I'd suggest starting with the
> > use case.
> >
> > Yufei
> >
> > On Fri, Feb 27, 2026 at 3:42 PM Dmitri Bourlatchkov <[email protected]>
> > wrote:
> >
> > > Hi Anand,
> > >
> > > Sharing my view...
> > > subject to discussion:
> > >
> > > 1. Adding a non-IRC REST API to Polaris is perfectly fine.
> > >
> > > Figuring out specific endpoint URIs and payloads might require a few
> > > round trips, so opening a separate thread for that might be best.
> > > Contributors commonly create Google Docs for new API proposals too (they
> > > are fairly easy to update as the email discussion progresses).
> > >
> > > There was a suggestion to try Markdown (with PRs) for proposals [1] ...
> > > feel free to give it a try if you are comfortable with that.
> > >
> > > 2. Could you clarify whether you mean end user utilities or admin user
> > > utilities? In the latter case, those might be more suitable for the Admin
> > > CLI (Java), not the Python CLI, IMHO.
> > >
> > > Why would these utilities be common with events? IMHO, event use cases are
> > > distinct from scan/commit metrics.
> > >
> > > 3. I'd prefer separating metrics persistence from MetaStore persistence at
> > > the code level, so that they could be mixed and matched independently. The
> > > separate datasource question will become a non-issue with that approach, I
> > > guess.
> > >
> > > The rationale for separating scan metrics and metastore persistence is that
> > > "cascading deletes" between them are hardly ever required. Furthermore, the
> > > data and query patterns are very different, so different technologies might
> > > be beneficial in each case.
> > >
> > > [1]
> > > https://lists.apache.org/thread/yto2wp982t43h1mqjwnslswhws5z47cy
> > >
> > > Cheers,
> > > Dmitri.
> > >
> > > On Fri, Feb 27, 2026 at 6:19 PM Anand Kumar Sankaran via dev <
> > > [email protected]> wrote:
> > >
> > > > Thanks all. This PR is merged now.
> > > >
> > > > Here are the follow-up features / work needed.
These were all part > of > > > the > > > > merged PR at some point in time and were removed to reduce scope. > > > > > > > > Please let me know what you think. > > > > > > > > > > > > 1. A REST API to paginate through table metrics. This will be > > non-IRC > > > > standard addition. > > > > 2. Utilities for managing old records, should be common with > events. > > > > There was some discussion that it belongs to the CLI. > > > > 3. Separate datasource (metrics, events, even other tables?). > > > > > > > > > > > > Anything else? > > > > > > > > - > > > > Anand > > > > > > > > > > > > > > > >
