Hi EJ,

You make good points about the metrics API extensibility and evolution.
However, we need to consider practical aspects too. Anand appears to have some specific use cases in mind, and I assume his proposal addresses them. Starting with an API + implementation that works for some real-world applications will validate the feature's usability.

We can revamp the API completely in its v2 after v1 is merged. New major API versions do not have to be backward-compatible with older versions of the same API [1]. In my personal experience, a v1 API can hardly be expected to cover all use cases and extensions well. We can certainly take a bit more time to polish it, but from my POV it might be best to iterate in terms of API versions rather than on unmerged commits in the initial proposal. Just my 2 cents :)

That said, we should flag the new APIs in this proposal as "beta"... at least initially (which is the usual practice in Polaris).

> I wonder if it would help to evaluate the Events API and Metrics API a bit more independently.

That makes sense to me. However, the current proposal has progressed a lot since its initial submission, and it contains both APIs. I would not want to lose this momentum. It might still be advisable to implement the events and metrics APIs separately and gather additional feedback at that time.

[1] https://polaris.apache.org/in-dev/unreleased/evolution/

Cheers,
Dmitri.

On Mon, Mar 9, 2026 at 3:48 PM EJ Wang <[email protected]> wrote:

> Hi Anand,
>
> I think the proposal is moving in a better direction, especially on the Events side, and I appreciate the iteration so far. That said, I still have some concerns about the Metrics side, but they are less about individual parameters or endpoint shape, and more about the product boundary.
>
> 2 cents: I wonder if it would help to evaluate the Events API and Metrics API a bit more independently.
>
> The Events side feels relatively close to Polaris' catalog/change-log scope.
> It is easier to justify as part of the core/community surface, especially if the goal is to expose completed catalog mutations in a way that aligns with Iceberg-style events.
>
> The Metrics side feels different to me. Once we start adding more and more type-specific filters, query semantics, and schema shapes for individual metric families, it seems easy for Polaris to drift toward becoming a built-in observability backend. My bias would be for Polaris to support a smaller set of community-recognized built-in metrics well, while providing good extensibility points for deployments that want richer querying, visualization, or use-case-specific metrics.
>
> Related to that, I am not yet convinced the current metrics model is generic enough as a long-term direction. Even after consolidating to a single endpoint, the design still feels fairly tied to the current scan/commit shape. I worry that otherwise each new metric family will keep pulling us into more storage/schema/API reshaping inside Polaris core.
>
> So the framing question I would suggest is something like:
>
> What is the minimal built-in metrics surface Polaris should own in core, and where should we instead rely on extensibility / sink-export / plugin-style integration?
>
> For me, getting that boundary right matters more than settling every query parameter detail first.
>
> -ej
>
> On Tue, Mar 3, 2026 at 12:29 PM Anand Kumar Sankaran via dev <[email protected]> wrote:
>
> > Hi Yufei and Dmitri,
> >
> > Here is a proposal for the REST endpoints for metrics and events.
> >
> > https://github.com/apache/polaris/pull/3924/changes
> >
> > I did not see any precursors for raising a PR for a proposal, so I am trying this. Please let me know what you think.
> >
> > -
> > Anand
> >
> > From: Anand Kumar Sankaran <[email protected]>
> > Date: Monday, March 2, 2026 at 10:25 AM
> > To: [email protected] <[email protected]>
> > Subject: Re: Polaris Telemetry and Audit Trail
> >
> > About the REST API, based on my use cases:
> >
> > 1. I want to be able to query commit metrics to track files added / removed per commit, along with record counts. The ingestion pipeline that writes this data is owned by us, and we are guaranteed to write this information for each write.
> > 2. I want to be able to query scan metrics for reads. I understand clients do not fulfill this requirement.
> > 3. I want to be able to query the events table (events are persisted) - this may supersede #2; I am not sure yet.
> >
> > All this information is in the JDBC-based persistence model and is persisted in the metastore. I currently don’t have a need to query Prometheus or OpenTelemetry. I do publish some events to Prometheus, and they are forwarded to our dashboards elsewhere.
> >
> > About the CLI utilities, I meant the admin user utilities. In one of the earliest drafts of my proposal, Prashant mentioned that the metrics tables can grow indefinitely and that a similar problem exists with the events table as well. We discussed that cleaning up old records from both the metrics tables and the events tables can be done via a CLI utility.
> >
> > I see that Yufei has covered the discussion about datasources.
> >
> > -
> > Anand
> >
> > From: Yufei Gu <[email protected]>
> > Date: Friday, February 27, 2026 at 9:54 PM
> > To: [email protected] <[email protected]>
> > Subject: Re: Polaris Telemetry and Audit Trail
> > As I mentioned in https://github.com/apache/polaris/issues/3890, supporting multiple data sources is not a trivial change. I would strongly recommend starting with a design document to carefully evaluate the architectural implications and long-term impact.
> >
> > A REST endpoint to query metrics seems reasonable given the current JDBC-based persistence model. That said, we may also consider alternative storage models. For example, if we later adopt a time-series system such as Prometheus to store metrics, the query model and access patterns would be fundamentally different. Designing the REST API without considering these potential evolutions may limit flexibility. I'd suggest starting with the use cases.
> >
> > Yufei
> >
> > On Fri, Feb 27, 2026 at 3:42 PM Dmitri Bourlatchkov <[email protected]> wrote:
> >
> > > Hi Anand,
> > >
> > > Sharing my view... subject to discussion:
> > >
> > > 1. Adding a non-IRC REST API to Polaris is perfectly fine.
> > >
> > > Figuring out specific endpoint URIs and payloads might require a few roundtrips, so opening a separate thread for that might be best. Contributors commonly create Google Docs for new API proposals too (they are fairly easy to update as the email discussion progresses).
> > >
> > > There was a suggestion to try Markdown (with PRs) for proposals [1]... feel free to give it a try if you are comfortable with that.
> > >
> > > 2. Could you clarify whether you mean end user utilities or admin user utilities? In the latter case, those might be more suitable for the Admin CLI (Java), not the Python CLI, IMHO.
> > >
> > > Why would these utilities be common with events? IMHO, event use cases are distinct from scan/commit metrics.
> > >
> > > 3. I'd prefer separating metrics persistence from MetaStore persistence at the code level, so that they could be mixed and matched independently. The separate datasource question will become a non-issue with that approach, I guess.
> > >
> > > The rationale for separating scan metrics and metastore persistence is that "cascading deletes" between them are hardly ever required. Furthermore, the data and query patterns are very different, so different technologies might be beneficial in each case.
> > >
> > > [1] https://lists.apache.org/thread/yto2wp982t43h1mqjwnslswhws5z47cy
> > >
> > > Cheers,
> > > Dmitri.
> > >
> > > On Fri, Feb 27, 2026 at 6:19 PM Anand Kumar Sankaran via dev <[email protected]> wrote:
> > >
> > > > Thanks all. This PR is merged now.
> > > >
> > > > Here are the follow-up features / work needed. These were all part of the merged PR at some point in time and were removed to reduce scope.
> > > >
> > > > Please let me know what you think.
> > > >
> > > > 1. A REST API to paginate through table metrics. This will be a non-IRC-standard addition.
> > > > 2. Utilities for managing old records; these should be common with events. There was some discussion that this belongs in the CLI.
> > > > 3. A separate datasource (metrics, events, even other tables?).
> > > >
> > > > Anything else?
> > > >
> > > > -
> > > > Anand
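P.S. To make the "REST API to paginate through table metrics" idea concrete, here is a minimal client-side sketch of token-based pagination. Everything in it is an assumption for illustration only: the response shape ("items", "next-page-token") is modeled loosely on Iceberg-style REST pagination, the metric field names are invented, and the stub dict stands in for whatever HTTP call the eventual Polaris endpoint would use.

```python
from typing import Callable, Iterator, Optional

def paginate(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item from a token-paginated endpoint, page by page.

    `fetch_page(token)` is assumed to return a dict shaped like
    {"items": [...], "next-page-token": str | None}; this shape is a
    guess, not the actual proposal.
    """
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get("items", [])
        token = page.get("next-page-token")
        if not token:
            return

# Stub standing in for an HTTP GET against a hypothetical metrics
# endpoint; the commit-metric fields below are invented examples.
_PAGES = {
    None: {"items": [{"commit": 1, "added-files": 3, "removed-files": 0}],
           "next-page-token": "t1"},
    "t1": {"items": [{"commit": 2, "added-files": 1, "removed-files": 2}],
           "next-page-token": None},
}

all_metrics = list(paginate(lambda tok: _PAGES[tok]))
```

The point of the sketch is only that a token-based contract keeps the server free to change its storage backend (JDBC today, a time-series store later) without breaking clients, which is the flexibility concern raised earlier in the thread.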
