Hi Anand,

Thanks for starting this proposal. I see that the linked GH PR gathered
some review comments. Let's continue evolving it!

From my personal POV the review on the .md doc PR is pretty easy and
generally resembles regular code reviews in GH, which is a good thing as
most people here are familiar with that.

For those interested, the ASF mailing lists archive this review at [1].
That trail is a bit verbose because it shows very long diffs, but it's
still readable.

In terms of process, I suppose once the PR is approved, we will merge it
and then proceed with implementation PRs. WDYT?

[1] https://lists.apache.org/thread/1s28pv4qsmjc6m3lypzmfkgrdom82fjd

Cheers,
Dmitri.

On Tue, Mar 3, 2026 at 3:29 PM Anand Kumar Sankaran via dev <
[email protected]> wrote:

> Hi Yufei and Dmitri,
>
> Here is a proposal for the REST endpoints for metrics and events.
>
> https://github.com/apache/polaris/pull/3924/changes
>
> I did not see any precedent for raising a PR for proposals, so I am trying
> this. Please let me know what you think.
>
> -
> Anand
>
> From: Anand Kumar Sankaran <[email protected]>
> Date: Monday, March 2, 2026 at 10:25 AM
> To: [email protected] <[email protected]>
> Subject: Re: Polaris Telemetry and Audit Trail
>
> About the REST API, based on my use cases:
>
>
> 1. I want to be able to query commit metrics to track files added / removed
>    per commit, along with record counts. The ingestion pipeline that writes
>    this data is owned by us and we are guaranteed to write this information
>    for each write.
> 2. I want to be able to query scan metrics for read. I understand clients do
>    not fulfill this requirement.
> 3. I want to be able to query the events table (events are persisted) - this
>    may supersede #2, I am not sure yet.
>
> All this information is in the JDBC based persistence model and is
> persisted in the metastore. I currently don’t have a need to query
> Prometheus or OpenTelemetry. I do publish some events to Prometheus and
> they are forwarded to our dashboards elsewhere.
>
> About the CLI utilities, I meant the admin user utilities. In one of the
> earliest drafts of my proposal, Prashant mentioned that the metrics tables
> can grow indefinitely and that a similar problem exists with the events
> table as well. We discussed that cleaning up of old records from both
> metrics tables and events tables can be done via a CLI utility.
>
> I see that Yufei has covered the discussion about datasources.
>
> -
> Anand
>
>
>
> From: Yufei Gu <[email protected]>
> Date: Friday, February 27, 2026 at 9:54 PM
> To: [email protected] <[email protected]>
> Subject: Re: Polaris Telemetry and Audit Trail
>
> As I mentioned in https://github.com/apache/polaris/issues/3890, supporting
> multiple data sources is not a trivial change. I would strongly recommend
> starting with a design document to carefully evaluate the architectural
> implications and long term impact.
>
> A REST endpoint to query metrics seems reasonable given the current JDBC
> based persistence model. That said, we may also consider alternative
> storage models. For example, if we later adopt a time series system such as
> Prometheus to store metrics, the query model and access patterns would be
> fundamentally different. Designing the REST API without considering these
> potential evolutions may limit flexibility. I'd suggest starting with the
> use case.
>
> Yufei
>
>
> On Fri, Feb 27, 2026 at 3:42 PM Dmitri Bourlatchkov <[email protected]>
> wrote:
>
> > Hi Anand,
> >
> > Sharing my view... subject to discussion:
> >
> > 1. Adding non-IRC REST API to Polaris is perfectly fine.
> >
> > Figuring out specific endpoint URIs and payloads might require a few
> > roundtrips, so opening a separate thread for that might be best.
> > Contributors commonly create Google Docs for new API proposals too (they
> > are fairly easy to update as the email discussion progresses).
> >
> > There was a suggestion to try Markdown (with PRs) for proposals [1] ...
> > feel free to give it a try if you are comfortable with that.
> >
> > 2. Could you clarify whether you mean end user utilities or admin user
> > utilities? In the latter case those might be more suitable for the Admin
> > CLI (Java), not the Python CLI, IMHO.
> >
> > Why would these utilities be common with events? IMHO, event use cases are
> > distinct from scan/commit metrics.
> >
> > 3. I'd prefer separating metrics persistence from MetaStore persistence at
> > the code level, so that they could be mixed and matched independently. The
> > separate datasource question will become a non-issue with that approach, I
> > guess.
> >
> > The rationale for separating scan metrics and metastore persistence is that
> > "cascading deletes" between them are hardly ever required. Furthermore, the
> > data and query patterns are very different so different technologies might
> > be beneficial in each case.
> >
> > [1] https://lists.apache.org/thread/yto2wp982t43h1mqjwnslswhws5z47cy
> >
> > Cheers,
> > Dmitri.
> >
> > On Fri, Feb 27, 2026 at 6:19 PM Anand Kumar Sankaran via dev <
> > [email protected]> wrote:
> >
> > > Thanks all. This PR is merged now.
> > >
> > > Here are the follow-up features / work needed. These were all part of the
> > > merged PR at some point in time and were removed to reduce scope.
> > >
> > > Please let me know what you think.
> > >
> > >
> > >   1.  A REST API to paginate through table metrics. This will be a
> > >       non-IRC standard addition.
> > >   2.  Utilities for managing old records, should be common with events.
> > >       There was some discussion that it belongs to the CLI.
> > >   3.  Separate datasource (metrics, events, even other tables?).
> > >
> > >
> > > Anything else?
> > >
> > > -
> > > Anand
> > >
> > >
> >
>
>
