Thanks for the proposal, Anand. I will take a look. I generally prefer
Google Docs for design docs, esp. for early designs, which usually need
fast iterations. But I'm OK with Markdown. BTW, we haven't concluded the
dev thread on the proposal. I think we will need a conclusion on that, so
that people won't be confused.

Yufei


On Tue, Mar 3, 2026 at 4:03 PM Dmitri Bourlatchkov <[email protected]> wrote:

> Hi Anand,
>
> Thanks for starting this proposal. I see that the linked GH PR gathered
> some review comments. Let's continue evolving it!
>
> From my personal POV the review on the .md doc PR is pretty easy and
> generally resembles regular code reviews in GH, which is a good thing as
> most people here are familiar with that.
>
> For those interested, the ASF mailing lists archive this review at [1].
> That trail is a bit verbose because it shows very long diffs, but it's
> still readable.
>
> In terms of process, I suppose once the PR is approved, we will merge it
> and then proceed with implementation PRs. WDYT?
>
> [1] https://lists.apache.org/thread/1s28pv4qsmjc6m3lypzmfkgrdom82fjd
>
> Cheers,
> Dmitri.
>
> On Tue, Mar 3, 2026 at 3:29 PM Anand Kumar Sankaran via dev <
> [email protected]> wrote:
>
> > Hi Yufei and Dmitri,
> >
> > Here is a proposal for the REST endpoints for metrics and events.
> >
> > https://github.com/apache/polaris/pull/3924/changes
> >
> > I did not see any precedents for raising a PR for proposals, so I am
> > trying this.  Please let me know what you think.
> >
> > -
> > Anand
> >
> > From: Anand Kumar Sankaran <[email protected]>
> > Date: Monday, March 2, 2026 at 10:25 AM
> > To: [email protected] <[email protected]>
> > Subject: Re: Polaris Telemetry and Audit Trail
> >
> > About the REST API, based on my use cases:
> >
> >
> >   1. I want to be able to query commit metrics to track files added /
> >      removed per commit, along with record counts. The ingestion pipeline
> >      that writes this data is owned by us and we are guaranteed to write
> >      this information for each write.
> >   2. I want to be able to query scan metrics for read. I understand
> >      clients do not fulfill this requirement.
> >   3. I want to be able to query the events table (events are persisted) -
> >      this may supersede #2, I am not sure yet.
> >
> > All this information is in the JDBC based persistence model and is
> > persisted in the metastore. I currently don’t have a need to query
> > Prometheus or OpenTelemetry. I do publish some events to Prometheus and
> > they are forwarded to our dashboards elsewhere.
> >
> > About the CLI utilities, I meant the admin user utilities. In one of the
> > earliest drafts of my proposal, Prashant mentioned that the metrics tables
> > can grow indefinitely and that a similar problem exists with the events
> > table as well. We discussed that cleaning up old records from both the
> > metrics tables and the events table can be done via a CLI utility.
> >
> > I see that Yufei has covered the discussion about datasources.
> >
> > -
> > Anand
> >
> >
> >
> > From: Yufei Gu <[email protected]>
> > Date: Friday, February 27, 2026 at 9:54 PM
> > To: [email protected] <[email protected]>
> > Subject: Re: Polaris Telemetry and Audit Trail
> >
> >
> > As I mentioned in https://github.com/apache/polaris/issues/3890,
> > supporting multiple data sources is not a trivial change. I would
> > strongly recommend
> > starting with a design document to carefully evaluate the architectural
> > implications and long term impact.
> >
> > A REST endpoint to query metrics seems reasonable given the current JDBC
> > based persistence model. That said, we may also consider alternative
> > storage models. For example, if we later adopt a time series system such as
> > Prometheus to store metrics, the query model and access patterns would be
> > fundamentally different. Designing the REST API without considering these
> > potential evolutions may limit flexibility. I'd suggest starting with the
> > use case.
> >
> > Yufei
> >
> >
> > On Fri, Feb 27, 2026 at 3:42 PM Dmitri Bourlatchkov <[email protected]>
> > wrote:
> >
> > > Hi Anand,
> > >
> > > Sharing my view... subject to discussion:
> > >
> > > 1. Adding non-IRC REST API to Polaris is perfectly fine.
> > >
> > > Figuring out specific endpoint URIs and payloads might require a few
> > > roundtrips, so opening a separate thread for that might be best.
> > > Contributors commonly create Google Docs for new API proposals too (they
> > > are fairly easy to update as the email discussion progresses).
> > >
> > > There was a suggestion to try Markdown (with PRs) for proposals [1] ...
> > > feel free to give it a try if you are comfortable with that.
> > >
> > > 2. Could you clarify whether you mean end user utilities or admin user
> > > utilities? In the latter case those might be more suitable for the Admin
> > > CLI (Java), not the Python CLI, IMHO.
> > >
> > > Why would these utilities be common with events? IMHO, event use cases are
> > > distinct from scan/commit metrics.
> > >
> > > 3. I'd prefer separating metrics persistence from MetaStore persistence at
> > > the code level, so that they could be mixed and matched independently. The
> > > separate datasource question will become a non-issue with that approach, I
> > > guess.
> > >
> > > The rationale for separating scan metrics and metastore persistence is that
> > > "cascading deletes" between them are hardly ever required. Furthermore, the
> > > data and query patterns are very different so different technologies might
> > > be beneficial in each case.
> > >
> > > [1] https://lists.apache.org/thread/yto2wp982t43h1mqjwnslswhws5z47cy
> > >
> > > Cheers,
> > > Dmitri.
> > >
> > > On Fri, Feb 27, 2026 at 6:19 PM Anand Kumar Sankaran via dev <
> > > [email protected]> wrote:
> > >
> > > > Thanks all. This PR is merged now.
> > > >
> > > > Here are the follow-up features / work needed.  These were all part of the
> > > > merged PR at some point in time and were removed to reduce scope.
> > > >
> > > > Please let me know what you think.
> > > >
> > > >
> > > >   1.  A REST API to paginate through table metrics. This will be a non-IRC
> > > > standard addition.
> > > >   2.  Utilities for managing old records, should be common with events.
> > > > There was some discussion that it belongs to the CLI.
> > > >   3.  Separate datasource (metrics, events, even other tables?).
> > > >
> > > >
> > > > Anything else?
> > > >
> > > > -
> > > > Anand
> > > >
> > > >
> > >
> >
> >
>
