Re: [PROPOSAL] Add Data Lake operational metrics to Polaris

Yufei Gu Fri, 17 Oct 2025 18:50:25 -0700

Thanks, Pierre and Oleg! This revision is heading in the right direction. A
few points to consider:


1. Metrics ingestion (write) REST API. Can we reuse the Iceberg IRC path
/v1/{prefix}/namespaces/{namespace}/tables/{table}/metrics and extend it
with a new payload type? I suggest we float this with the Iceberg community
first; the interoperability benefits would be huge.
2. Metrics serving (read) REST API. This makes sense to me, but I’d also
propose socializing the read side in the Iceberg community to avoid future
conflicts. Given that a write endpoint is already defined, adding a read
endpoint feels like the natural complement. One recent example is the
addition of event endpoints [1][2].
3. For table snapshot metrics derived from snapshot summaries, we should
include snapshot_id and timestamp in the schema so the metrics are
interpretable and time-bound.
4. The draft leans toward OLTP (e.g., Postgres). Could we add more rationale on
why OLTP was chosen over a TSDB (query patterns, retention, cardinality,
cost/ops, backfill, downsampling)? Even a brief trade-off table would help.
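
To make point 3 concrete, here is a minimal Python sketch of what a
snapshot-bound metric record could look like. It is not taken from the
proposal: the SnapshotMetric class, its field names, and the
metrics_from_summary helper are hypothetical, and the only assumption about
Iceberg is that a snapshot carries an id, a millisecond timestamp, and a
string-to-string summary map.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SnapshotMetric:
    """One metric value tied to the snapshot it came from (hypothetical schema)."""
    table: str
    snapshot_id: int
    timestamp: datetime  # snapshot commit time, in UTC
    name: str
    value: int


def metrics_from_summary(table, snapshot_id, timestamp_ms, summary):
    """Turn a snapshot summary (string -> string map) into metric rows."""
    ts = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    rows = []
    for key, raw in sorted(summary.items()):
        # Summary values are strings; keep only those that parse as integers
        # (this skips entries such as "operation": "append").
        try:
            value = int(raw)
        except ValueError:
            continue
        rows.append(SnapshotMetric(table, snapshot_id, ts, key, value))
    return rows
```

Because every row carries snapshot_id and timestamp, a reader can tell which
table state a value describes and order values over time, whichever store
ends up holding them.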

I want to expand a bit on the REST spec design. It would be a huge win if we
can avoid fragmentation between the IRC and Polaris REST specs. I'd suggest
looking into the event API spec design [2], which provides a good example of
predefined types and custom types [1]. The metrics endpoint could take the
same approach: add a few common metric types to improve interoperability, and
allow custom metric types for Polaris-specific or user-specific metrics.

   - enum:
       - create-table
       - register-table
       ....
       - drop-namespace
   - $ref: '#/components/schemas/CustomOperationType'
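
Applied to metrics, the same predefined-plus-custom split could be checked
roughly as in the Python sketch below. Everything in it is an assumption for
illustration: the predefined names are placeholders rather than a proposed
list, and the "x-" prefix for custom types is a hypothetical convention, not
something defined by the Iceberg or Polaris specs.

```python
import re

# Hypothetical predefined metric types; a real list would come from the spec.
PREDEFINED_METRIC_TYPES = frozenset(
    {"total-data-files", "total-delete-files", "total-records"}
)

# Assumption: custom types carry an "x-" prefix so they can never collide
# with predefined names added to the spec later.
CUSTOM_METRIC_TYPE = re.compile(r"x-[a-z0-9][a-z0-9._-]*")


def classify_metric_type(name):
    """Return "predefined" or "custom"; reject anything else."""
    if name in PREDEFINED_METRIC_TYPES:
        return "predefined"
    if CUSTOM_METRIC_TYPE.fullmatch(name):
        return "custom"
    raise ValueError(f"unknown metric type: {name!r}")
```

With this shape, a catalog can reject unknown names outright while still
accepting vendor- or user-specific metrics without a spec change.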

1. https://github.com/apache/iceberg/pull/12584/files
2. https://docs.google.com/document/d/1WtIsNGVX75-_MsQIOJhXLAWg6IbplV4-DkLllQEiFT8/edit?tab=t.0


Yufei


On Wed, Oct 15, 2025 at 2:48 AM Pierre Laporte <[email protected]>
wrote:

> Hi folks
>
> Oleg and I have been working on the proposal to take your feedback into
> account.  The updated doc is available at
>
> https://docs.google.com/document/d/1oFsuI_WKY0QqVqBNS4gtLlDdlZiV9fGmmUG3rLSED4Y/edit?tab=t.0#heading=h.1duembdpfkwi
>
> Notable changes:
>
>    - For the first phase of the proposal, Polaris will collect certain
>    Iceberg metrics provided by query engines via the IRC, leveraging
>    Snapshot Summaries (e.g. total number of files, total number of
>    data/delete files added, ...).
>    - Two SPI implementations are provided, including one to store the
>    latest metric values in Polaris' Postgres database, including a database
>    schema example.
>    - A new SPI method has been added so that Polaris can request metrics
>    deletion when a table is dropped
>
> The rest of the proposal has not changed much.  It still allows us to add
> more metric definitions in the registry, as we see fit.  It still includes
> the same API endpoints and RBAC integration.  And it is compatible with
> supporting metrics for non-Iceberg tables.
>
> Hopefully, this gives a better view of how operational metrics can be
> either collected directly by Polaris or pushed by external services (that
> will be in a subsequent phase to keep this proposal short).
>
> Cheers
> --
>
> Pierre
>
>
> On Sun, Oct 5, 2025 at 5:19 PM Pierre Laporte <[email protected]>
> wrote:
>
> >
> > On Fri, Oct 3, 2025 at 7:28 PM Eric Maynard <[email protected]>
> > wrote:
> >
> >> > IMHO, we should not add a dependency between this proposal and other
> >> > efforts that are not implemented yet, as it would prevent us from moving
> >> > forward on operational metrics until all the pieces are in place.
> >>
> >> This is an interesting argument given that the delegation service
> >> proposal you mention was/is blocked because of another effort that was
> >> not (is not?) implemented. I still don’t understand how this is
> >> materially different and thought the delegation service was intended to
> >> support these operational metrics.
> >>
> >
> > Let's not conflate the current proposal with the other discussions about
> > how Polaris could execute synchronous/asynchronous tasks.  We should
> > continue the discussions about the Delegation Service and the Async &
> > Reliable Tasks proposals in their respective threads.  And those
> > discussions should (IMHO) not prevent us from moving forward on the
> > operational metrics bits we agree on.
> >
> >> I do think we need to figure out on a high level which direction we’re
> >> going here rather than just rush forward with the first proposal that
> >> doesn’t immediately get a -1.
> >>
> >
> > I am not sure I understand this statement.  What I am proposing is that
> > we start implementing the parts that we have consensus on, and we
> > continue discussing the other parts.  I would not call that "rush forward
> > with the first proposal that doesn’t immediately get a -1".
> >
> > Let me repeat my question as I think it is important we decide on an
> > answer to avoid confusion: As we discussed in this thread and during
> > previous community calls, the goal of the second proposal is to start
> > small, and build our way up.  It is not about having a perfect design
> > document before starting implementation.  Has this changed?
> >
> > I am personally in favor of that incremental approach.  That being said,
> > if instead of going that way, the community would rather have a fully
> > designed system before any of the implementation work happens, that's
> > fine by me.  We just need to clarify it so that there is no ambiguity.
> >
>
