Hi folks,

I realized that my first email on this thread needs context to be
better understood :)

In Apache Polaris TMS (Table Maintenance Service), we "scoped" the
signals Polaris can use to help trigger table maintenance jobs:
1. Is table maintenance enabled (in Polaris)?
2. Policies exposed by Polaris (e.g. data retention policy, compaction
policy, ...)
3. Polaris events (e.g. tables/views/namespaces updates)
4. Table metadata (via Iceberg REST)
    4.1. Table schema/partition spec/properties, etc.
    4.2. Iceberg table stats and metrics. At the moment, only the stats
and metrics defined in the Iceberg table spec (e.g. partition stats,
snapshot summaries) are available.

Regarding 4.2 specifically, the Table Maintenance Service would need
more than what is available today.

My proposal to add a metrics endpoint to the REST spec aims to
expose extra metrics for TMS and engines. I'm thinking of:
- metrics helping with compaction decisions and snapshot GC
- "extra" metrics which are very helpful for TMS (e.g. file size
distribution without partitions)

I would like to propose a two-step approach:
1. Add a "wild" metrics endpoint gathering all metrics for TMS/engines,
where the exposed metrics are decided by the catalog implementation.
2. Enforce a list of metrics in the spec, with a clear schema and
standardized metric names.
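To make the two steps more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the payload shape, the metric names (`snapshot-count`, `file-size-distribution`, `vendor.custom-metric`), the size buckets, and the helpers `file_size_distribution`, `wild_metrics`, `TableMetrics` and `parse_standard_metrics` are illustrations only, not part of the Iceberg REST spec or of Polaris. Step 1 is modeled as a free-form map whose keys the catalog decides; step 2 as a fixed set of standardized names parsed into a typed schema.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional, Tuple

MB = 1024 * 1024

# Hypothetical size buckets (lower bound inclusive, upper bound exclusive);
# no spec defines these boundaries.
SIZE_BUCKETS: List[Tuple[int, Optional[int], str]] = [
    (0, 8 * MB, "lt-8mb"),
    (8 * MB, 128 * MB, "8mb-to-128mb"),
    (128 * MB, None, "gte-128mb"),
]


def file_size_distribution(sizes: Iterable[int]) -> Dict[str, int]:
    """Count data files per size bucket (example of an "extra" TMS metric)."""
    counts = {label: 0 for _, _, label in SIZE_BUCKETS}
    for size in sizes:
        for low, high, label in SIZE_BUCKETS:
            if size >= low and (high is None or size < high):
                counts[label] += 1
                break
    return counts


# Step 1 (hypothetical): a "wild" payload. The catalog implementation picks
# which metrics to expose, so clients must treat every key as optional.
def wild_metrics(snapshot_count: int, file_sizes: List[int]) -> Dict[str, Any]:
    return {
        "snapshot-count": snapshot_count,
        "file-size-distribution": file_size_distribution(file_sizes),
        "vendor.custom-metric": "catalog-specific value",
    }


# Step 2 (hypothetical): standardized metric names with a fixed schema.
@dataclass(frozen=True)
class TableMetrics:
    snapshot_count: int
    file_size_distribution: Dict[str, int]


def parse_standard_metrics(payload: Dict[str, Any]) -> TableMetrics:
    """Keep only the standardized keys; raise KeyError if one is missing."""
    return TableMetrics(
        snapshot_count=int(payload["snapshot-count"]),
        file_size_distribution=dict(payload["file-size-distribution"]),
    )
```

For example, `parse_standard_metrics(wild_metrics(3, [1 * MB, 64 * MB, 512 * MB]))` yields a `TableMetrics` with `snapshot_count=3` and one file in each bucket. The catalog-specific `vendor.custom-metric` key is simply dropped in step 2, which is the trade-off: step 1 lets catalogs experiment freely, while step 2 gives TMS and engines names and types they can rely on.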

If there are no objections, I will move forward with a proposal draft.

Thoughts?

Regards
JB

On Tue, Jan 21, 2025 at 3:40 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>
> Hi folks,
>
> I know we don't want to "expose" the whole metadata tables in the REST
> API, but I would like to discuss adding metadata stats and metrics
> management.
> We are discussing this as part of the Apache Polaris TMS proposal.
>
> The purpose is:
> 1. To add interfaces to manage metadata stats and metrics (partition
> stats, snapshot summaries, Parquet stats relayed via REST, ...)
> 2. The catalog implementation can deal with table properties, but can
> also extend to "extra" stats and metrics if needed
> 3. Query planners can use these metadata stats and metrics to produce
> better query plans. They could also be used by server-side planning
> to provide a "pre-plan check".
>
> Before going to a proposal document, I would like to get first
> feedback from the community (if it makes sense or not).
>
> Thoughts?
>
> Thanks!
> Regards
> JB
