Re: [DISCUSSION] Add statistics to CLI

Yong Zheng Mon, 23 Feb 2026 21:02:09 -0800

Hello Yufei,

I think that is fair, as some operations can be really compute intensive. I
will try to avoid those for now. But with just plain Iceberg metadata, there
is already a lot of interesting stuff that can be done, which is why I would
like to expand the capabilities of the current CLI instead of using another
compute tool to gather that information.
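
As a rough sketch of the metadata-only case: assuming the CLI already has the
JSON body of a REST load-table response in hand, the snapshot count and the
totals from the current snapshot summary can be read without any FileIO. The
function name and output keys below are hypothetical and not existing CLI
code, and the summary fields are optional in Iceberg, so any of them may be
absent.

```python
from typing import Any, Optional


def snapshot_stats(load_table_response: dict[str, Any]) -> dict[str, Any]:
    """Summarize snapshot info from an already-fetched load-table response.

    Hypothetical helper: it only reads the in-memory JSON and never opens
    manifest or data files.
    """
    metadata = load_table_response.get("metadata") or {}
    snapshots = metadata.get("snapshots") or []
    current_id = metadata.get("current-snapshot-id")

    # Find the snapshot that the table currently points at, if any.
    current = next(
        (s for s in snapshots if s.get("snapshot-id") == current_id), None
    )
    summary = (current or {}).get("summary") or {}

    def as_int(key: str) -> Optional[int]:
        # Summary values are serialized as strings; missing keys stay None.
        value = summary.get(key)
        return int(value) if value is not None else None

    return {
        "snapshot_count": len(snapshots),
        "current_snapshot_id": current_id,
        "total_records": as_int("total-records"),
        "total_data_files": as_int("total-data-files"),
        "total_files_size_bytes": as_int("total-files-size"),
    }


# Made-up response body, for illustration only.
example = {
    "metadata": {
        "current-snapshot-id": 2,
        "snapshots": [
            {"snapshot-id": 1, "summary": {"operation": "append",
                                           "total-records": "10",
                                           "total-data-files": "1",
                                           "total-files-size": "1024"}},
            {"snapshot-id": 2, "summary": {"operation": "append",
                                           "total-records": "25",
                                           "total-data-files": "3",
                                           "total-files-size": "4096"}},
        ],
    }
}

print(snapshot_stats(example))
```

Anything beyond this, such as a partition count, would need the manifest
files and is exactly what I would leave out for now.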


Thanks,
Yong Zheng

On 2026/02/24 00:38:25 Yufei Gu wrote:
> +1 on adding new features if they are REST based. However, I do not think
> it is a good idea to extend the CLI to anything that requires FileIO or has
> significant performance implications.
> 
> For example, the number of snapshots and anything available in the snapshot
> summary should be fine. But calculating the number of partitions would
> require the CLI to read manifest files through FileIO, and I do not think
> the CLI should take on that responsibility.
> 
> > Table current effective policies
> 
> I believe this is already supported.
> 
> > Table diagnostics, such as too many small files
> 
> This would also require FileIO, which I do not think belongs in the CLI.
> 
> 
> Yufei
> 
> 
> On Mon, Feb 23, 2026 at 2:58 PM Dmitri Bourlatchkov <[email protected]>
> wrote:
> 
> > Hi Yong,
> >
> > Adding these features to the CLI sounds useful to me.
> >
> > We can obtain relevant data via REST APIs, right?
> >
> > Cheers,
> > Dmitri
> >
> > On Sat, Feb 21, 2026 at 1:57 AM Yong Zheng <[email protected]> wrote:
> >
> > > Hello,
> > >
> > > The current CLI is primarily used to interact with various entities in
> > > Iceberg. Oftentimes, the CLI itself is not very handy after the initial
> > > bootstrap or after entity modification. For example, if I need to look up
> > > how many tables there are under a given namespace or the number of
> > > snapshots a given table has, I would need to switch to a different tool
> > > (e.g., pyiceberg, Trino, Spark) to write some queries or run some
> > > predefined scripts to obtain this information. Thus, I am wondering if we
> > > should enrich the current CLI with a bit more functionality around
> > > statistics to cover more use cases, such as:
> > > 1. Number of tables under a given namespace
> > > 2. Table metadata statistics report (number of snapshots, snapshot size,
> > > number of partitions, etc.)
> > > 3. Table storage location and size
> > > 4. Table current effective policies
> > > 5. Table diagnostics (e.g., too many small files)
> > >
> > > Thanks,
> > > Yong Zheng
> > >
> >
>
