Re: Proposal to extend standardized statistics

Jacky Lee Thu, 28 Aug 2025 06:53:40 -0700

Excellent proposal!

We’ve internally augmented both table-level and partition-level
ColumnStatistics and observed a performance gain of more than 30% in
Spark and Trino query execution, largely due to more effective
Cost-Based Optimization (CBO).
However, using the v3 format presented numerous challenges (such as
column-type evolution and how min/max values are stored). We believe
adopting the v4 format would be a more robust solution.



I’ve researched this extensively and applied it in production. I’d be
glad to collaborate on implementing this feature if needed.


Best wishes.

Gábor Kaszab <[email protected]> wrote on Thu, Aug 28, 2025 at 21:23:
>
> Hey Iceberg Community,
>
> I've been working on a proposal to extend the currently standardized
> statistics in Iceberg by looking into what statistics some query engines
> use and trying to fill the gaps (credit also goes to Denys K for laying the
> groundwork). The motivation is to use Iceberg as the source of truth for
> statistics across all engines.
> Meanwhile, there has been movement on other proposals (Restructuring
> col-stats, Restructuring metadata) that might overlap with mine. Let’s see
> how much of my proposal still holds up in light of these developments.
>
> Any feedback is appreciated!
> Gabor
