Excellent proposal!

We’ve internally augmented both table-level and partition-level
ColumnStatistics and observed a 30%+ performance gain in Spark and
Trino query execution, largely due to more effective Cost-Based
Optimization (CBO).
However, building on the v3 format presented numerous challenges, such
as handling column-type evolution and deciding how to store min/max
values. We believe adopting the v4 format would be a more robust solution.


I’ve researched this extensively and applied it in production. I’d be
glad to collaborate on implementing this feature if needed.


Best wishes.

Gábor Kaszab <gaborkas...@apache.org> wrote on Thu, Aug 28, 2025 at 21:23:
>
> Hey Iceberg Community,
>
> I've been working on a proposal to extend the currently standardized
> statistics in Iceberg by looking into which statistics some query
> engines use and trying to fill the gaps (credit also goes to Denys K for
> laying the groundwork). The motivation is to make Iceberg the source of
> truth for statistics across all engines.
> Meanwhile, there has been progress on other proposals (Restructuring
> col-stats, Restructuring metadata) that might overlap with mine. Let’s see
> how much of my proposal still holds up in light of these developments.
>
> Any feedback is appreciated!
> Gabor
