Re: [DISCUSS] Update supported blob types in puffin spec

Denys Kuzmenko Tue, 04 Feb 2025 12:15:04 -0800

Thanks, all, for the reactions.

I want to emphasize that Hive's StatsObject was shared only as an example, with
the suggestion to adapt it for Iceberg as `PartitionColumnStats` (i.e., use
column IDs and drop name/type, etc.).


As Rayan mentioned, column upper/lower bounds, counts, null value counts, and
NaN counts are tracked at the file level.
For the partition aggregates, we still need a compound object that provides
all of these values at once, precomputed, similar to the basic stats in
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/PartitionStats.java
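
To make the idea concrete, here is a rough sketch (in Python, for brevity) of
what such a compound object could look like. This is only an illustration under
my own assumptions, not an existing Iceberg or Hive API: the class name follows
the `PartitionColumnStats` suggestion above, while the field names, the
`merge`/`aggregate` helpers, and the treatment of a missing bound as "no
non-null values in that file" are all hypothetical.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable, Iterable, Optional


def _pick(fn: Callable[[Any, Any], Any], a: Optional[Any], b: Optional[Any]) -> Optional[Any]:
    """Combine two optional bounds; None is ASSUMED to mean 'no non-null values'."""
    if a is None:
        return b
    if b is None:
        return a
    return fn(a, b)


@dataclass(frozen=True)
class PartitionColumnStats:
    """Hypothetical per-partition, per-column aggregate keyed by column id."""
    field_id: int                      # column id instead of name/type
    value_count: int = 0
    null_count: int = 0
    nan_count: int = 0
    lower_bound: Optional[Any] = None
    upper_bound: Optional[Any] = None

    def merge(self, other: "PartitionColumnStats") -> "PartitionColumnStats":
        """Fold another stats object for the same column into this one."""
        if self.field_id != other.field_id:
            raise ValueError("cannot merge stats of different columns")
        return PartitionColumnStats(
            field_id=self.field_id,
            value_count=self.value_count + other.value_count,
            null_count=self.null_count + other.null_count,
            nan_count=self.nan_count + other.nan_count,
            lower_bound=_pick(min, self.lower_bound, other.lower_bound),
            upper_bound=_pick(max, self.upper_bound, other.upper_bound),
        )


def aggregate(file_stats: Iterable[PartitionColumnStats]) -> PartitionColumnStats:
    """Precompute the partition-level value from file-level stats of one column."""
    stats = list(file_stats)
    if not stats:
        raise ValueError("no file-level stats to aggregate")
    return reduce(PartitionColumnStats.merge, stats)


# Two data files in the same partition, both reporting stats for column id 1.
file_a = PartitionColumnStats(1, value_count=100, null_count=5, lower_bound=10, upper_bound=50)
file_b = PartitionColumnStats(1, value_count=40, nan_count=2, lower_bound=3, upper_bound=20)
partition_level = aggregate([file_a, file_b])
```

The point of the sketch is that a reader gets counts and bounds for a column
in one precomputed object per partition, instead of re-scanning the file-level
metrics.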

WDYT?
