Hey everyone, I have updated the proposal <https://docs.google.com/document/d/1uvbrwwAJW2TgsnoaIcwAFpjbhHkBUL5wY_24nKgtt9I/edit?tab=t.0#heading=h.hs6r9d26w1y2> with the following things:
- removed *column_size*, since this hasn't been used anywhere in earlier versions. Please shout if you think we should keep this going forward.
- added *avg_value_size* and *max_value_size* for avg/max value sizes of variable-length types (string/binary)
- the examples in the proposal were using *1_417_000_000* as the starting stats ID for the reserved field ID space, but that should have been *2_147_000_000* because we have 200 reserved IDs * 200 stats types = 40k and using *2_147_000_000* leaves enough room in case we decide to add other ID spaces

If people are ok then I think we should be able to vote on the design proposal so that we could get the first portions of the code <https://github.com/apache/iceberg/pull/13933> in, which would allow parallelizing downstream work on this

Thanks
Eduard

On Wed, Aug 20, 2025 at 3:05 PM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:

> Hey everyone,
>
> We met yesterday and talked about some details around the stats proposal.
>
> Please find the notes here
> <https://docs.google.com/document/d/1ZK5g8_bA1Y9SQ4UA5jAREX9iNX56xLWA5vAuKpQC4L8/edit?usp=sharing>
> and the recording here
> <https://drive.google.com/file/d/1YIILCIhDbgu3OYlMn5KNChsYFP8rGPPX/view?usp=sharing>.
>
> I have updated the proposal <https://s.apache.org/iceberg-column-stats>
> with the following points:
>
>    - added a table schema example with a detailed stats schema
>    - updated wording to make it clear that projection is always by ID and
>    the field name of a stats field should not be relied on
>    - added a table that defines current field stats types with their
>    respective offsets from the field ID of the base stats struct
>    - updated wording to make it clear that stats are calculated for
>    assigned field IDs that are
>       - defined in the table ID space (Amogh is working on a separate
>       proposal to unify ID spaces)
>       - defined in the reserved field ID
>       <https://iceberg.apache.org/spec/#reserved-field-ids> space
>    - added some examples showing table ID -> stats ID of stats struct and
>    also the stats ID of individual stats fields
>    - updated wording to explain how variant stats would look in the new
>    stats structure
>    - updated wording to make it clear that custom stats are not supported
>    and that expressions are the preferred way
>
> Please let me know in case I missed anything else to include.
>
> Thanks everyone for participating,
>
> Eduard
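
As a quick sanity check on the ID arithmetic in the update at the top of this message, here is a minimal sketch. It only reproduces the numbers quoted there (200 reserved IDs, 200 stats types, a starting stats ID of 2_147_000_000). The constant names are made up for illustration, and the upper bound assumes field IDs are 32-bit signed integers; the actual ID mapping is defined in the proposal, not here.

```python
# Sanity check of the ID-range arithmetic quoted in the message.
# All names below are illustrative, not taken from the proposal or the code.

INT32_MAX = 2**31 - 1          # assumption: field IDs are 32-bit signed ints
RESERVED_IDS = 200             # reserved field IDs, as stated in the message
STATS_TYPES = 200              # stats types per field, as stated in the message
STATS_BASE = 2_147_000_000     # corrected starting stats ID for the reserved space

# Number of stats IDs needed to cover every reserved field ID.
ids_needed = RESERVED_IDS * STATS_TYPES

# Last stats ID used if the range starts at STATS_BASE.
last_stats_id = STATS_BASE + ids_needed - 1

# IDs left between the end of that range and the 32-bit maximum.
headroom = INT32_MAX - last_stats_id

print(ids_needed, last_stats_id, headroom)
```

Under these assumptions the 40k range ends well below the 32-bit maximum, which matches the point that starting at 2_147_000_000 leaves room for further ID spaces.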