> 1. The current proposal only leaves 10000+200 ids for other columns
> than stats. If in the future, we find some other feature which would
> require a manifest file column for every data column in the table, then we
> would need to change the spec.

For this I think we could start at *100,000* so that we use *100,000 + 200 * <fieldID>* to calculate the field ID of a given statistic.
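
To make the arithmetic concrete, here is a minimal Python sketch of that mapping. The function name, the constants, and the `stat_offset` parameter (a slot index from 0 to 199 within a column's block) are assumptions for illustration only; the proposal itself only gives the base formula.

```python
INT32_MAX = 2**31 - 1      # Iceberg field IDs are 32-bit signed ints
STATS_BASE = 100_000       # starting offset suggested above
STATS_PER_FIELD = 200      # IDs set aside per data column


def stat_field_id(field_id: int, stat_offset: int = 0) -> int:
    """Map (data column ID, stat slot) to a stats field ID.

    stat_offset is a hypothetical slot index in [0, 200); the thread
    only gives the base formula 100,000 + 200 * field_id.
    """
    if field_id < 0:
        raise ValueError("field_id must be non-negative")
    if not 0 <= stat_offset < STATS_PER_FIELD:
        raise ValueError("stat_offset must be in [0, 200)")
    result = STATS_BASE + STATS_PER_FIELD * field_id + stat_offset
    if result > INT32_MAX:
        # Python ints do not wrap, so the 32-bit limit is checked explicitly.
        raise OverflowError(f"stats ID {result} does not fit in an int32")
    return result


# Largest data column ID whose whole 200-ID block still fits in an int32.
MAX_MAPPABLE_FIELD_ID = (
    INT32_MAX - STATS_BASE - (STATS_PER_FIELD - 1)
) // STATS_PER_FIELD
```

Under these assumptions, `stat_field_id(1)` gives 100200, and a column ID close to the 32-bit maximum raises `OverflowError` instead of silently wrapping.
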
> 1. The current proposal expects every engine to share the same stats,
> and not store any "non-standard" stat in the metadata.

We haven't explicitly stated it in the proposal but there were discussions on how to potentially support this and what implications it brings for readers/writers.

> I'm still not clear on what the proposal is to handle stats for reserved
> columns <https://iceberg.apache.org/spec/#reserved-field-ids> [1] (I
> think there was some mention in the notes but it was light on details). It
> seems like it would be potentially useful to have stats for things like
> _row_id, and the multiplication would overflow for these column IDs (maybe
> this still yields unique column IDs though?)

To handle stats for reserved columns we could start at *2,417,000,000* which should give us enough room to store 200 stats per metadata ID. We would also ensure that those ID ranges for table columns and reserved columns wouldn't overlap.

> I assume we could put whatever these columns are under stats? Maybe we just
> need a more generic name for the top level struct?

I haven't updated the proposal yet, but I think renaming *column_stats* to *content_stats* would make sense.