Hi,

We would like to store snapshot-level metadata that is needed for producing and consuming incremental data. One example is the maximum event time that we have processed so far, which tells us where to read from next.
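To make that concrete, here is a minimal sketch of the kind of bookkeeping we mean. It is plain Python with no Iceberg calls; the function name, the field names and the sample events are made up for illustration.

```python
from datetime import datetime, timezone


def read_incremental(events, watermark):
    """Return the events newer than `watermark` and the advanced watermark.

    `events` is a list of dicts with an "event_time" key (hypothetical shape).
    If nothing new arrived, the watermark is returned unchanged.
    """
    new_events = [e for e in events if e["event_time"] > watermark]
    new_watermark = max((e["event_time"] for e in new_events), default=watermark)
    return new_events, new_watermark


def utc(hour):
    # Helper for building sample timestamps on an arbitrary day.
    return datetime(2021, 3, 1, hour, 0, tzinfo=timezone.utc)


events = [
    {"id": 1, "event_time": utc(9)},
    {"id": 2, "event_time": utc(10)},
    {"id": 3, "event_time": utc(11)},
    {"id": 4, "event_time": utc(12)},
]

# The value below is the metadata we would like to keep per snapshot.
stored_watermark = utc(10)
new_events, next_watermark = read_incremental(events, stored_watermark)
```

The value of `next_watermark` is what we need to persist somewhere alongside the snapshot that the run produced.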
The options that we have considered so far are:

1) Store the metadata in the TableMetadata properties. This is already advised against in the Iceberg specification.

2) Use the max of the upper bounds of the event timestamp column tracked by the data files in a snapshot. This wouldn't be accurate, because the max value of the event timestamp column can be less than the event time up to which the data spans (especially for sparse datasets).

3) Store the metadata in the summary property of the snapshot. This seems to be the most promising approach, but we would like to know whether there is any limit on the amount of information that can be stored in the summary property of a Snapshot. A downside is that the summary only holds Strings, so we would always have to convert our data to and from Strings in order to use it.

If none of the above is a suitable place to store this information, could anyone share other approaches they have taken to solve this?

Thanks,
Dabby
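
P.S. To illustrate the String conversion in option 3, here is a rough sketch of the round trip we have in mind. The plain dict only stands in for the snapshot summary, no Iceberg API is called, and the key name and function names are made up for illustration.

```python
from datetime import datetime, timezone

# Hypothetical key name; not something defined by Iceberg.
WATERMARK_KEY = "incremental.max-event-time"


def to_summary(max_event_time):
    """Encode the watermark as a String-only mapping (ISO 8601 text)."""
    return {WATERMARK_KEY: max_event_time.isoformat()}


def from_summary(summary):
    """Decode the watermark back into a timezone-aware datetime."""
    return datetime.fromisoformat(summary[WATERMARK_KEY])


watermark = datetime(2021, 3, 1, 12, 30, tzinfo=timezone.utc)
summary = to_summary(watermark)
restored = from_summary(summary)
```

Every value we add would need an encode/decode pair like this, which is the overhead we were referring to.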