BLUF: - We are using snapshot references to preserve custom table-level metadata that currently exists in snapshot summaries. Is this an anti-pattern or expected usage? - If it is an anti-pattern, is there something else in the spec we can use for this purpose? If not, would it make sense to introduce table-level metadata in the spec?
Details below: Hello Iceberg community, We (Redpanda[1][2]) have built a log storage engine that, in addition to writing log format data, writes data as Parquet files and commits them to the Iceberg catalog. One of the requirements we have is to ensure exactly once delivery of records into Iceberg. To this end, we keep metadata in two places: - In the Iceberg table, we add the position in our log up to which has been committed as a field in each new Iceberg snapshot’s summary. - In our system, we checkpoint this same position up to which we have committed to Iceberg. It’s possible for these to diverge (e.g. in the event of a node failure in between the above two events), but in such cases, the Iceberg table is taken as the source of truth. As I understand it, this is the same technique the Kafka Connect connector uses. But there is a problem with this approach when considering snapshot expiry alongside concurrent updates from multiple systems. While the default snapshot expiration is 5 days, it’s conceivable a user sets the table’s snapshot expiry to something significantly lower to avoid metadata bloat. To boot, we cannot assume that our system is the only system writing to Iceberg, and the main snapshot is the only snapshot guaranteed to be retained at all times. It’s thus conceivable that external systems add snapshots to the table, and for snapshot expiry to remove the snapshot metadata we require. If these conditions are met in a moment of divergence, there is room for exactly once delivery to be violated and for files to be committed to the table more than once. To mitigate this, we maintain an Iceberg tag for the latest snapshot written by our system, and rely on the snapshot reference expiry policy[3] to ensure that these tagged snapshots aren’t removed, with the assumption that it is more likely to tune down the `max-snapshot-age-ms` property (to keep manifest list size small) than it is to tune down the `max-ref-age-ms` property. There are still at least a couple issues with this approach: - A user can still set `max-ref-age-ms` to something pathologically small and end up causing an exactly-once violation. - It feels like we’re overloading the intended behavior of tags by using them to force explicit snapshot retention. Our question is, is there anything better that we can be doing here? Are there other parts of the spec that can serve our needs? Table properties field seems somewhat what we want, but: - It is explicitly described as being not meant for arbitrary metadata[4]. - For it to be useful for our use case, we'd need some kind of table requirement that checks these properties atomically (today, we use snapshot-based table requirements when we commit). So if not something existing, do folks have thoughts on generalized ways to store custom metadata in the table? As an example, is there any appetite in adding a different table-level metadata field to the spec? As Iceberg becomes adopted by more systems, it's not hard to imagine similar requirements popping up elsewhere. It may be useful in similar situations to update the table only if certain metadata fields have not changed, without tying these fields to specific snapshots. Thanks, Andrew [1] https://www.redpanda.com/ [2] https://github.com/redpanda-data/redpanda [3] https://iceberg.apache.org/spec/#snapshot-retention-policy [4] https://iceberg.apache.org/spec/#table-metadata-fields