Hi all,

While maintaining Iceberg tables for our customers, I've found it difficult
to choose a default snapshot expiration time
(`history.expire.max-snapshot-age-ms`) that suits different workloads. The
default value of 5 days looks good for daily batch jobs but is too long for
jobs that update a table frequently.

I'm thinking about adding another option, such as
`history.expire.max-snapshots-to-keep`, to keep at most N snapshots. A
snapshot would be removed when either its age exceeds
`history.expire.max-snapshot-age-ms` or it is not among the newest
`history.expire.max-snapshots-to-keep` snapshots. I've created a draft
PR to demo the idea[1].
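
To make the rule concrete, here is a rough Python sketch. It is not the
code in the PR: the `Snapshot` class and the `snapshots_to_expire` function
are hypothetical names used only for illustration, and the sketch ignores
`history.expire.min-snapshots-to-keep` as well as branches and tags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical stand-in for a table snapshot, not an Iceberg class.
    snapshot_id: int
    timestamp_ms: int


def snapshots_to_expire(snapshots, now_ms, max_snapshot_age_ms, max_snapshots_to_keep):
    """Return the IDs the proposed rule would remove, newest first."""
    # Rank snapshots from newest (rank 0) to oldest.
    ordered = sorted(snapshots, key=lambda s: s.timestamp_ms, reverse=True)
    expired = []
    for rank, snap in enumerate(ordered):
        too_old = now_ms - snap.timestamp_ms > max_snapshot_age_ms
        # Proposed addition: anything beyond the newest N is removed.
        over_count = rank >= max_snapshots_to_keep
        if too_old or over_count:
            expired.append(snap.snapshot_id)
    return expired


HOUR_MS = 60 * 60 * 1000
FIVE_DAYS_MS = 5 * 24 * HOUR_MS

# Ten hourly commits: snapshot i was created at hour i, "now" is hour 10.
hourly = [Snapshot(i, i * HOUR_MS) for i in range(1, 11)]
print(snapshots_to_expire(hourly, 10 * HOUR_MS, FIVE_DAYS_MS, 3))
```

With the 5-day age limit alone, none of these hourly snapshots would
expire; with the count limit set to 3, only the three newest survive.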

If you agree this is a valid feature request, we would also need to update
SnapshotRef[2] to add a new field, `max-snapshots-to-keep`. Would that
cause a compatibility issue, or would maintaining compatibility cost too
much? My experiment shows that many parsers would need to be updated.

I'd like to hear your thoughts on this.

1. https://github.com/apache/iceberg/pull/11879
2. https://iceberg.apache.org/spec/#snapshot-references

Happy New Year!
Manu
