Hi folks, I would like to discuss an idea for an optional extension to Iceberg's Snapshot metadata lifecycle. Thanks, Piotr, for replying on the other thread that this should be a fuller Iceberg format change.

*Proposal Summary*

Currently, ExpireSnapshots(long olderThan) purges a Snapshot's metadata and its deleted data together. Purging deleted data often requires a shorter timeline, due to strict requirements to claw back unused disk space, fulfill data lifecycle compliance, etc. In many deployments, this means the 'olderThan' timestamp is set to just a few days before the current time (the default is 5 days).

On the other hand, metadata could ideally be purged on a more relaxed timeline, such as months or more, to allow for meaningful historical table analysis.

We should have an optional way to purge Snapshot metadata separately from purging deleted data. This would allow us to get the history of the table and answer questions like:

- When was a file/partition added?
- When was a file/partition deleted?
- How much data was added or removed in time X?

Today these can only be answered for data operations within the last few days.

*GitHub Proposal*: https://github.com/apache/iceberg/issues/10646
*Google Design Doc*: https://docs.google.com/document/d/1m5K_XT7bckGfp8VrTe2093wEmEMslcTUE3kU_ohDn6A/edit

Curious if anyone has thought along these lines and/or sees obvious issues. Would appreciate any feedback on the proposal.

Thanks
Szehon
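
To make the split concrete, here is a rough sketch of the idea as a toy in-memory model. It is not Iceberg's API and is not taken from the design doc: `Snapshot`, `expire`, `data_older_than` and `metadata_older_than` are all hypothetical names, and the model simplifies by treating a snapshot's deleted files as purgeable once that snapshot passes the data cutoff. The point is only that deleted data and snapshot metadata are expired on two independent cutoffs.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Snapshot:
    """Toy stand-in for a table snapshot (hypothetical, not Iceberg's class)."""
    snapshot_id: int
    timestamp_ms: int
    added_files: List[str] = field(default_factory=list)
    deleted_files: List[str] = field(default_factory=list)
    data_purged: bool = False  # True once its deleted files were physically removed


def expire(
    snapshots: List[Snapshot],
    data_older_than: int,
    metadata_older_than: int,
) -> Tuple[List[Snapshot], List[str]]:
    """Purge deleted data and snapshot metadata on two separate cutoffs.

    Returns (retained_snapshots, purged_files). Assumes metadata is kept
    at least as long as data, i.e. metadata_older_than <= data_older_than.
    """
    if metadata_older_than > data_older_than:
        raise ValueError("metadata must be retained at least as long as data")
    retained, purged_files = [], []
    for snap in snapshots:
        # Short timeline: physically remove files this snapshot deleted.
        if snap.timestamp_ms < data_older_than and not snap.data_purged:
            purged_files.extend(snap.deleted_files)
            snap.data_purged = True
        # Relaxed timeline: keep the metadata record itself.
        if snap.timestamp_ms >= metadata_older_than:
            retained.append(snap)
    return retained, purged_files


DAY_MS = 24 * 60 * 60 * 1000
now = 365 * DAY_MS  # arbitrary "current time" for the example
history = [
    Snapshot(1, now - 200 * DAY_MS, added_files=["a.parquet"]),
    Snapshot(2, now - 30 * DAY_MS, added_files=["b.parquet"], deleted_files=["a.parquet"]),
    Snapshot(3, now - 1 * DAY_MS, added_files=["c.parquet"], deleted_files=["b.parquet"]),
]

retained, purged = expire(
    history,
    data_older_than=now - 5 * DAY_MS,       # data purged after 5 days
    metadata_older_than=now - 90 * DAY_MS,  # metadata kept for 90 days
)
```

In this example snapshot 2 keeps its record that `a.parquet` was deleted 30 days ago even though the file itself has been purged, which is the kind of history question listed above.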