Hi all, Do you have more comments on this feature? Do you have concerns about adding a new field to SnapshotRef?
Thanks, Manu On Tue, Jan 7, 2025 at 2:37 PM Manu Zhang <owenzhang1...@gmail.com> wrote: > Hi Ajantha, > > `history.expire.min-snapshots-to-keep` is the *minimum number of > snapshots* we can keep. I'm proposing to decide the *maximum number of > snapshots* to keep by count rather than by age. > > Thanks, > Manu > > On Tue, Jan 7, 2025 at 2:18 PM Ajantha Bhat <ajanthab...@gmail.com> wrote: > >> Hi Manu, >> >> We already have `retain_last` and `history.expire.min-snapshots-to-keep` >> to retain the snapshots based on count. Can you please elaborate on why >> can't we use the same? >> >> - Ajantha >> >> On Tue, Jan 7, 2025 at 11:33 AM Walaa Eldin Moustafa < >> wa.moust...@gmail.com> wrote: >> >>> Thanks Manu for starting this discussion. That is definitely a valid >>> feature. I have always found maintaining snapshots by day makes it harder >>> to provide different types of guarantees/contracts especially when tables >>> change rates are diverse or irregular. Maintaining by snapshot count makes >>> a lot of sense and prevents table sizes from growing excessively when >>> change rate is frequent. >>> >>> Thanks, >>> Walaa. >>> >>> >>> On Mon, Jan 6, 2025 at 8:38 PM Manu Zhang <owenzhang1...@gmail.com> >>> wrote: >>> >>>> Hi all, >>>> >>>> While maintaining Iceberg tables for our customers, I find it's >>>> difficult to set a default snapshot expiration time >>>> (`history.expire.max-snapshot-age-ms`) for different workloads. The default >>>> value of 5 days looks good for daily batch jobs but is too long for >>>> frequently-updated jobs. >>>> >>>> I'm thinking about adding another option like >>>> `history.expire.max-snapshots-to-keep` to keep at most N snapshots. A >>>> snapshot will be removed when either its age is larger than >>>> `history.expire.max-snapshot-age-ms` or it's the oldest in >>>> `history.expire.max-snapshots-to-keep + 1` snapshots. I've created a draft >>>> PR to demo the idea[1]. >>>> >>>> If you agree this is a valid feature request, we also need to update >>>> SnapshotRef[2] adding a new field `max-snapshots-to-keep`. Will there be a >>>> compatibility issue or too much cost to maintain compatibility? My >>>> experiment shows many parsers need to be updated. >>>> >>>> I'd like to hear your thoughts on this. >>>> >>>> 1. https://github.com/apache/iceberg/pull/11879 >>>> 2. https://iceberg.apache.org/spec/#snapshot-references >>>> >>>> Happy New Year! >>>> Manu >>>> >>>