It makes sense to have an option to control the max number of snapshots. Thanks Manu for the proposal.
Yufei On Thu, Jan 16, 2025 at 7:46 PM Manu Zhang <owenzhang1...@gmail.com> wrote: > Hi all, > > Do you have more comments on this feature? Do you have concerns about > adding a new field to SnapshotRef? > > Thanks, > Manu > > On Tue, Jan 7, 2025 at 2:37 PM Manu Zhang <owenzhang1...@gmail.com> wrote: > >> Hi Ajantha, >> >> `history.expire.min-snapshots-to-keep` is the *minimum number of >> snapshots* we can keep. I'm proposing to decide the *maximum number of >> snapshots* to keep by count rather than by age. >> >> Thanks, >> Manu >> >> On Tue, Jan 7, 2025 at 2:18 PM Ajantha Bhat <ajanthab...@gmail.com> >> wrote: >> >>> Hi Manu, >>> >>> We already have `retain_last` and `history.expire.min-snapshots-to-keep` >>> to retain the snapshots based on count. Can you please elaborate on why >>> can't we use the same? >>> >>> - Ajantha >>> >>> On Tue, Jan 7, 2025 at 11:33 AM Walaa Eldin Moustafa < >>> wa.moust...@gmail.com> wrote: >>> >>>> Thanks Manu for starting this discussion. That is definitely a valid >>>> feature. I have always found maintaining snapshots by day makes it harder >>>> to provide different types of guarantees/contracts especially when tables >>>> change rates are diverse or irregular. Maintaining by snapshot count makes >>>> a lot of sense and prevents table sizes from growing excessively when >>>> change rate is frequent. >>>> >>>> Thanks, >>>> Walaa. >>>> >>>> >>>> On Mon, Jan 6, 2025 at 8:38 PM Manu Zhang <owenzhang1...@gmail.com> >>>> wrote: >>>> >>>>> Hi all, >>>>> >>>>> While maintaining Iceberg tables for our customers, I find it's >>>>> difficult to set a default snapshot expiration time >>>>> (`history.expire.max-snapshot-age-ms`) for different workloads. The >>>>> default >>>>> value of 5 days looks good for daily batch jobs but is too long for >>>>> frequently-updated jobs. >>>>> >>>>> I'm thinking about adding another option like >>>>> `history.expire.max-snapshots-to-keep` to keep at most N snapshots. A >>>>> snapshot will be removed when either its age is larger than >>>>> `history.expire.max-snapshot-age-ms` or it's the oldest in >>>>> `history.expire.max-snapshots-to-keep + 1` snapshots. I've created a draft >>>>> PR to demo the idea[1]. >>>>> >>>>> If you agree this is a valid feature request, we also need to update >>>>> SnapshotRef[2] adding a new field `max-snapshots-to-keep`. Will there be a >>>>> compatibility issue or too much cost to maintain compatibility? My >>>>> experiment shows many parsers need to be updated. >>>>> >>>>> I'd like to hear your thoughts on this. >>>>> >>>>> 1. https://github.com/apache/iceberg/pull/11879 >>>>> 2. https://iceberg.apache.org/spec/#snapshot-references >>>>> >>>>> Happy New Year! >>>>> Manu >>>>> >>>>