Hi Ryan,

> I think you could achieve what you're looking for by setting the age to 1
> ms and the minimum number of snapshots to keep

I'm not sure how I can set the minimum number of snapshots to keep for
tables with different update frequencies. For a daily updated table, I
might set it to 5. However, that's too few for an hourly updated table. On
the other hand, I want to keep as many snapshots as the filesystem allows.

Daniel,

> the most important part is to have an updated version of the retention
> procedure <https://iceberg.apache.org/spec/#snapshot-retention-policy> to
> clearly state how this interacts with the other settings as part of the PR

I agree. I'd argue even the current definition is not clear. People are not
aware that the number of snapshots to keep is actually a range,
[min-snapshots-to-keep, max-snapshots *not-expired-by-age*]. The lower
bound takes precedence over the upper bound. I'm proposing to add another
parameter to the upper bound, which will become, max-snapshots
*not-expired-by-age AND not-expired-by-count*. No changes to precedence.

With a reasonable value of max-snapshots-to-keep, e.g. 24, snapshots of a
daily updated table will expire by age after 5 days, and those of an hourly
updated table expire by count after 1 day. For both tables, we can keep as
many snapshots as possible for time travel or rollback while not exceeding
the quota of the filesystem.

Hope it's clearer now.

Thanks,
Manu

On Wed, Jan 22, 2025 at 4:42 AM Russell Spitzer <russell.spit...@gmail.com>
wrote:

> I do think this comes up a lot and is one of the more confusing things
> about the snapshot expiration. Definitely one of my most answered questions
> is: "When I set min-snapshots to 1, why do I not get only 1 snapshot." I
> agree adding another behavior may be even more confusing but I wouldn't be
> opposed to having it be a parameter of the existing expire snapshots
> action. Something like, expireAllBut(x). Setting the expiration time to 1ms
> and setting a number of min-snapshots has always felt a bit hacky to me but
> I've recommended it many times.
>
> I am open to any change to this, because if any question comes up this
> many times, it is probably confusing.
>
> On Tue, Jan 21, 2025 at 2:27 PM rdb...@gmail.com <rdb...@gmail.com> wrote:
>
>> I think you could achieve what you're looking for by setting the age to 1
>> ms and the minimum number of snapshots to keep. Then snapshot expiration
>> would always expire all snapshots other than the min number, getting you
>> what you want.
>>
>> It probably wouldn't make sense to set a maximum as well. Right now, the
>> min number of snapshots is a requirement that keeps snapshots around even
>> if they are eligible to be removed because of expiration. A maximum would
>> work differently and would be a second way to consider a snapshot eligible
>> for expiration -- or else we would have to redefine how the min works. I
>> think that would be a bit confusing to configure in practice because we'd
>> need to define these cases for which configuration takes precedence. It
>> seems much simpler to me to use the min snapshots setting with a very short
>> expiration interval if you want to always keep some number of snapshots
>> rather than using the age-based expiration.
>>
>> On Tue, Jan 21, 2025 at 9:51 AM Daniel Weeks <dwe...@apache.org> wrote:
>>
>>> Hey Manu,
>>>
>>> I think I understand what you're trying to achieve here and I feel like
>>> the most important part is to have an updated version of the retention
>>> procedure <https://iceberg.apache.org/spec/#snapshot-retention-policy> to
>>> clearly state how this interacts with the other settings as part of the PR.
>>>
>>> -Dan
>>>
>>> On Thu, Jan 16, 2025 at 8:37 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>
>>>> It makes sense to have an option to control the max number of
>>>> snapshots. Thanks Manu for the proposal.
>>>>
>>>> Yufei
>>>>
>>>>
>>>> On Thu, Jan 16, 2025 at 7:46 PM Manu Zhang <owenzhang1...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Do you have more comments on this feature?
>>>>> Do you have concerns about adding a new field to SnapshotRef?
>>>>>
>>>>> Thanks,
>>>>> Manu
>>>>>
>>>>> On Tue, Jan 7, 2025 at 2:37 PM Manu Zhang <owenzhang1...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Ajantha,
>>>>>>
>>>>>> `history.expire.min-snapshots-to-keep` is the *minimum number of
>>>>>> snapshots* we can keep. I'm proposing to decide the *maximum number
>>>>>> of snapshots* to keep by count rather than by age.
>>>>>>
>>>>>> Thanks,
>>>>>> Manu
>>>>>>
>>>>>> On Tue, Jan 7, 2025 at 2:18 PM Ajantha Bhat <ajanthab...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Manu,
>>>>>>>
>>>>>>> We already have `retain_last` and
>>>>>>> `history.expire.min-snapshots-to-keep` to retain the snapshots based on
>>>>>>> count. Can you please elaborate on why can't we use the same?
>>>>>>>
>>>>>>> - Ajantha
>>>>>>>
>>>>>>> On Tue, Jan 7, 2025 at 11:33 AM Walaa Eldin Moustafa <
>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks Manu for starting this discussion. That is definitely a
>>>>>>>> valid feature. I have always found maintaining snapshots by day makes
>>>>>>>> it harder to provide different types of guarantees/contracts
>>>>>>>> especially when tables change rates are diverse or irregular.
>>>>>>>> Maintaining by snapshot count makes a lot of sense and prevents table
>>>>>>>> sizes from growing excessively when change rate is frequent.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Walaa.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jan 6, 2025 at 8:38 PM Manu Zhang <owenzhang1...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> While maintaining Iceberg tables for our customers, I find it's
>>>>>>>>> difficult to set a default snapshot expiration time
>>>>>>>>> (`history.expire.max-snapshot-age-ms`) for different workloads. The
>>>>>>>>> default value of 5 days looks good for daily batch jobs but is too
>>>>>>>>> long for frequently-updated jobs.
>>>>>>>>>
>>>>>>>>> I'm thinking about adding another option like
>>>>>>>>> `history.expire.max-snapshots-to-keep` to keep at most N snapshots. A
>>>>>>>>> snapshot will be removed when either its age is larger than
>>>>>>>>> `history.expire.max-snapshot-age-ms` or it's the oldest in
>>>>>>>>> `history.expire.max-snapshots-to-keep + 1` snapshots. I've created a
>>>>>>>>> draft PR to demo the idea[1].
>>>>>>>>>
>>>>>>>>> If you agree this is a valid feature request, we also need to
>>>>>>>>> update SnapshotRef[2] adding a new field `max-snapshots-to-keep`. Will
>>>>>>>>> there be a compatibility issue or too much cost to maintain
>>>>>>>>> compatibility? My experiment shows many parsers need to be updated.
>>>>>>>>>
>>>>>>>>> I'd like to hear your thoughts on this.
>>>>>>>>>
>>>>>>>>> 1. https://github.com/apache/iceberg/pull/11879
>>>>>>>>> 2. https://iceberg.apache.org/spec/#snapshot-references
>>>>>>>>>
>>>>>>>>> Happy New Year!
>>>>>>>>> Manu
>>>>>>>>>
>>>>>>>>
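
For reference, the retention range described in the reply at the top of
this thread can be modelled in a few lines of Python. This is only a rough
sketch, not Iceberg's implementation: it assumes a single linear branch and
identifies each snapshot by its commit timestamp. The function `retained`
and its parameter names are hypothetical, and `max_to_keep` stands in for
the proposed `history.expire.max-snapshots-to-keep`, which is the option
under discussion rather than an existing table property.

```python
from typing import List, Optional

HOUR_MS = 60 * 60 * 1000
DAY_MS = 24 * HOUR_MS


def retained(timestamps_ms: List[int], now_ms: int, max_age_ms: int,
             min_to_keep: int, max_to_keep: Optional[int] = None) -> List[int]:
    """Return commit timestamps of snapshots that survive expiration.

    Simplified model (assumption): one linear branch, newest snapshot first.
    """
    newest_first = sorted(timestamps_ms, reverse=True)
    kept = []
    for rank, ts in enumerate(newest_first):
        # Lower bound: the newest `min_to_keep` snapshots are always kept.
        within_min = rank < min_to_keep
        # Upper bound, part 1: not expired by age.
        young_enough = now_ms - ts <= max_age_ms
        # Upper bound, part 2 (proposed): not expired by count.
        within_max = max_to_keep is None or rank < max_to_keep
        if within_min or (young_enough and within_max):
            kept.append(ts)
    return kept


now = 30 * DAY_MS
# Ten days of history: one commit per day vs. one commit per hour.
daily = [now - (12 * HOUR_MS + k * DAY_MS) for k in range(10)]
hourly = [now - (HOUR_MS // 2 + k * HOUR_MS) for k in range(240)]

# Current behaviour with the 5-day default age and a minimum of 1.
daily_by_age = retained(daily, now, 5 * DAY_MS, 1)
hourly_by_age = retained(hourly, now, 5 * DAY_MS, 1)

# Proposed behaviour: same age limit plus a cap of 24 snapshots.
daily_capped = retained(daily, now, 5 * DAY_MS, 1, max_to_keep=24)
hourly_capped = retained(hourly, now, 5 * DAY_MS, 1, max_to_keep=24)

# Workaround from the thread: 1 ms age with a minimum of 5.
daily_workaround = retained(daily, now, 1, 5)
hourly_workaround = retained(hourly, now, 1, 5)

print(len(daily_by_age), len(hourly_by_age))          # 5 120
print(len(daily_capped), len(hourly_capped))          # 5 24
print(len(daily_workaround), len(hourly_workaround))  # 5 5
```

In this model the daily table is bounded by age in both configurations,
while the hourly table drops from 120 retained snapshots to 24 once the cap
applies. The 1 ms workaround pins both tables to the same fixed count, which
is the limitation raised above for tables with different update
frequencies.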