Hi all,

Do you have more comments on this feature? Do you have concerns about
adding a new field to SnapshotRef?

Thanks,
Manu

On Tue, Jan 7, 2025 at 2:37 PM Manu Zhang <owenzhang1...@gmail.com> wrote:

> Hi Ajantha,
>
> `history.expire.min-snapshots-to-keep` is the *minimum number of
> snapshots* we can keep. I'm proposing to decide the *maximum number of
> snapshots* to keep by count rather than by age.
>
> Thanks,
> Manu
>
> On Tue, Jan 7, 2025 at 2:18 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>
>> Hi Manu,
>>
>> We already have `retain_last` and `history.expire.min-snapshots-to-keep`
>> to retain the snapshots based on count. Can you please elaborate on why
>> can't we use the same?
>>
>> - Ajantha
>>
>> On Tue, Jan 7, 2025 at 11:33 AM Walaa Eldin Moustafa <
>> wa.moust...@gmail.com> wrote:
>>
>>> Thanks Manu for starting this discussion. That is definitely a valid
>>> feature. I have always found maintaining snapshots by day makes it harder
>>> to provide different types of guarantees/contracts especially when tables
>>> change rates are diverse or irregular. Maintaining by snapshot count makes
>>> a lot of sense and prevents table sizes from growing excessively when
>>> change rate is frequent.
>>>
>>> Thanks,
>>> Walaa.
>>>
>>>
>>> On Mon, Jan 6, 2025 at 8:38 PM Manu Zhang <owenzhang1...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> While maintaining Iceberg tables for our customers, I find it's
>>>> difficult to set a default snapshot expiration time
>>>> (`history.expire.max-snapshot-age-ms`) for different workloads. The default
>>>> value of 5 days looks good for daily batch jobs but is too long for
>>>> frequently-updated jobs.
>>>>
>>>> I'm thinking about adding another option like
>>>> `history.expire.max-snapshots-to-keep` to keep at most N snapshots. A
>>>> snapshot will be removed when either its age is larger than
>>>> `history.expire.max-snapshot-age-ms` or it's the oldest in
>>>> `history.expire.max-snapshots-to-keep + 1` snapshots. I've created a draft
>>>> PR to demo the idea[1].
>>>>
>>>> If you agree this is a valid feature request, we also need to update
>>>> SnapshotRef[2] adding a new field `max-snapshots-to-keep`. Will there be a
>>>> compatibility issue or too much cost to maintain compatibility? My
>>>> experiment shows many parsers need to be updated.
>>>>
>>>> I'd like to hear your thoughts on this.
>>>>
>>>> 1. https://github.com/apache/iceberg/pull/11879
>>>> 2. https://iceberg.apache.org/spec/#snapshot-references
>>>>
>>>> Happy New Year!
>>>> Manu
>>>>
>>>

Reply via email to