It makes sense to have an option to control the max number of snapshots.
Thanks Manu for the proposal.

Yufei


On Thu, Jan 16, 2025 at 7:46 PM Manu Zhang <owenzhang1...@gmail.com> wrote:

> Hi all,
>
> Do you have more comments on this feature? Do you have concerns about
> adding a new field to SnapshotRef?
>
> Thanks,
> Manu
>
> On Tue, Jan 7, 2025 at 2:37 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>
>> Hi Ajantha,
>>
>> `history.expire.min-snapshots-to-keep` is the *minimum number of
>> snapshots* we can keep. I'm proposing to decide the *maximum number of
>> snapshots* to keep by count rather than by age.
>>
>> Thanks,
>> Manu
>>
>> On Tue, Jan 7, 2025 at 2:18 PM Ajantha Bhat <ajanthab...@gmail.com>
>> wrote:
>>
>>> Hi Manu,
>>>
>>> We already have `retain_last` and `history.expire.min-snapshots-to-keep`
>>> to retain the snapshots based on count. Can you please elaborate on why
>>> can't we use the same?
>>>
>>> - Ajantha
>>>
>>> On Tue, Jan 7, 2025 at 11:33 AM Walaa Eldin Moustafa <
>>> wa.moust...@gmail.com> wrote:
>>>
>>>> Thanks Manu for starting this discussion. That is definitely a valid
>>>> feature. I have always found maintaining snapshots by day makes it harder
>>>> to provide different types of guarantees/contracts especially when tables
>>>> change rates are diverse or irregular. Maintaining by snapshot count makes
>>>> a lot of sense and prevents table sizes from growing excessively when
>>>> change rate is frequent.
>>>>
>>>> Thanks,
>>>> Walaa.
>>>>
>>>>
>>>> On Mon, Jan 6, 2025 at 8:38 PM Manu Zhang <owenzhang1...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> While maintaining Iceberg tables for our customers, I find it's
>>>>> difficult to set a default snapshot expiration time
>>>>> (`history.expire.max-snapshot-age-ms`) for different workloads. The 
>>>>> default
>>>>> value of 5 days looks good for daily batch jobs but is too long for
>>>>> frequently-updated jobs.
>>>>>
>>>>> I'm thinking about adding another option like
>>>>> `history.expire.max-snapshots-to-keep` to keep at most N snapshots. A
>>>>> snapshot will be removed when either its age is larger than
>>>>> `history.expire.max-snapshot-age-ms` or it's the oldest in
>>>>> `history.expire.max-snapshots-to-keep + 1` snapshots. I've created a draft
>>>>> PR to demo the idea[1].
>>>>>
>>>>> If you agree this is a valid feature request, we also need to update
>>>>> SnapshotRef[2] adding a new field `max-snapshots-to-keep`. Will there be a
>>>>> compatibility issue or too much cost to maintain compatibility? My
>>>>> experiment shows many parsers need to be updated.
>>>>>
>>>>> I'd like to hear your thoughts on this.
>>>>>
>>>>> 1. https://github.com/apache/iceberg/pull/11879
>>>>> 2. https://iceberg.apache.org/spec/#snapshot-references
>>>>>
>>>>> Happy New Year!
>>>>> Manu
>>>>>
>>>>

Reply via email to