Hi Manu,

We already have `retain_last` and `history.expire.min-snapshots-to-keep` to
retain snapshots based on count. Can you please elaborate on why we can't
use the same?

- Ajantha

On Tue, Jan 7, 2025 at 11:33 AM Walaa Eldin Moustafa <wa.moust...@gmail.com>
wrote:

> Thanks Manu for starting this discussion. That is definitely a valid
> feature. I have always found that maintaining snapshots by age makes it
> harder to provide different types of guarantees/contracts, especially when
> table change rates are diverse or irregular. Maintaining by snapshot count
> makes a lot of sense and prevents table sizes from growing excessively when
> changes are frequent.
>
> Thanks,
> Walaa.
>
>
> On Mon, Jan 6, 2025 at 8:38 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>
>> Hi all,
>>
>> While maintaining Iceberg tables for our customers, I find it's difficult
>> to set a default snapshot expiration time
>> (`history.expire.max-snapshot-age-ms`) for different workloads. The default
>> value of 5 days looks good for daily batch jobs but is too long for
>> frequently-updated jobs.
>>
>> I'm thinking about adding another option like
>> `history.expire.max-snapshots-to-keep` to keep at most N snapshots. A
>> snapshot will be removed when either its age is larger than
>> `history.expire.max-snapshot-age-ms` or it's the oldest in
>> `history.expire.max-snapshots-to-keep + 1` snapshots. I've created a draft
>> PR to demo the idea[1].
>>
>> If you agree this is a valid feature request, we also need to update
>> SnapshotRef[2] to add a new field `max-snapshots-to-keep`. Will there be a
>> compatibility issue, or will maintaining compatibility cost too much? My
>> experiment shows many parsers need to be updated.
>>
>> I'd like to hear your thoughts on this.
>>
>> 1. https://github.com/apache/iceberg/pull/11879
>> 2. https://iceberg.apache.org/spec/#snapshot-references
>>
>> Happy New Year!
>> Manu
>>
>
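
For reference, the rule proposed in Manu's message could be sketched roughly
as below. This is only an illustration, not the code in the draft PR: the
`Snapshot` class and `snapshots_to_expire` function are hypothetical names,
history is assumed to be linear (no branches or tags), and the way the
proposed cap is combined with the existing `min-snapshots-to-keep` floor is
an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical stand-in for a table snapshot; not an Iceberg class.
    snapshot_id: int
    timestamp_ms: int


def snapshots_to_expire(snapshots, now_ms, max_snapshot_age_ms,
                        max_snapshots_to_keep, min_snapshots_to_keep=1):
    """Return ids of snapshots to expire, newest first.

    A snapshot is expired when it is older than max_snapshot_age_ms OR it
    falls outside the newest max_snapshots_to_keep snapshots. The newest
    min_snapshots_to_keep snapshots are always retained (assumed precedence).
    """
    # Rank snapshots from newest (rank 0) to oldest.
    ordered = sorted(snapshots, key=lambda s: s.timestamp_ms, reverse=True)
    expired = []
    for rank, snap in enumerate(ordered):
        if rank < min_snapshots_to_keep:
            continue  # floor: always keep the newest few
        too_old = now_ms - snap.timestamp_ms > max_snapshot_age_ms
        over_count = rank >= max_snapshots_to_keep  # proposed cap
        if too_old or over_count:
            expired.append(snap.snapshot_id)
    return expired


HOUR_MS = 60 * 60 * 1000
DAY_MS = 24 * HOUR_MS

# Ten hourly snapshots with ids 1..10; "now" is the time of the newest one.
hourly = [Snapshot(i, i * HOUR_MS) for i in range(1, 11)]
capped = snapshots_to_expire(hourly, now_ms=10 * HOUR_MS,
                             max_snapshot_age_ms=5 * DAY_MS,
                             max_snapshots_to_keep=3)
```

With the 5-day age limit alone, none of these hourly snapshots would expire.
With the cap of 3, only the three newest are kept. The minimum count acts as
a floor on what is retained, while the proposed maximum acts as a ceiling.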
