Thanks.
Our use case is to keep every snapshot, all the way back to the table's
first commit.
How will that affect performance, given that the metadata files will grow
quite large?
Would it also reduce the opportunities for data compaction?
One idea I had was to add a way in Iceberg to isolate snapshots from each
other completely, so that they don't share metadata. It would increase the
amount of data stored, but a new snapshot would be completely independent
and could therefore be compacted independently of the older snapshots'
metadata, which should improve performance. Does that make any sense at
all?
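
To make the compaction question concrete, here is a toy Python model of how
snapshots and compaction seem to interact. It is only a sketch and not
Iceberg's API: ToyTable, commit, compact, expire_all_but_current and the file
names are all made up for illustration. It assumes that each snapshot simply
references a set of data files, and that a file has to stay in storage for as
long as any retained snapshot references it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    data_files: frozenset  # names of the data files this snapshot references


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table; not an Iceberg class."""
    snapshots: list = field(default_factory=list)
    next_id: int = 1

    def commit(self, data_files):
        # Every commit adds a new snapshot; older snapshots are left untouched.
        snap = Snapshot(self.next_id, frozenset(data_files))
        self.next_id += 1
        self.snapshots.append(snap)
        return snap

    def current(self):
        return self.snapshots[-1]

    def compact(self, new_file):
        # Rewrite the current snapshot's files into one file, as a new snapshot.
        return self.commit({new_file})

    def expire_all_but_current(self):
        self.snapshots = [self.current()]

    def live_files(self):
        # Files that must stay in storage: referenced by any retained snapshot.
        files = set()
        for snap in self.snapshots:
            files |= snap.data_files
        return files


table = ToyTable()
table.commit({"a.parquet"})
table.commit({"a.parquet", "b.parquet"})
table.compact("ab-compacted.parquet")

files_read_by_current = len(table.current().data_files)
files_kept_with_full_history = len(table.live_files())

table.expire_all_but_current()
files_kept_after_expiry = len(table.live_files())
```

In this model, compaction helps reads of the current snapshot (it references
one file instead of two), but with the full history retained the two original
files cannot be deleted. Only after the old snapshots are expired does the set
of files that must be kept shrink.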

On Mon, Jun 14, 2021 at 9:27 PM Ryan Blue <b...@apache.org> wrote:

> Hi Suraj,
>
> I just answered on Slack, but I'll copy the replies here for everyone
> who's subscribed to the dev list:
>
> 1) Yes, there are use cases around this. To assist, we're planning on
> adding named snapshots so you don't keep complete history. Instead, you
> should keep a selection of snapshots.
> 2) It is fine to keep snapshots for a long period of time. Part of the
> purpose is to allow you to time travel and we've known about the use case
> of keeping a labelled version around (e.g. what you trained a model with)
> for a long time.
> 3) RewriteDataFiles will rewrite the files from one snapshot and produce
> another. If you're keeping around old snapshots this wouldn't change them.
> Although you probably could go rewrite those snapshots if you wanted to.
>
> I hope that helps!
>
> Ryan
>
> On Sun, Jun 13, 2021 at 9:47 AM Suraj Chandran <chandransu...@gmail.com>
> wrote:
>
>> Hi there,
>>
>> (Had asked on Slack, trying here as well)
>>
>> The documentation proposes "regularly expiring snapshots is recommended
>> to delete data files that are no longer needed, and to keep the size of
>> table metadata small".
>> I had a few questions around that:
>> 1) Are there people/use cases that keep snapshots for a very long time,
>> e.g. decades? This would help people find and manage "back-dated
>> corrections" in data.
>> 2) Are snapshots even meant for keeping history over such long periods of
>> time?
>> 3) Would running rewriteDataFiles regularly help in such cases (and by
>> how much)?
>>
>> Thanks,
>> Suraj
>>
>
>
> --
> Ryan Blue
>
