Hi Suraj,

I just answered on Slack, but I'll copy the replies here for everyone
who's subscribed to the dev list:

1) Yes, there are use cases around this. To help with them, we're planning
on adding named snapshots so you don't have to keep the complete history.
Instead, you should keep a selection of snapshots.
2) It is fine to keep snapshots for a long period of time. Part of the
purpose is to allow you to time travel and we've known about the use case
of keeping a labelled version around (e.g. what you trained a model with)
for a long time.
3) RewriteDataFiles will rewrite the files from one snapshot and produce
another. If you're keeping old snapshots around, this won't change them,
although you could probably go and rewrite those snapshots if you wanted to.
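
To make point 3 concrete, here is a rough toy model in Python. It is only a
sketch and not the Iceberg API: ToyTable, label, rewrite_data_files and
expire_unlabelled are hypothetical names made up for illustration, and the
"label" idea just stands in for the planned named snapshots. It shows that a
rewrite commits a new snapshot and leaves the older one, and the files it
references, untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    files: frozenset  # data files referenced by this snapshot


class ToyTable:
    """Toy model of table snapshots; NOT the Iceberg API."""

    def __init__(self, files):
        self.snapshots = [Snapshot(1, frozenset(files))]
        self.labels = {}  # hypothetical named snapshots: label -> snapshot id

    def current(self):
        return self.snapshots[-1]

    def label(self, name):
        # Pin the current snapshot under a name so it survives expiration.
        self.labels[name] = self.current().snapshot_id

    def rewrite_data_files(self, compacted_file):
        # Compaction commits a NEW snapshot; earlier snapshots are not modified.
        new = Snapshot(self.current().snapshot_id + 1, frozenset([compacted_file]))
        self.snapshots.append(new)
        return new

    def expire_unlabelled(self):
        # Keep only labelled snapshots plus the current one.
        keep = set(self.labels.values()) | {self.current().snapshot_id}
        self.snapshots = [s for s in self.snapshots if s.snapshot_id in keep]

    def live_files(self):
        # A file stays live while any retained snapshot still references it.
        return frozenset().union(*(s.files for s in self.snapshots))


table = ToyTable(["a.parquet", "b.parquet"])
table.label("model-v1-training-data")
table.rewrite_data_files("compacted-1.parquet")
table.rewrite_data_files("compacted-2.parquet")
table.expire_unlabelled()
```

In this toy model the labelled snapshot still points at a.parquet and
b.parquet after both rewrites, so those files stay live alongside the
compacted one. Only the unlabelled intermediate snapshot is dropped.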

I hope that helps!

Ryan

On Sun, Jun 13, 2021 at 9:47 AM Suraj Chandran <chandransu...@gmail.com>
wrote:

> Hi there,
>
> (Had asked on Slack, trying here as well)
>
> The documentation proposes "regularly expiring snapshots is recommended to
> delete data files that are no longer needed, and to keep the size of table
> metadata small".
> I had a few questions around that:
> 1) Are there people/use cases that keep snapshots for a long period of
> time, like decades? This would help people manage/find "back dated
> corrections" in data.
> 2) Are snapshots even meant for keeping history for such long periods of
> time?
> 3) Would regular rewriteDataFiles help in such cases (and by how much)?
>
> Thanks,
> Suraj
>


-- 
Ryan Blue
