Re: Keeping infinite snapshots

2021-06-14 Thread Ryan Blue
Keeping snapshots will add some metadata, but it isn't a ton and you can probably drop some summary metadata to make it smaller (the Spark app ID, for example). Since compaction creates new snapshots, it wouldn't really help. What would help is keeping track of "versions" as branches. Then you can

Re: Keeping infinite snapshots

2021-06-14 Thread Suraj Chandran
Thanks. Our use case is to keep all the snapshots, all the way back to the beginning of time. How is that going to impact performance, since the metadata files will grow quite large? Also, would it reduce opportunities for data compaction? One idea I had around this was to create a solution in Iceberg to be able t

Re: Keeping infinite snapshots

2021-06-14 Thread Ryan Blue
Hi Suraj, I just answered on Slack, but I'll copy the replies here for everyone who's subscribed to the dev list: 1) Yes, there are use cases around this. To assist, we're planning on adding named snapshots so you don't keep the complete history. Instead, you should keep a selection of snapshots. 2)
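
The "keep a selection of snapshots" idea in the reply above can be sketched as a simple retention rule. This is only an illustration under stated assumptions: it does not use any Iceberg API, and the `Snapshot` class, the `name` field, and `select_snapshots_to_expire` are hypothetical names invented for this sketch. The assumed rule is that a snapshot is kept if it has a name or if it is newer than a retention window; everything else is a candidate for expiration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a table snapshot (not an Iceberg class)."""
    snapshot_id: int
    committed_at: datetime
    name: Optional[str] = None  # hypothetical label such as "2021-01"


def select_snapshots_to_expire(
    snapshots: List[Snapshot],
    now: datetime,
    keep_recent: timedelta = timedelta(days=7),
) -> List[int]:
    """Return IDs of snapshots that are neither named nor within the window."""
    cutoff = now - keep_recent
    return [
        s.snapshot_id
        for s in snapshots
        if s.name is None and s.committed_at < cutoff
    ]


# Example: one named snapshot, one old unnamed snapshot, one recent snapshot.
snapshots = [
    Snapshot(1, datetime(2021, 1, 1), name="2021-01"),
    Snapshot(2, datetime(2021, 3, 15)),
    Snapshot(3, datetime(2021, 6, 10)),
]
expired = select_snapshots_to_expire(snapshots, now=datetime(2021, 6, 14))
print(expired)
```

With this rule, only the old unnamed snapshot is selected for expiration, while the named snapshot survives regardless of its age. The seven-day window is an arbitrary choice for the example.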

Keeping infinite snapshots

2021-06-13 Thread Suraj Chandran
Hi there, (I had asked on Slack, trying here as well.) The documentation says "regularly expiring snapshots is recommended to delete data files that are no longer needed, and to keep the size of table metadata small". I had a few questions around that: 1) Are there people/use cases who are keepin