Keeping snapshots will add some metadata, but it isn't a ton, and you can
probably drop some summary metadata (the Spark app ID, for example) to make
it smaller.
Since compaction creates new snapshots, it wouldn't really help. What would
help is keeping track of "versions" as branches. Then you can
Thanks.
So our use case is to keep all snapshots from the beginning of time.
How would that affect performance, given that there will be quite a lot of
metadata files?
Also, would it reduce opportunities for data compaction?
One idea I had around this was to create a solution in Iceberg to be able
t
Hi Suraj,
I just answered on slack, but I'll copy the replies here for everyone
that's subscribed to the dev list:
1) Yes, there are use cases around this. To help, we're planning on
adding named snapshots so that you don't need to keep the complete history.
Instead, you can keep a selection of snapshots.
2)
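To make the "selection of snapshots" idea concrete, here is a minimal Python sketch of a retention rule that keeps the newest few snapshots plus any snapshot that has been given a name. It is a simplified model, not Iceberg's API: the `Snapshot` class, the `named` mapping from name to snapshot ID, and `select_retained` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int


def select_retained(snapshots, named, retain_last):
    """Return the IDs of snapshots to keep (hypothetical retention rule).

    Keeps the `retain_last` newest snapshots plus every snapshot that a
    name in `named` (name -> snapshot ID) points at. Everything else is
    a candidate for expiration.
    """
    existing = {s.snapshot_id for s in snapshots}
    newest_first = sorted(snapshots, key=lambda s: s.timestamp_ms, reverse=True)
    keep = {s.snapshot_id for s in newest_first[:retain_last]}
    # Named snapshots survive regardless of age, as long as they still exist.
    keep |= {sid for sid in named.values() if sid in existing}
    return keep


history = [Snapshot(i, i * 100) for i in range(1, 6)]
print(sorted(select_retained(history, {"release-1": 2}, retain_last=2)))
# prints [2, 4, 5]
```

With five snapshots, only the two newest and the one named `release-1` are kept, so history grows with the number of names rather than with every commit.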
Hi there,
(Had asked on Slack, trying here as well)
The documentation says: "Regularly expiring snapshots is recommended to
delete data files that are no longer needed, and to keep the size of table
metadata small."
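The reason expiration frees storage can be shown with a simplified model: a data file can only be deleted once no retained snapshot still references it. The sketch below is hypothetical (it is not Iceberg's expiration code); `snapshot_files` maps each snapshot ID to the set of data file paths it references, and the file names are made up.

```python
def reclaimable_files(snapshot_files, retained_ids):
    """Return data files referenced only by expired snapshots.

    `snapshot_files` maps snapshot ID -> set of data file paths that the
    snapshot references; `retained_ids` are the snapshots being kept.
    """
    retained = set(retained_ids)
    still_referenced = set()
    for sid in retained:
        still_referenced |= snapshot_files[sid]
    candidates = set()
    for sid, files in snapshot_files.items():
        if sid not in retained:
            candidates |= files
    # A file shared with any retained snapshot must not be deleted.
    return candidates - still_referenced


# Snapshot 3 is a compaction that rewrote a.parquet and b.parquet into d.parquet.
files = {
    1: {"a.parquet", "b.parquet"},
    2: {"a.parquet", "b.parquet", "c.parquet"},
    3: {"c.parquet", "d.parquet"},
}
print(sorted(reclaimable_files(files, retained_ids={3})))
# prints ['a.parquet', 'b.parquet']
print(sorted(reclaimable_files(files, retained_ids={1, 2, 3})))
# prints []
```

If every snapshot is retained, the result is always empty: files replaced by compaction stay referenced by the older snapshots, so compaction adds new files without freeing the old ones.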
I had a few questions around that:
1) Are there people/use cases who are keepin