Ryan, thanks for the reply.
I have created an issue (https://github.com/apache/incubator-iceberg/issues/181)
and will try to come up with a PR.
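
To make the idea concrete, here is a rough sketch of how old metadata files
could be selected using the naming pattern suggested below,
v(num)-(uuid).metadata.json[.gz]. It is only an illustration and not Iceberg
code: the regex, the function name expired_metadata_files, and the keep_last
parameter are assumptions of mine, and the real change would live in the Java
code base.

```python
import re

# Assumed file-name shape, based on the pattern suggested in this thread:
# v(num)-(uuid).metadata.json[.gz]. The exact regex is illustrative only.
METADATA_FILE = re.compile(
    r"^v(?P<version>\d+)-(?P<uuid>[0-9a-fA-F-]{36})\.metadata\.json(?:\.gz)?$"
)


def expired_metadata_files(file_names, keep_last=3):
    """Return metadata file names older than the newest `keep_last` versions.

    Hypothetical helper: names that do not match the pattern are ignored
    and are never returned, so unrelated files are left alone.
    """
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")

    versioned = []
    for name in file_names:
        match = METADATA_FILE.match(name)
        if match:
            # Compare versions numerically so that v10 sorts after v9.
            versioned.append((int(match.group("version")), name))

    versioned.sort()
    cutoff = max(len(versioned) - keep_last, 0)
    return [name for _, name in versioned[:cutoff]]
```

Keeping the last few versions (keep_last) follows the point below that rolling
back by changing the pointer is easier when a few old versions are still
around.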

Kind regards,
Arina

> On May 6, 2019, at 9:14 PM, Ryan Blue <rb...@netflix.com.INVALID> wrote:
> 
> Arina,
> 
> So far, we’ve kept these around to help troubleshoot format problems. It has 
> been a fairly cheap way to be able to see exactly what happened to the table. 
> But we’re also getting to the point where we no longer need to refer back to 
> them and should think about adding a way to remove them. Technically, you 
> don’t need to keep them around once you’ve committed the new version, but an 
> easy way to roll back is to change the database pointer so it is nice to keep 
> a few of them.
> 
> I think we can probably build a way to expire old metadata versions by 
> looking for a naming pattern, like v(num)-(uuid).metadata.json[.gz]. Would 
> you like to add an issue and maybe a PR for this?
> 
> rb
> 
> 
> On Sat, May 4, 2019 at 7:43 AM Arina Yelchiyeva <arina.yelchiy...@gmail.com> 
> wrote:
> Hi all,
> 
> An Iceberg table has a notion of expiring snapshots, which helps to delete 
> snapshots that are no longer needed, along with their data files, manifests, 
> and manifest lists:
> 
>         // clean up the expired snapshots:
>         // 1. Get a list of the snapshots that were removed
>         // 2. Delete any data files that were deleted by those snapshots and
>         //    are not in the table
>         // 3. Delete any manifests that are no longer used by current
>         //    snapshots
>         // 4. Delete the manifest lists
> 
> But we also have table metadata, which is stored as JSON. A new metadata 
> version is created for each metadata change.
> I assumed that the snapshot expiration operation would also delete unneeded 
> metadata files, but it does not.
> 
> My concern is that keeping a JSON file for each metadata change may consume 
> a lot of space over time (setting `iceberg.compress.metadata` to true can 
> help, but not for long).
> Is there an option to expire table metadata versions as well?
> 
> Kind regards,
> Arina
> 
> 
> -- 
> Ryan Blue
> Software Engineer
> Netflix
