Hi,

I can also confirm that a number of users have unintentionally deleted
files and can no longer use their Iceberg tables. The number of these
incidents is surprisingly high. There was also a question about this
problem on the Iceberg Slack the other day, so I think it's reasonable to
provide some recovery mechanism to users in the Iceberg library.

I went through the PR for my own education and left some comments, mostly
around the introduced table API for this. Please let me know if any of this
makes sense.

Cheers,
Gabor

On Mon, Jan 27, 2025 at 6:10 PM Wing Yew Poon <wyp...@cloudera.com.invalid>
wrote:

> Hi,
> A surprising number of our customers have inadvertently deleted files that
> are part of their Iceberg tables (from storage), both data and metadata.
> This has caused their Iceberg tables to be unreadable (or unloadable in the
> case of missing metadata).
> In the case of missing data files, we have provided code to the customer
> to "repair" the table to make it readable again without the missing files
> (where they are not able to recover the files at all). I have put up a PR,
> https://github.com/apache/iceberg/pull/12106, for a Spark action to
> remove missing data and delete files from table metadata. Perhaps this
> would be useful to others.
> I have kept the action simple. Removing a data file may result in dangling
> deletes, but the action does not do anything about that. However, running
> rewrite_position_delete_files or rewrite_data_files subsequently would
> clean them up.
> Repairing a table with missing metadata is more difficult and depends on
> what metadata files are missing.
> - Wing Yew
>
>
