Hi,
A surprising number of our customers have inadvertently deleted files, both
data and metadata, that are part of their Iceberg tables from storage. This
has left their Iceberg tables unreadable (or unloadable in the case of
missing metadata).
In the case of missing data files, where customers are not able to recover
the files at all, we have provided them with code to "repair" the table so
that it is readable again without the missing files. I have put up a PR,
https://github.com/apache/iceberg/pull/12106, for a Spark action to remove
missing data and delete files from table metadata. Perhaps this would be
useful to others.
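
As a rough illustration of the idea only (this is not the implementation in
the PR), the first step of such a repair is to split the file paths that the
table metadata references into those still present in storage and those that
are gone. The function name `split_missing` and the pluggable `exists` check
are hypothetical, introduced just for this sketch; a real action would read
the paths from the table's manifests and use the table's FileIO.

```python
import os
from typing import Callable, Iterable, List, Tuple


def split_missing(
    paths: Iterable[str],
    exists: Callable[[str], bool] = os.path.exists,
) -> Tuple[List[str], List[str]]:
    """Partition referenced file paths into (present, missing).

    `exists` is a stand-in for a storage lookup; it defaults to the local
    filesystem so the sketch runs without any object store.
    """
    present: List[str] = []
    missing: List[str] = []
    for path in paths:
        if exists(path):
            present.append(path)
        else:
            missing.append(path)
    return present, missing


if __name__ == "__main__":
    # Hypothetical paths; pretend the second file was deleted from storage.
    referenced = ["data/a.parquet", "data/b.parquet", "data/c.parquet"]
    present, missing = split_missing(
        referenced, exists=lambda p: p != "data/b.parquet"
    )
    print("still present:", present)
    print("to drop from metadata:", missing)
```

Only the paths in the second list would then be removed from the table
metadata; files that still exist are left untouched.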
I have kept the action simple. Removing a data file may result in dangling
deletes, but the action does not do anything about that. However, running
rewrite_position_delete_files or rewrite_data_files subsequently would
clean them up.
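
A minimal sketch of that follow-up step, assuming the usual Spark SQL `CALL`
syntax for Iceberg procedures (the position-delete procedure is registered as
`rewrite_position_delete_files`). The helper `cleanup_call` and the catalog
and table names are hypothetical; the sketch only builds the statement, so it
runs without a Spark session.

```python
# Procedures mentioned above that can clean up dangling deletes.
CLEANUP_PROCEDURES = ("rewrite_position_delete_files", "rewrite_data_files")


def cleanup_call(
    catalog: str,
    table: str,
    procedure: str = "rewrite_position_delete_files",
) -> str:
    """Build the Spark SQL CALL statement for one follow-up procedure."""
    if procedure not in CLEANUP_PROCEDURES:
        raise ValueError(f"unexpected procedure: {procedure}")
    return f"CALL {catalog}.system.{procedure}(table => '{table}')"


if __name__ == "__main__":
    # "my_catalog" and "db.events" are placeholder names.
    stmt = cleanup_call("my_catalog", "db.events")
    print(stmt)
    # With a live SparkSession configured for Iceberg, one would run:
    #   spark.sql(stmt).show()
```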
Repairing a table with missing metadata is more difficult and depends on
what metadata files are missing.
- Wing Yew
