Hi Wing,

Thank you for bringing this up. We run into this all the time, particularly
when the underlying storage has data management settings outside of
Iceberg's ownership (e.g., S3 retention policies). It is probably a weekly
occurrence, and one of the biggest pain points for new builders. Thanks for
kicking this off!

Zach

On Tue, Jan 28, 2025 at 5:36 AM Gabor Kaszab <gaborkas...@apache.org> wrote:

> Hi,
>
> I can also confirm that there are a number of users who find themselves
> unintentionally deleting some files and not being able to use their Iceberg
> tables anymore. The number of these incidents is surprisingly high for some
> reason. There was also a question on Iceberg Slack around this problem the
> other day. So I think it's reasonable to provide some recovery mechanisms
> in the Iceberg lib in some form to the users.
>
> I went through the PR for my own education and left some comments, mostly
> around the introduced table API for this. Please let me know if any of this
> makes sense.
>
> Cheers,
> Gabor
>
> On Mon, Jan 27, 2025 at 6:10 PM Wing Yew Poon <wyp...@cloudera.com.invalid>
> wrote:
>
>> Hi,
>> A surprising number of our customers have inadvertently deleted files
>> that are part of their Iceberg tables (from storage), both data and
>> metadata. This has caused their Iceberg tables to be unreadable (or
>> unloadable in the case of missing metadata).
>> In the case of missing data files, we have provided code to the customer
>> to "repair" the table to make it readable again without the missing files
>> (where they are not able to recover the files at all). I have put up a PR,
>> https://github.com/apache/iceberg/pull/12106, for a Spark action to
>> remove missing data and delete files from table metadata. Perhaps this
>> would be useful to others.
>> I have kept the action simple. Removing a data file may result in
>> dangling deletes, but the action does not do anything about that. However,
>> running rewrite_position_delete_files or rewrite_data_files subsequently
>> would clean them up.
>> Repairing a table with missing metadata is more difficult and depends on
>> what metadata files are missing.
>> - Wing Yew
>>
>>

-- 
Zach Dischner
303-919-1364 | zach.disch...@gmail.com
Senior Software Development Engineer | Amazon Advertising
zachdischner.com <http://www.zachdischner.com/> | Flickr
<http://www.flickr.com/photos/zachd1_618/> | Smugmug
<http://zachdischner.smugmug.com/> | 2manventure
<http://2manventure.wordpress.com/>
