Is "truncate" not an option? That would do a table-wide delete, which
creates a new snapshot that you can keep. No data files would be valid
after that, right?
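
A rough sketch of how that sequence might be scripted, assuming the table is reachable from Spark SQL with the Iceberg stored procedures (expire_snapshots, remove_orphan_files) enabled. The helper name retention_statements and the names my_catalog and db.events are hypothetical, and a table-wide DELETE stands in for truncate here, since TRUNCATE TABLE support depends on the engine version:

```python
from datetime import datetime


def retention_statements(catalog, table, cutoff):
    """Build the Spark SQL statements for the delete/expire/cleanup sequence.

    `catalog` and `table` (e.g. "db.events") are placeholder names.
    `cutoff` is a datetime; snapshots older than it become eligible for
    expiration. Spark reads the literal in the session time zone.
    """
    ts = cutoff.strftime("%Y-%m-%d %H:%M:%S")
    return [
        # 1. Logical deletion: commits a new snapshot with no live data files.
        f"DELETE FROM {catalog}.{table}",
        # 2. Physical deletion: expire snapshots older than the cutoff while
        #    keeping the most recent one (the delete snapshot itself).
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', older_than => TIMESTAMP '{ts}', retain_last => 1)",
        # 3. Safety net for files left behind if step 2 failed part-way.
        f"CALL {catalog}.system.remove_orphan_files(table => '{table}')",
    ]
```

Each statement could then be passed to spark.sql(...) in order, for example with retention_statements("my_catalog", "db.events", datetime(2022, 6, 29, 18, 29)). The retained snapshot is the delete itself, so it should no longer reference any of the removed data files.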

On Wed, Jun 29, 2022 at 6:29 PM Steve Zhang <hongyue_zh...@apple.com.invalid>
wrote:

> Hey Iceberg Community:
>
> I am wondering if there is any best practice for handling residual data
> files that were deleted in the last snapshot of an Iceberg table.
>
> Let me explain the use case. Consider a data retention policy under which
> some sensitive data can be stored on disk for only a month. To get the
> data off the disk in Iceberg, we generally need to complete three
> steps:
> 1. delete data from the table, or drop a partition (logical deletion)
> 2. expire old snapshots (physical deletion to get data off the disk)
> 3. remove orphaned files (not strictly required, but at scale this might be
> needed to account for any failure in step 2)
>
> However, from what I can tell, the Iceberg expire_snapshots stored
> procedure will not delete the last snapshot of the given table, as stated in
> https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L141-L146
> .
>
> So if the last snapshot happens to be the delete from step 1, and no
> further transactions happen on the table, then that snapshot will not be
> expired and the data files are left behind. I am not sure what the right
> way is to clean up the data files from disk to comply with our
> retention policy. Can anyone share some ideas?
>
> I guess dropping the table is one workaround, but I am looking for a less
> intrusive way that leaves the table as is, in its original state right
> after table creation, before any data was written.
>
> Thanks,
> Steve Zhang