Is "truncate" not an option? It would do a table-wide delete, which creates a new snapshot that you can keep. No data files would still be valid after that, would they?

On Wed, Jun 29, 2022 at 6:29 PM Steve Zhang <hongyue_zh...@apple.com.invalid> wrote:

> Hey Iceberg Community:
>
> I am wondering if there’s any best practice for handling leftover data
> files that were deleted by the last snapshot of an Iceberg table.
>
> Let me explain the use case. Consider a data retention policy under which
> some sensitive data can only be stored on disk for a month. In Iceberg,
> getting the data off the disk generally takes 3 steps:
> 1. delete data from the table, or drop a partition (logical deletion)
> 2. expire old snapshots (physical deletion to get data off the disk)
> 3. remove orphaned files (not strictly needed, but at scale this might be
>    needed to account for any failure in step 2)
>
> However, from what I can tell, the Iceberg expire-snapshot stored
> procedure will not delete the last snapshot of the given table, as stated in
> https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L141-L146
> .
>
> So if the last snapshot happens to be the delete in step 1, and no further
> transactions happen on the table, then the snapshot will not be expired
> and the data files will be left behind. I am not sure what the right way is
> to clean up the data files from the disk to comply with our retention
> policy. Can anyone share some ideas?
>
> I guess dropping the table is one workaround, but I am looking for a less
> intrusive way that leaves the table as is, in its original state right
> after table creation, before any data is written.
>
> Thanks,
> Steve Zhang
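
To make the delete-then-expire flow discussed in this thread easier to reason about, here is a toy model in plain Python. It is not Iceberg's API: the `Snapshot` class, the `expire_all_but_current` helper and the file names are all hypothetical. It assumes, as I understand the documented behaviour, that expiring snapshots only removes data files that are no longer reachable from any retained snapshot, and that the current snapshot is always retained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a table snapshot."""
    snapshot_id: int
    live_files: frozenset  # data files reachable from this snapshot


def expire_all_but_current(snapshots):
    """Retain only the newest snapshot.

    Returns (retained_snapshots, removable_files), where removable_files are
    the files referenced by expired snapshots but not by the retained one.
    """
    current = snapshots[-1]
    expired = snapshots[:-1]
    referenced_by_expired = set().union(*(s.live_files for s in expired))
    removable = referenced_by_expired - current.live_files
    return [current], removable


# Hypothetical history: one append, then a table-wide delete as the last commit.
append = Snapshot(1, frozenset({"data/a.parquet", "data/b.parquet"}))
delete = Snapshot(2, frozenset())  # the delete leaves no live data files

retained, removable = expire_all_but_current([append, delete])
print([s.snapshot_id for s in retained])
print(sorted(removable))
```

Under those assumptions the delete snapshot itself stays in the table metadata, yet every file it removed becomes unreachable once the older snapshots are expired, which is the situation the truncate suggestion relies on.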