Hey Iceberg Community: I am wondering if there's a best practice for handling residual data files that were deleted in the last snapshot of an Iceberg table.
Let me explain the use case. We have a data retention policy under which some sensitive data can only be stored on disk for a month. The Iceberg way to get data off disk generally takes three steps:

1. Delete the data from the table, or drop the partition (logical deletion).
2. Expire old snapshots (physical deletion, which gets the data off disk).
3. Remove orphaned files (not strictly needed, but at scale this may be necessary to account for any failures in step 2).

However, from what I can tell, Iceberg's expire_snapshots stored procedure will not expire the last snapshot of a table, as shown in https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L141-L146. So if the last snapshot happens to be the delete from step 1, and no further transactions are committed to the table, that snapshot will never be expired and the data files will be left behind.

I am not sure what the right way is to clean up these data files from disk so that we comply with our retention policy. Can anyone share some ideas? Dropping the table is one workaround, but I am looking for a less intrusive way that leaves the table in place, in the same state as right after table creation, before any data was written.

Thanks,
Steve Zhang
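The expiration rule described in the post can be illustrated with a small, hypothetical model. This is not Iceberg's API or the RemoveSnapshots implementation: the `Snapshot` class and both functions are made up for illustration, and the model ignores `retain_last` / minimum-snapshots-to-keep settings, branches and tags, and which data files actually get deleted. It only shows that, with a one-month cutoff (expressed in epoch milliseconds, the unit taken by the Java `ExpireSnapshots.expireOlderThan(long)` call), the current snapshot is kept even when it is the delete from step 1.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical stand-in for a snapshot entry in table metadata.
    snapshot_id: int
    timestamp_ms: int  # commit time, epoch milliseconds
    operation: str     # e.g. "append" or "delete"


def retention_cutoff_ms(now: datetime, retention_days: int = 30) -> int:
    """Epoch-millisecond cutoff; snapshots committed before it are out of retention.

    `now` must be timezone-aware.
    """
    cutoff = now - timedelta(days=retention_days)
    # timedelta // timedelta gives an exact integer count of milliseconds.
    return (cutoff - EPOCH) // timedelta(milliseconds=1)


def snapshots_to_expire(snapshots: list[Snapshot], current_id: int, cutoff_ms: int) -> list[int]:
    """Simplified selection rule: expire snapshots older than the cutoff,
    except the current snapshot, which is always retained."""
    return [
        s.snapshot_id
        for s in snapshots
        if s.timestamp_ms < cutoff_ms and s.snapshot_id != current_id
    ]


if __name__ == "__main__":
    now = datetime(2024, 2, 1, tzinfo=timezone.utc)
    cutoff = retention_cutoff_ms(now)  # 2024-01-02T00:00:00Z in epoch ms
    history = [
        Snapshot(1, cutoff - 10_000, "append"),  # sensitive data written
        Snapshot(2, cutoff - 5_000, "delete"),   # step 1: logical delete, now current
    ]
    # Snapshot 2 is also older than the cutoff, but it is current, so it is kept.
    print(snapshots_to_expire(history, current_id=2, cutoff_ms=cutoff))  # prints [1]
```

In this model, only a later commit that makes some other snapshot current would let the delete snapshot itself fall out of the retained set.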