Hi Ashish,

It's indeed a bug, to my understanding. I read your idea about the transaction hook. Removing the data and manifest files of expired snapshots should happen after writing the version-hint file (otherwise some readers may still be accessing snapshots whose data files we are deleting). So I guess the hook should run after writeVersionHint, but the javadoc of TableOperations#commit says: "Once the atomic commit operation succeeds, implementations must not perform any operations that may fail because failure in this method cannot be distinguished from commit failure."
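To make that javadoc constraint concrete, here is a minimal, hypothetical sketch (not Iceberg's actual code) of a post-commit cleanup step that stays within the contract: it never throws, and instead returns the paths it failed to delete so that a separate cleanup tool could retry them later. The class and method names (`BestEffortCleanup`, `deleteQuietly`) are invented for illustration, and plain local `java.nio.file` paths stand in for whatever FileIO the table really uses.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class BestEffortCleanup {
    // Hypothetical helper: deletes files that expired snapshots no longer need.
    // It assumes the commit (and the version-hint write) already succeeded, so it must
    // never throw: a failure here could not be told apart from a commit failure.
    // Paths that could not be deleted are returned so a later orphan-file cleanup can retry them.
    public static List<Path> deleteQuietly(List<Path> expiredFiles) {
        List<Path> failed = new ArrayList<>();
        for (Path file : expiredFiles) {
            try {
                // Returns false (no exception) when the file is already gone.
                Files.deleteIfExists(file);
            } catch (IOException | SecurityException e) {
                // Swallow the error: the commit is already durable, so only record the leftover.
                failed.add(file);
            }
        }
        return failed;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("expired");
        Path existing = Files.createFile(dir.resolve("a.parquet"));
        Path missing = dir.resolve("missing.parquet"); // never created
        List<Path> failed = deleteQuietly(List.of(existing, missing));
        System.out.println("leftover files: " + failed.size());
        Files.delete(dir); // the directory is empty again at this point
    }
}
```

The trade-off is that the leftovers are only reported, not fixed, which is why some separate validation or cleanup step is still needed.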
So it's better not to run that hook after commit. In my opinion, we may need a tool to validate a snapshot, i.e. to check whether all of the snapshot's files are present; if they are not, the same tool could also be used to clean up the orphan files.

Thanks.

On Wed, Feb 26, 2020 at 4:30 AM Ashish Mehta <mehta.ashis...@gmail.com> wrote:
> Hi,
>
> While using the Expire/RemoveSnapshots feature, I saw that the clean-up
> of files happens after the successful commit [1] of the snapshot list
> and the update to `version-hint.text`. This means that, in case of an
> intermittent failure/IOException from the underlying store, we might end up
> leaving files on disk without any reference in the table's latest
> version/snapshot list. Is there an API to clean those up after the
> snapshots are gone from history?
>
> I raised an issue for this:
> https://github.com/apache/incubator-iceberg/issues/822
> Let me know if I am missing something.
>
> [1]:
> https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L144
>
> Thanks,
> Ashish
>
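As a rough illustration of the validation/cleanup tool suggested in the reply, one possible shape is a set difference: list every file under the table location and report those that no live snapshot references. This is a hypothetical sketch, not an existing Iceberg API. It assumes the caller has already collected the normalized absolute paths of all data, manifest, and metadata files reachable from the current table metadata (that collection step is not shown), and it uses local `java.nio.file` paths for simplicity. The `olderThanMillis` cutoff is an added assumption: files written by a concurrent, not-yet-committed writer would otherwise look orphaned.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class OrphanFileFinder {
    // Hypothetical tool: returns regular files under tableDir that are not in `referenced`
    // and were last modified strictly before `olderThanMillis` (epoch milliseconds).
    // `referenced` must hold absolute, normalized paths for every file reachable from the
    // table's live snapshots; building that set is outside the scope of this sketch.
    public static List<Path> findOrphans(Path tableDir, Set<Path> referenced, long olderThanMillis)
            throws IOException {
        try (Stream<Path> files = Files.walk(tableDir)) {
            return files
                    .filter(Files::isRegularFile)
                    .map(p -> p.toAbsolutePath().normalize())
                    .filter(p -> !referenced.contains(p))
                    .filter(p -> lastModified(p) < olderThanMillis)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    private static long lastModified(Path p) {
        try {
            return Files.getLastModifiedTime(p).toMillis();
        } catch (IOException e) {
            return Long.MAX_VALUE; // unreadable: treat as too new so it is never reported
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("table");
        Path live = Files.createFile(dir.resolve("live.parquet")).toAbsolutePath().normalize();
        Path orphan = Files.createFile(dir.resolve("orphan.parquet")).toAbsolutePath().normalize();
        long cutoff = System.currentTimeMillis() + 60_000; // treat everything written so far as old
        List<Path> orphans = findOrphans(dir, Set.of(live), cutoff);
        System.out.println(orphans.equals(List.of(orphan))); // prints "true"
    }
}
```

The tool only reports candidates; actually deleting them (for example with a best-effort helper) would be a separate, deliberate step.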