Thanks for bringing this up, Ashish. The section of code that you point out is *mostly* correct, but there is a case that we should fix. The reason I say “mostly correct” is that it has correct behavior and has error handling to ensure that as much of the work is done as possible. Even if not all of the data is cleaned up, the behavior must be to throw an exception if and only if the metadata commit was successful. If the commit is successful, then throwing an error puts a client in an unrecoverable state because it thinks that the table hasn’t changed when it actually did. If a client retries an operation that was already committed successfully, it could make incorrect changes to the table.
To address reliability, cleanup uses Tasks so that a single failure doesn’t prevent attempting to delete other files <https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L281> and files that can’t be deleted are logged. There is similar failure handling while finding files to delete, so that read failures don’t prevent deleting <https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L291> as much data as possible. If you want even more guarantees, then you can add a custom delete method and persist the files to delete for another service to handle. Now that I’m looking at this, one area that can be improved is the handling of manifest list files. There is no failure handling <https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L333-L344> while reading manifest lists, so we should add that. I should also point out that this is not the only way for data to leak. For example, an executor failure in Spark can lead to a data file getting written by not committed to a table. I recommend having a way to periodically list table locations and check against the metadata tables, like all_data_files and all_manifests, to ensure that no files are left behind. On Wed, Feb 26, 2020 at 5:53 AM OpenInx open...@gmail.com <http://mailto:open...@gmail.com> wrote: Hi Ashish > > It's indeed a bug for my understanding, I read your idea about the > transaction hook. removing the data & manifest file > of expired snapshots should happen after writing version-hint file > (otherwise there will be some readers accessing > snapshots which we are deleting data files). so the hook should happen > after writeVersionHint I guess, while the javadoc > of TableOperations#commit says: Once the atomic commit operation succeeds, > implementations must not perform > any operations that may fail because failure in this method cannot be > distinguished from commit failure. > > So it's better not to do that hook after commit. In my opinion, we may > need a tool to validate the snapshot, says check > whether the files of snapshot is complete, if not complete then we can > also use the tool to clean the orphan files. > > Thanks. > > On Wed, Feb 26, 2020 at 4:30 AM Ashish Mehta <mehta.ashis...@gmail.com> > wrote: > >> Hi, >> >> While using feature of Expire/RemoveSnapshots, I saw that the clean up >> operation of files happens, after successful commit [1] of snapshot list >> and update to `version-hint.text`, which means that in case of >> intermittent/IOException from underlying store, we might end up leaving the >> files on disk, without any reference in table's latest version/snapshot >> list. Is there an API to clean that up, after the snapshots are gone from >> history? >> >> I raised a issue for this >> https://github.com/apache/incubator-iceberg/issues/822 >> Let me know, if I am missing something. >> >> [1]: >> https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L144 >> >> Thanks, >> Ashish >> >> -- Ryan Blue Software Engineer Netflix