As Ryan noted, there are multiple cases when some files can be left orphan and we won’t be able to avoid all such cases. In our case, we are periodically listing locations and comparing the actual files to all files present in metadata tables.
At some point, there was an agreement in the community to build a set of actions that are essential for operating and maintaining Iceberg tables. We can contribute an action to optimize metadata and an action to remove orphan files. That way, the community will use one codebase for such common things. - Anton > On 2 Mar 2020, at 15:46, Ryan Blue <rb...@netflix.com.INVALID> wrote: > > Thanks for bringing this up, Ashish. The section of code that you point out > is mostly correct, but there is a case that we should fix. The reason I say > “mostly correct” is that it has correct behavior and has error handling to > ensure that as much of the work is done as possible. > Even if not all of the data is cleaned up, the behavior must be to throw an > exception if and only if the metadata commit was successful. If the commit is > successful, then throwing an error puts a client in an unrecoverable state > because it thinks that the table hasn’t changed when it actually did. If a > client retries an operation that was already committed successfully, it could > make incorrect changes to the table. > > To address reliability, cleanup uses Tasks so that a single failure doesn’t > prevent attempting to delete other files > <https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L281> > and files that can’t be deleted are logged. There is similar failure > handling while finding files to delete, so that read failures don’t prevent > deleting > <https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L291> > as much data as possible. If you want even more guarantees, then you can add > a custom delete method and persist the files to delete for another service to > handle. > > Now that I’m looking at this, one area that can be improved is the handling > of manifest list files. There is no failure handling > <https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L333-L344> > while reading manifest lists, so we should add that. > > I should also point out that this is not the only way for data to leak. For > example, an executor failure in Spark can lead to a data file getting written > by not committed to a table. I recommend having a way to periodically list > table locations and check against the metadata tables, like all_data_files > and all_manifests, to ensure that no files are left behind. > > On Wed, Feb 26, 2020 at 5:53 AM OpenInx open...@gmail.com > <http://mailto:open...@gmail.com/> wrote: > > > > Hi Ashish > > It's indeed a bug for my understanding, I read your idea about the > transaction hook. removing the data & manifest file > of expired snapshots should happen after writing version-hint file (otherwise > there will be some readers accessing > snapshots which we are deleting data files). so the hook should happen after > writeVersionHint I guess, while the javadoc > of TableOperations#commit says: Once the atomic commit operation succeeds, > implementations must not perform > any operations that may fail because failure in this method cannot be > distinguished from commit failure. > > So it's better not to do that hook after commit. In my opinion, we may need a > tool to validate the snapshot, says check > whether the files of snapshot is complete, if not complete then we can also > use the tool to clean the orphan files. > > Thanks. > > On Wed, Feb 26, 2020 at 4:30 AM Ashish Mehta <mehta.ashis...@gmail.com > <mailto:mehta.ashis...@gmail.com>> wrote: > Hi, > > While using feature of Expire/RemoveSnapshots, I saw that the clean up > operation of files happens, after successful commit [1] of snapshot list and > update to `version-hint.text`, which means that in case of > intermittent/IOException from underlying store, we might end up leaving the > files on disk, without any reference in table's latest version/snapshot list. > Is there an API to clean that up, after the snapshots are gone from history? > > I raised a issue for this > https://github.com/apache/incubator-iceberg/issues/822 > <https://github.com/apache/incubator-iceberg/issues/822> > Let me know, if I am missing something. > > [1]: > https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L144 > > <https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L144> > > Thanks, > Ashish > > > > -- > Ryan Blue > Software Engineer > Netflix