@Anton Okolnychyi <aokolnyc...@apple.com> Looking forward to the pull requests :-)
On Wed, Mar 18, 2020 at 3:38 AM Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote: > As Ryan noted, there are multiple cases when some files can be left orphan > and we won’t be able to avoid all such cases. In our case, we are > periodically listing locations and comparing the actual files to all files > present in metadata tables. > > At some point, there was an agreement in the community to build a set of > actions that are essential for operating and maintaining Iceberg tables. We > can contribute an action to optimize metadata and an action to remove > orphan files. That way, the community will use one codebase for such common > things. > > - Anton > > On 2 Mar 2020, at 15:46, Ryan Blue <rb...@netflix.com.INVALID> wrote: > > Thanks for bringing this up, Ashish. The section of code that you point > out is *mostly* correct, but there is a case that we should fix. The > reason I say “mostly correct” is that it has correct behavior and has error > handling to ensure that as much of the work is done as possible. > Even if not all of the data is cleaned up, the behavior must be to throw > an exception if and only if the metadata commit was successful. If the > commit is successful, then throwing an error puts a client in an > unrecoverable state because it thinks that the table hasn’t changed when it > actually did. If a client retries an operation that was already committed > successfully, it could make incorrect changes to the table. > > To address reliability, cleanup uses Tasks so that a single failure > doesn’t prevent attempting to delete other files > <https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L281> > and files that can’t be deleted are logged. There is similar failure > handling while finding files to delete, so that read failures don’t > prevent deleting > <https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L291> > as much data as possible. If you want even more guarantees, then you can > add a custom delete method and persist the files to delete for another > service to handle. > > Now that I’m looking at this, one area that can be improved is the > handling of manifest list files. There is no failure handling > <https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L333-L344> > while reading manifest lists, so we should add that. > > I should also point out that this is not the only way for data to leak. > For example, an executor failure in Spark can lead to a data file getting > written by not committed to a table. I recommend having a way to > periodically list table locations and check against the metadata tables, > like all_data_files and all_manifests, to ensure that no files are left > behind. > > On Wed, Feb 26, 2020 at 5:53 AM OpenInx open...@gmail.com > <http://mailto:open...@gmail.com/> wrote: > > > Hi Ashish >> >> It's indeed a bug for my understanding, I read your idea about the >> transaction hook. removing the data & manifest file >> of expired snapshots should happen after writing version-hint file >> (otherwise there will be some readers accessing >> snapshots which we are deleting data files). so the hook should happen >> after writeVersionHint I guess, while the javadoc >> of TableOperations#commit says: Once the atomic commit operation >> succeeds, implementations must not perform >> any operations that may fail because failure in this method cannot be >> distinguished from commit failure. >> >> So it's better not to do that hook after commit. In my opinion, we may >> need a tool to validate the snapshot, says check >> whether the files of snapshot is complete, if not complete then we can >> also use the tool to clean the orphan files. >> >> Thanks. >> >> On Wed, Feb 26, 2020 at 4:30 AM Ashish Mehta <mehta.ashis...@gmail.com> >> wrote: >> >>> Hi, >>> >>> While using feature of Expire/RemoveSnapshots, I saw that the clean up >>> operation of files happens, after successful commit [1] of snapshot list >>> and update to `version-hint.text`, which means that in case of >>> intermittent/IOException from underlying store, we might end up leaving the >>> files on disk, without any reference in table's latest version/snapshot >>> list. Is there an API to clean that up, after the snapshots are gone from >>> history? >>> >>> I raised a issue for this >>> https://github.com/apache/incubator-iceberg/issues/822 >>> Let me know, if I am missing something. >>> >>> [1]: >>> https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L144 >>> >>> Thanks, >>> Ashish >>> >>> > > -- > Ryan Blue > Software Engineer > Netflix > > >