@Anton Okolnychyi <aokolnyc...@apple.com>   Looking forward to the pull
requests :-)

On Wed, Mar 18, 2020 at 3:38 AM Anton Okolnychyi
<aokolnyc...@apple.com.invalid> wrote:

> As Ryan noted, there are multiple cases when some files can be left orphan
> and we won’t be able to avoid all such cases. In our case, we are
> periodically listing locations and comparing the actual files to all files
> present in metadata tables.
>
> At some point, there was an agreement in the community to build a set of
> actions that are essential for operating and maintaining Iceberg tables. We
> can contribute an action to optimize metadata and an action to remove
> orphan files. That way, the community will use one codebase for such common
> things.
>
> - Anton
>
> On 2 Mar 2020, at 15:46, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>
> Thanks for bringing this up, Ashish. The section of code that you point
> out is *mostly* correct, but there is a case that we should fix. The
> reason I say “mostly correct” is that it has correct behavior and has error
> handling to ensure that as much of the work is done as possible.
> Even if not all of the data is cleaned up, the behavior must be to throw
> an exception if and only if the metadata commit was successful. If the
> commit is successful, then throwing an error puts a client in an
> unrecoverable state because it thinks that the table hasn’t changed when it
> actually did. If a client retries an operation that was already committed
> successfully, it could make incorrect changes to the table.
>
> To address reliability, cleanup uses Tasks so that a single failure
> doesn’t prevent attempting to delete other files
> <https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L281>
> and files that can’t be deleted are logged. There is similar failure
> handling while finding files to delete, so that read failures don’t
> prevent deleting
> <https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L291>
> as much data as possible. If you want even more guarantees, then you can
> add a custom delete method and persist the files to delete for another
> service to handle.
>
> Now that I’m looking at this, one area that can be improved is the
> handling of manifest list files. There is no failure handling
> <https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L333-L344>
> while reading manifest lists, so we should add that.
>
> I should also point out that this is not the only way for data to leak.
> For example, an executor failure in Spark can lead to a data file getting
> written by not committed to a table. I recommend having a way to
> periodically list table locations and check against the metadata tables,
> like all_data_files and all_manifests, to ensure that no files are left
> behind.
>
> On Wed, Feb 26, 2020 at 5:53 AM OpenInx open...@gmail.com
> <http://mailto:open...@gmail.com/> wrote:
>
>
> Hi Ashish
>>
>> It's indeed a bug for my understanding, I read your idea about the
>> transaction hook.  removing the data & manifest file
>> of expired snapshots should happen after writing version-hint file
>> (otherwise there will be some readers accessing
>> snapshots which we are deleting data files). so the hook should happen
>> after writeVersionHint I guess, while the javadoc
>> of TableOperations#commit says: Once the atomic commit operation
>> succeeds, implementations must not perform
>> any operations that may fail because failure in this method cannot be
>> distinguished from commit failure.
>>
>> So it's better not to do that hook after commit. In my opinion, we may
>> need a tool to validate the snapshot, says check
>> whether the files of snapshot is complete, if not complete then we can
>> also use the tool to clean the orphan files.
>>
>> Thanks.
>>
>> On Wed, Feb 26, 2020 at 4:30 AM Ashish Mehta <mehta.ashis...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> While using feature of Expire/RemoveSnapshots, I saw that the clean up
>>> operation of files happens, after successful commit [1] of snapshot list
>>> and update to `version-hint.text`, which means that in case of
>>> intermittent/IOException from underlying store, we might end up leaving the
>>> files on disk, without any reference in table's latest version/snapshot
>>> list. Is there an API to clean that up, after the snapshots are gone from
>>> history?
>>>
>>> I raised a issue for this
>>> https://github.com/apache/incubator-iceberg/issues/822
>>> Let me know, if I am missing something.
>>>
>>> [1]:
>>> https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L144
>>>
>>> Thanks,
>>> Ashish
>>>
>>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
>
>

Reply via email to