Hi Ashish

As I understand it, this is indeed a bug. I read your idea about the
transaction hook. Removing the data and manifest files of expired snapshots
should happen after the version-hint file is written; otherwise some readers
could still be accessing snapshots whose data files we are deleting. So the
hook would have to run after writeVersionHint, I guess. However, the javadoc
of TableOperations#commit says: "Once the atomic commit operation succeeds,
implementations must not perform any operations that may fail because
failure in this method cannot be distinguished from commit failure."
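
To make the constraint concrete, here is a minimal sketch of what any
post-commit cleanup would have to look like: each delete failure is caught
and recorded instead of being thrown, so it can never be mistaken for a
commit failure. The class and method names are hypothetical and are not
part of the Iceberg API; the delete function is passed in by the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration only; not an Iceberg class.
final class BestEffortCleanup {
  private BestEffortCleanup() {}

  // Tries to delete every path and never throws for a failed delete.
  // Returns the paths that could not be deleted so that a later pass
  // can retry them.
  static List<String> deleteAll(List<String> paths, Consumer<String> deleteFunc) {
    List<String> failed = new ArrayList<>();
    for (String path : paths) {
      try {
        deleteFunc.accept(path);
      } catch (RuntimeException e) {
        // Swallow the failure: the commit has already succeeded.
        failed.add(path);
      }
    }
    return failed;
  }
}
```

Any path returned here is a file left on storage with no reference from the
table, which is the situation described in the original report.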

So it's better not to run that hook after the commit. In my opinion, we may
need a tool to validate a snapshot, that is, to check whether the snapshot's
files are complete; if they are not, we can also use the tool to clean up
the orphan files.
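
A rough sketch of the two checks such a tool could perform, assuming we
already have the set of file paths referenced by each snapshot and a listing
of the files actually present in the table location. All names here are
hypothetical; this is plain Java over sets of paths, not Iceberg API.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not an Iceberg class.
final class SnapshotFileCheck {
  private SnapshotFileCheck() {}

  // Files that some snapshot references but that are absent from storage.
  // A non-empty result means at least one snapshot is incomplete.
  static Set<String> missingFiles(Map<Long, Set<String>> filesBySnapshot,
                                  Set<String> filesOnDisk) {
    Set<String> missing = new TreeSet<>();
    for (Set<String> files : filesBySnapshot.values()) {
      for (String file : files) {
        if (!filesOnDisk.contains(file)) {
          missing.add(file);
        }
      }
    }
    return missing;
  }

  // Files present in storage that no remaining snapshot references.
  // These are the candidates for orphan cleanup.
  static Set<String> orphanFiles(Map<Long, Set<String>> filesBySnapshot,
                                 Set<String> filesOnDisk) {
    Set<String> orphans = new TreeSet<>(filesOnDisk);
    for (Set<String> files : filesBySnapshot.values()) {
      orphans.removeAll(files);
    }
    return orphans;
  }

  public static void main(String[] args) {
    // Snapshot 2 is the only one left after expiration.
    Map<Long, Set<String>> refs = Map.of(2L, Set.of("data-2.parquet", "manifest-2.avro"));
    Set<String> disk = Set.of("data-1.parquet", "data-2.parquet", "manifest-2.avro");
    System.out.println("missing: " + missingFiles(refs, disk));
    System.out.println("orphans: " + orphanFiles(refs, disk));
  }
}
```

A real tool would also have to avoid deleting files written by commits that
are still in progress, for example by only considering files older than some
age threshold.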

Thanks.

On Wed, Feb 26, 2020 at 4:30 AM Ashish Mehta <mehta.ashis...@gmail.com>
wrote:

> Hi,
>
> While using the Expire/RemoveSnapshots feature, I saw that the file
> cleanup happens after the successful commit [1] of the snapshot list
> and the update to `version-hint.text`. This means that, in case of an
> intermittent failure or IOException from the underlying store, we might
> end up leaving files on disk without any reference in the table's latest
> version/snapshot list. Is there an API to clean those up after the
> snapshots are gone from history?
>
> I raised an issue for this:
> https://github.com/apache/incubator-iceberg/issues/822
> Let me know if I am missing something.
>
> [1]:
> https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L144
>
> Thanks,
> Ashish
>
>
