Ryan, thanks for the detailed answer.
I'll try out the suggested approach and post the results in issue #330.
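
To check my understanding of the suggestion, here is a minimal sketch of it.
The class and method names (RetryCleanupSketch, cacheFiltered,
cleanUncommitted, deleteOnEveryAttempt) are hypothetical, and this is not the
actual MergingSnapshotProducer code. deleteFunc stands in for the
transaction's lazy delete, which I assume only records a path so that it can
be removed after the commit succeeds.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical stand-in for the cleanup logic; not the actual Iceberg code.
class RetryCleanupSketch {
  // Original manifest path -> filtered manifest written by the current attempt.
  private final Map<String, String> filteredCache = new HashMap<>();
  // Files already cleaned up once; every later attempt must delete them again.
  private final Set<String> deleteOnEveryAttempt = new LinkedHashSet<>();
  // Assumption: in a transaction this only records the path for later removal.
  private final Consumer<String> deleteFunc;

  RetryCleanupSketch(Consumer<String> deleteFunc) {
    this.deleteFunc = deleteFunc;
  }

  void cacheFiltered(String original, String filtered) {
    filteredCache.put(original, filtered);
  }

  // Run at the end of each commit attempt with the manifests that attempt kept.
  void cleanUncommitted(Set<String> committed) {
    // Re-issue the deletes remembered from earlier attempts.
    for (String path : deleteOnEveryAttempt) {
      if (!committed.contains(path)) {
        deleteFunc.accept(path);
      }
    }
    // Clear the cache: delete each unused filtered manifest and remember it.
    Iterator<Map.Entry<String, String>> entries = filteredCache.entrySet().iterator();
    while (entries.hasNext()) {
      String filtered = entries.next().getValue();
      if (!committed.contains(filtered)) {
        deleteFunc.accept(filtered);
        deleteOnEveryAttempt.add(filtered);
        entries.remove();
      }
    }
  }
}
```

With an empty committed set, calling cleanUncommitted twice passes the same
filtered manifest to deleteFunc on both attempts, so a retry should no longer
forget the file once the cache has been cleared.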

Kind regards,
Arina

On Wed, Jul 31, 2019 at 12:02 AM Ryan Blue <rb...@netflix.com.invalid>
wrote:

> Hi Arina, thanks for reporting this issue, and for the thorough write-up
> on that issue!
>
> I suspect that this has something to do with PR #218
> <https://github.com/apache/incubator-iceberg/pull/218> that introduced
> special handling for files that are deleted in transactions. The problem
> that PR fixed was that a manifest was created, merged, and then deleted.
> Then the transaction failed to commit and retried. The manifest that was
> created was reused, but in the retry it didn’t get merged and was still a
> valid metadata file. Since the file had been deleted on the first try, the
> table was missing a manifest.
>
> The fix was to introduce a lazy delete for cleaning up. The transaction
> keeps track of files to delete and deletes them after the commit succeeds.
> What might be happening here is the first time the transaction tries to
> commit, it is out of date and retries, then the original manifest is not
> deleted on the second attempt. Looking at the cleanup code, I think this
> looks like the problem because the filtered manifest cache is cleared as
> files are deleted:
> https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java#L336
>
> I think the fix is to add a list of files that should be deleted on every
> attempt. When the filtered cache is cleared, each file should be deleted
> and moved to the delete list. That way future attempts also delete the
> files.
>
> rb
>
> On Tue, Jul 30, 2019 at 12:48 PM Arina Yelchiyeva <
> arina.yelchiy...@gmail.com> wrote:
>
>> Hi all,
>>
>> I have noticed that when a delete operation is performed in a transaction
>> and the Iceberg table has at least two snapshots prior to the delete, the
>> delete operation produces two manifest files, one of which is orphaned.
>> Note that if the delete operation is not performed in a transaction,
>> everything works fine.
>>
>> Orphaned manifest files are subsequently not deleted during snapshot
>> expiration and keep piling up.
>> I have described the issue in more detail in
>> https://github.com/apache/incubator-iceberg/issues/330.
>>
>> Maybe someone has an idea why the orphan file is created?
>>
>> Kind regards,
>> Arina
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
