Hi Arina, thanks for reporting this, and for the thorough write-up on the issue!

I suspect that this has something to do with PR #218 <https://github.com/apache/incubator-iceberg/pull/218> that introduced special handling for files that are deleted in transactions.

The problem that PR fixed was that a manifest was created, merged, and then deleted. Then the transaction failed to commit and retried. The manifest that was created was reused, but in the retry it didn’t get merged and was still a valid metadata file. Since the file had been deleted on the first try, the table was missing a manifest.

The fix was to introduce a lazy delete for cleaning up. The transaction keeps track of files to delete and deletes them after the commit succeeds.

What might be happening here is that the first time the transaction tries to commit, it is out of date and retries, and then the original manifest is not deleted on the second attempt. Looking at the cleanup code, I think this is likely the problem, because the filtered manifest cache is cleared as files are deleted: https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java#L336

I think the fix is to add a list of files that should be deleted on every attempt. When the filtered cache is cleared, each file should be deleted and moved to the delete list. That way future attempts also delete the files.

rb

On Tue, Jul 30, 2019 at 12:48 PM Arina Yelchiyeva <[email protected]> wrote:

> Hi all,
>
> I have noticed that when performing delete operation in transaction and
> there are at least two snapshots prior to delete operation in Iceberg table,
> delete operation produces two manifest files where one is orphan. Note,
> if delete operation performed not in transaction, everything works fine.
>
> Orphaned manifest files subsequently are not deleted during snapshots
> expiration and keep piling up.
> I have described the issue in more detail in
> https://github.com/apache/incubator-iceberg/issues/330.
>
> Maybe someone has an idea why orphan file is created?
>
> Kind regards,
> Arina
>

-- 
Ryan Blue
Software Engineer
Netflix
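
P.S. To make the proposed fix a bit more concrete, here is a rough sketch of the "delete list that is revisited on every attempt" idea. It is not the actual MergingSnapshotProducer code: the class, fields, and method names below are all made up for illustration, and manifests are represented as plain path strings. The assumption is that the delete function is the transaction's lazy delete and that it tolerates being given the same path more than once.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch; class, field, and method names are invented for
// illustration and are not the real MergingSnapshotProducer API.
class RetryCleanupSketch {
  // source manifest path -> filtered manifest written for the current attempt
  private final Map<String, String> filteredCache = new HashMap<>();
  // every filtered manifest written by any attempt; kept across retries
  private final Set<String> deleteOnEveryAttempt = new LinkedHashSet<>();
  // in a transaction this is assumed to only record the path for a lazy delete
  private final Consumer<String> deleteFunc;

  RetryCleanupSketch(Consumer<String> deleteFunc) {
    this.deleteFunc = deleteFunc;
  }

  void cacheFiltered(String sourceManifest, String filteredManifest) {
    filteredCache.put(sourceManifest, filteredManifest);
  }

  // Called after each attempt with the manifests referenced by the committed
  // snapshot (an empty set if the attempt failed and will be retried).
  void cleanUncommitted(Set<String> committed) {
    // Move cached files to the persistent list before clearing the cache,
    // so that later attempts still know about them.
    deleteOnEveryAttempt.addAll(filteredCache.values());
    filteredCache.clear();

    // Offer every uncommitted file for deletion, including files that were
    // written by earlier attempts.
    for (String path : deleteOnEveryAttempt) {
      if (!committed.contains(path)) {
        deleteFunc.accept(path);
      }
    }
  }
}
```

With this shape, a filtered manifest written by a failed first attempt is handed to the delete function again when cleanup runs after the retry, while the manifest that the retry actually committed is left alone.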
