In the process of following up on the "Updates/Deletes/Upserts" thread, I'm
re-reading the table spec. I have a question about Manifest List files.

If I understand correctly, manifest list files are separate files that are
created before attempting to commit a new snapshot. Each snapshot has a
single manifest list file, and that file references _all_ manifest files
included in the snapshot.

During a commit collision, both writers will have produced new manifest
list files. Assuming the two writes are compatible (for example, one is an
append and the other is a replace), the loser should be able to retry its
commit without rewriting any data files. Nonetheless, it will need to
rewrite its manifest list file in addition to rewriting its snapshot file.
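To make the retry cost concrete, here is a minimal sketch of the retry as I
understand it. It is a toy in-memory model, not Iceberg's API: `Table`,
`Snapshot`, `build_snapshot`, `commit`, and the manifest file names are all
hypothetical, and the atomic pointer swap stands in for whatever operation
actually commits the new snapshot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    # Hypothetical: the manifest list, modeled as a tuple of manifest paths.
    manifest_list: tuple


class Table:
    """Hypothetical table whose current-snapshot pointer is swapped atomically."""

    def __init__(self):
        self.current: Optional[Snapshot] = None
        self.manifest_lists_written = 0  # counts manifest list writes

    def compare_and_swap(self, expected, new) -> bool:
        # Succeeds only if nobody else committed since `expected` was read.
        if self.current is not expected:
            return False
        self.current = new
        return True


def build_snapshot(table, base, new_manifests, snapshot_id):
    # Every attempt writes a brand-new manifest list that names ALL
    # manifests: the base snapshot's manifests plus this writer's new ones.
    base_manifests = base.manifest_list if base else ()
    table.manifest_lists_written += 1
    return Snapshot(snapshot_id, base.snapshot_id if base else None,
                    base_manifests + tuple(new_manifests))


def commit(table, new_manifests, snapshot_id, max_retries=3):
    for _ in range(max_retries + 1):
        base = table.current
        candidate = build_snapshot(table, base, new_manifests, snapshot_id)
        if table.compare_and_swap(base, candidate):
            return candidate
        # Lost the race: the new manifests are reused as-is, but the
        # manifest list and the snapshot are rebuilt on the next attempt.
    raise RuntimeError("commit failed after retries")


# Two writers start from the same base; A wins, B must retry.
table = Table()
base = table.current
a = build_snapshot(table, base, ["a-manifest.avro"], 1)
b = build_snapshot(table, base, ["b-manifest.avro"], 2)
assert table.compare_and_swap(base, a)
assert not table.compare_and_swap(base, b)
b2 = commit(table, ["b-manifest.avro"], 2)
print(b2.manifest_list)               # ('a-manifest.avro', 'b-manifest.avro')
print(table.manifest_lists_written)   # 3
```

In this model, the loser's third manifest list write is exactly the extra
read-and-rewrite step I am asking about: nothing in the first two manifest
lists can be reused by reference.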

I was under the impression that minimizing the amount of work required to
retry a commit was a design objective. The inability to compose multiple
manifest list files together seems to add mandatory read and write steps
to almost every commit collision.

Can someone clarify the philosophy with regard to minimizing the cost of
commit retries?

Thanks!

-Erik
