Hi community,

The table property "compatibility.snapshot-id-inheritance.enabled" was
introduced to avoid rewriting manifests where possible (commit:
https://github.com/apache/iceberg/commit/c3dc9824b381e5e479e356be5e0f4fcf61a9fc37
).

While investigating why creating a snapshot table from a very large table
took so long, I found that most of the time was spent rewriting manifests
during the appendManifest operation (code link:
https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java#L279)
because this table property defaults to false.
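
To make the branch I am describing concrete, here is a minimal,
self-contained sketch of how I read it. It is a simplified model, not the
actual Iceberg code: the class, the Manifest type, the field names, and the
".copy" path suffix are all made up for illustration. The only point is that
a manifest is appended as-is when inheritance is enabled and it has no
snapshot id, and is copied (rewritten) otherwise.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the append-manifest decision.
// None of these types are Iceberg classes.
final class AppendManifestSketch {

    // Hypothetical stand-in for a manifest: a path plus a nullable snapshot id.
    static final class Manifest {
        final String path;
        final Long snapshotId;

        Manifest(String path, Long snapshotId) {
            this.path = path;
            this.snapshotId = snapshotId;
        }
    }

    final boolean snapshotIdInheritanceEnabled;
    final long currentSnapshotId;
    final List<Manifest> appendedAsIs = new ArrayList<>();
    final List<Manifest> rewritten = new ArrayList<>();

    AppendManifestSketch(boolean snapshotIdInheritanceEnabled, long currentSnapshotId) {
        this.snapshotIdInheritanceEnabled = snapshotIdInheritanceEnabled;
        this.currentSnapshotId = currentSnapshotId;
    }

    void appendManifest(Manifest manifest) {
        if (snapshotIdInheritanceEnabled && manifest.snapshotId == null) {
            // Cheap path: keep the original file; the snapshot id is inherited later.
            appendedAsIs.add(manifest);
        } else {
            // Expensive path: the manifest is copied so it carries this snapshot id.
            // In the real system this means reading and rewriting the whole file.
            rewritten.add(new Manifest(manifest.path + ".copy", currentSnapshotId));
        }
    }

    public static void main(String[] args) {
        for (boolean enabled : new boolean[] {false, true}) {
            AppendManifestSketch producer = new AppendManifestSketch(enabled, 7L);
            producer.appendManifest(new Manifest("m1.avro", null));
            System.out.println("inheritance=" + enabled
                + " appendedAsIs=" + producer.appendedAsIs.size()
                + " rewritten=" + producer.rewritten.size());
        }
    }
}
```

With the property at its default of false, every appended manifest takes the
copy path, which matches where the time went in my investigation.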

Russell suggested that we consider setting this table property to true and
that I start a discussion on the dev list.

Please correct me if I am wrong, but after looking at the code, my
understanding of the implications is:
1. Some manifests will not carry a snapshot id. For example, during
snapshot table creation, we append manifest files without a snapshot id to
the table.
2. The manifest file keeps the name given at the "first write" (the
"second write" being the manifest copy made during the appendManifest
operation). An example is "stage-%d-task-%d-manifest-%s", the name used
during snapshot creation. Since the last parameter is a UUID, the names
should still be unique, so this should be fine.
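
As a small illustration of point 2, the sketch below formats a name with the
pattern quoted above. The class and method names are hypothetical, and I am
assuming the first two parameters are numeric stage and task ids. Only the
format string comes from the code. Two manifests written by the same stage
and task still get different names because of the random UUID.

```java
import java.util.UUID;

// Hypothetical helper; only the format string is taken from the existing code.
final class StagedManifestNameSketch {

    static String manifestName(int stageId, long taskId) {
        // The trailing %s is filled with a random UUID, which keeps names distinct
        // even when the stage id and task id repeat.
        return String.format("stage-%d-task-%d-manifest-%s", stageId, taskId, UUID.randomUUID());
    }

    public static void main(String[] args) {
        System.out.println(manifestName(1, 2L));
        System.out.println(manifestName(1, 2L));
    }
}
```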

I would like to hear your thoughts. Thanks!
