It isn't safe for this to be set for any table that may be read by an older
version of Iceberg.
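
A minimal, hypothetical model of why enabling the flag for the job and then disabling it does not restore safety. It assumes that changing the property only affects future appends and never rewrites manifests that are already committed. All class and method names are illustrative, not Iceberg's API, and records need Java 16+.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the manifest append path; NOT Iceberg's API.
final class InheritanceToggleModel {

  // A committed manifest in this model. A null snapshotId means the manifest's
  // entries rely on snapshot ID inheritance instead of carrying an explicit ID.
  record Manifest(String path, Long snapshotId) {}

  private final List<Manifest> committed = new ArrayList<>();
  // Stands in for compatibility.snapshot-id-inheritance.enabled.
  private boolean inheritanceEnabled = false;

  void setInheritanceEnabled(boolean enabled) {
    // Assumption: flipping the flag changes only future appends. Manifests that
    // are already committed are left exactly as they are.
    this.inheritanceEnabled = enabled;
  }

  // Appends a manifest that was written without a snapshot ID, as in a
  // snapshot-table job.
  void appendManifest(String path, long newSnapshotId) {
    if (inheritanceEnabled) {
      committed.add(new Manifest(path, null)); // reused as-is, no snapshot ID
    } else {
      committed.add(new Manifest(path + ".copy", newSnapshotId)); // copied with the ID
    }
  }

  // Readers that predate inheritance need an explicit snapshot ID everywhere.
  boolean safeForOldReaders() {
    return committed.stream().allMatch(m -> m.snapshotId() != null);
  }

  public static void main(String[] args) {
    InheritanceToggleModel table = new InheritanceToggleModel();
    table.setInheritanceEnabled(true);  // enable for the snapshot-table job
    table.appendManifest("stage-0-task-0-manifest-a.avro", 1L);
    table.setInheritanceEnabled(false); // disable when the job finishes
    table.appendManifest("later-manifest.avro", 2L);
    System.out.println("safe for older readers: " + table.safeForOldReaders());
  }
}
```

In this model the first manifest keeps its missing snapshot ID after the flag is turned off, so the table stays unreadable for older readers.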

On Thu, Aug 31, 2023 at 9:49 AM Pucheng Yang <py...@pinterest.com.invalid>
wrote:

> Ryan, in what cases can we not set it to true for v1 tables?
>
> My major goal is to reduce the runtime of snapshot creation jobs.
>
> Is it OK if I set it to true during snapshot table creation and then set it
> back to false when finished?
>
> On Thu, Aug 31, 2023 at 9:44 AM Ryan Blue <b...@tabular.io> wrote:
>
>> This isn't something that we can set to `true` because it is a
>> forward-incompatible change. That's why we added a flag.
>>
>> However, we should make sure that this is the default behavior in v2
>> tables, since it is safe for v2 (where inheritance happens automatically).
>> If I remember correctly, we still rewrite by default in v2 even though it's
>> safe.
>>
>> On Thu, Aug 31, 2023 at 9:40 AM Pucheng Yang <py...@pinterest.com.invalid>
>> wrote:
>>
>>> Hi community,
>>>
>>> The table property "compatibility.snapshot-id-inheritance.enabled" was
>>> introduced to avoid rewriting manifests when possible (PR:
>>> https://github.com/apache/iceberg/commit/c3dc9824b381e5e479e356be5e0f4fcf61a9fc37
>>> ).
>>>
>>> While investigating a very slow snapshot table creation on a huge table,
>>> I found that most of the time is spent rewriting manifests during the
>>> appendManifest operation (code link:
>>> https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java#L279),
>>> because this table property defaults to false.
>>>
>>> Russell raised the idea of setting this table property to true and
>>> suggested I start a discussion on the dev list.
>>>
>>> Correct me if I am wrong, but after looking at the code, my understanding
>>> of the implications is:
>>> 1. Some manifests will not have a snapshot ID. For example, during
>>> snapshot table creation, we append manifest files without a snapshot ID
>>> to the table.
>>> 2. The manifest file name will be the one chosen during the "first write"
>>> (the "second write" is the manifest copy during the appendManifest
>>> operation). For example, "stage-%d-task-%d-manifest-%s" is the name used
>>> during snapshot creation; since the last parameter is a UUID, this should
>>> be fine.
>>> Would like to hear from you, thanks!
>>>
>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>

-- 
Ryan Blue
Tabular
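
The reuse-or-copy choice in appendManifest discussed in this thread can be sketched as a small hypothetical model. It assumes, based on the linked MergingSnapshotProducer code, that a manifest is committed as-is only when inheritance is enabled and the manifest has no snapshot ID, and is copied otherwise. "Today" reflects the v2 behavior as Ryan recalls it (property only); "proposed" reflects his suggestion to always inherit on v2. Names are illustrative, not Iceberg's API.

```java
// Hypothetical model of the appendManifest reuse-or-copy decision; NOT Iceberg's API.
final class AppendManifestModel {

  // Behavior as recalled in the thread: only the table property matters.
  // formatVersion is unused here and kept only for a signature symmetric with the proposal.
  static boolean inheritanceEnabledToday(int formatVersion, boolean propertyEnabled) {
    return propertyEnabled;
  }

  // Behavior proposed in the thread: v2 tables always inherit, since it is safe there.
  static boolean inheritanceEnabledProposed(int formatVersion, boolean propertyEnabled) {
    return formatVersion > 1 || propertyEnabled;
  }

  // Assumption: a manifest is reused only if inheritance is on and it carries
  // no snapshot ID; otherwise it is copied (rewritten) with the new snapshot ID.
  static boolean mustRewrite(boolean inheritanceEnabled, Long manifestSnapshotId) {
    return !(inheritanceEnabled && manifestSnapshotId == null);
  }

  public static void main(String[] args) {
    int manifests = 1_000; // e.g. manifests from a snapshot-table job, none with a snapshot ID
    int rewrittenToday = 0;
    int rewrittenProposed = 0;
    for (int i = 0; i < manifests; i++) {
      if (mustRewrite(inheritanceEnabledToday(2, false), null)) rewrittenToday++;
      if (mustRewrite(inheritanceEnabledProposed(2, false), null)) rewrittenProposed++;
    }
    System.out.println("v2 rewrites today: " + rewrittenToday + ", proposed: " + rewrittenProposed);
  }
}
```

Under these assumptions a v2 table with the default property copies every appended manifest, while the proposed default copies none; v1 tables behave the same under both because the property still gates inheritance.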
