Since the last time we discussed this, we've also updated our default
version to v2. I definitely like the idea we settled on last time, that
this is an administrator setting and can already be controlled by
catalog deployments. However, I'm coming around to updating the library
default to v2.

My rationale is that we want people who are setting up Iceberg data
platforms (administrator roles) to be as successful as possible without
knowing all the internal details. While you _can_ set this at the catalog
level, those new platform administrators don't know to do that. So I'd
probably opt to make this v2 now.
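
For anyone who hasn't seen the catalog-level setting, here is a minimal
sketch of what it could look like when the catalog is configured through
Spark. It assumes the 'table-default.' catalog property prefix mentioned
further down the thread, combined with the `format-version` table property.
The catalog name `my_catalog` and the helper `catalog_defaults` are
hypothetical and only for illustration.

```python
# Assumption: catalog properties prefixed with "table-default." are applied
# as defaults to newly created tables, and "format-version" is the table
# property that selects the spec version.
TABLE_DEFAULT_PREFIX = "table-default."
FORMAT_VERSION = "format-version"


def catalog_defaults(catalog_name, format_version=2):
    """Build the Spark config entry that sets a catalog-wide default version.

    `catalog_name` is whatever name the catalog is registered under in
    Spark (hypothetical here). Returns a dict of config key -> value.
    """
    if format_version < 1:
        raise ValueError("format_version must be a positive integer")
    # Spark passes everything after "spark.sql.catalog.<name>." to the
    # catalog as a catalog property.
    key = "spark.sql.catalog.{}.{}{}".format(
        catalog_name, TABLE_DEFAULT_PREFIX, FORMAT_VERSION
    )
    return {key: str(format_version)}


if __name__ == "__main__":
    # Print the entries in the form accepted by spark-submit / spark-sql.
    for k, v in catalog_defaults("my_catalog").items():
        print("--conf {}={}".format(k, v))
```

This is exactly the kind of detail a new administrator has to know about
in advance, which is the argument for changing the library default.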

Ryan

On Thu, May 25, 2023 at 2:51 PM Steven Wu <stevenz...@gmail.com> wrote:

> +1. Anton made a good case with the new perspective.
>
> On Thu, May 25, 2023 at 2:29 PM Anton Okolnychyi
> <aokolnyc...@apple.com.invalid> wrote:
>
>> Oh, I missed the earlier discussion. Thanks for sharing it, Gabor!
>>
>> I am approaching this from a slightly different perspective. Defaulting
>> to v2 does not mean supporting delete files. My primary concern is that our
>> default behavior may be either confusing or inefficient. For instance,
>> using always-null transforms in v1 spec evolution is hard to explain to
>> users. Not enabling snapshot ID inheritance means rewriting manifests in
>> huge tables can take hours. Managed catalogs or teams that run forks have
>> more control over tables and can make better choices, but I also worry about
>> folks who are just starting with Iceberg and use built-in catalogs.
>>
>> Can we think of potential issues with having a v2 table with no delete
>> files vs a v1 table?
>>
>> - Anton
>>
>> On May 24, 2023, at 10:43 PM, Szehon Ho <szehon.apa...@gmail.com> wrote:
>>
>> Hi,
>>
>> I'm +1 to making v2 the default, say after this release.
>>
>> It seems most of the features brought up as concerns on Spark side in the
>> thread Gabor linked have been implemented (like position delete lifecycle).
>>
>> But Anton's point is also good.  Even if some delete file features are
>> missing, V2 is not only about delete files, which are not produced by
>> default in Spark, and Flink(?), but rather the fixes for partition spec
>> evolution / snapshot id inheritance.  Hence it makes sense to me, from that
>> angle.
>>
>> Thanks
>> Szehon
>>
>> On Wed, May 24, 2023 at 12:34 AM Gabor Kaszab <
>> gaborkas...@cloudera.com.invalid> wrote:
>>
>>> Hey Anton,
>>>
>>> Just adding a note that back around January the same topic was brought
>>> up on this mailing list. There the conclusion was to use the 'table-default.'
>>> catalog level property to create V2 tables by default.
>>> https://lists.apache.org/thread/9ct0p817qxqqdnv7nb35kghsfygjkqdf
>>>
>>> I'm not saying that we shouldn't default to V2, just drawing attention to
>>> this previous conversation.
>>>
>>> Cheers,
>>> Gabor
>>>
>>> On Wed, May 24, 2023 at 12:04 AM Anton Okolnychyi <
>>> aokolnyc...@apple.com.invalid> wrote:
>>>
>>>> Hi folks,
>>>>
>>>> Would it be appropriate for us to consider changing the default table
>>>> format version for new tables from v1 to v2?
>>>>
>>>> I don’t think defaulting to v2 tables means all readers have to support
>>>> delete files. DELETE, UPDATE, MERGE operations will only produce delete
>>>> files if configured explicitly.
>>>>
>>>> The primary reason I am starting this thread is to avoid our
>>>> workarounds in v1 spec evolution and to get snapshot ID inheritance.
>>>> The latter is critical for the performance of rewriting manifests.
>>>>
>>>> Any thoughts?
>>>>
>>>> - Anton
>>>
>>>
>>

-- 
Ryan Blue
Tabular
