+1. Anton made a good case with the new perspective.

On Thu, May 25, 2023 at 2:29 PM Anton Okolnychyi
<aokolnyc...@apple.com.invalid> wrote:

> Oh, I missed the earlier discussion. Thanks for sharing it, Gabor!
>
> I am approaching this from a slightly different perspective. Defaulting to
> v2 does not mean supporting delete files. My primary concern is that our
> default behavior may be either confusing or inefficient. For instance,
> using always-null transforms in v1 spec evolution is hard to explain to
> users. Not enabling snapshot ID inheritance means rewriting manifests in
> huge tables can take hours. Managed catalogs or teams that run forks have
> more control over tables and can make better choices but I also worry about
> folks that just start with Iceberg and use built-in catalogs.
>
> Can we think of potential issues with having a v2 table with no delete
> files vs a v1 table?
>
> - Anton
>
> On May 24, 2023, at 10:43 PM, Szehon Ho <szehon.apa...@gmail.com> wrote:
>
> Hi,
>
> I'm +1 to making v2 the default, say after this release.
>
> It seems most of the features brought up as concerns on Spark side in the
> thread Gabor linked have been implemented (like position delete lifecycle).
>
> But Anton's point is also good. Even if some delete file features are
> missing, V2 is not only about delete files, which are not produced by
> default in Spark (and Flink?), but also about the fixes for partition spec
> evolution and snapshot ID inheritance. Hence it makes sense to me, from that
> angle.
>
> Thanks
> Szehon
>
> On Wed, May 24, 2023 at 12:34 AM Gabor Kaszab <
> gaborkas...@cloudera.com.invalid> wrote:
>
>> Hey Anton,
>>
>> Just adding a note that back around January the same topic was brought up
>> on this mailing list. There the conclusion was to use the 'table-default.'
>> catalog level property to create V2 tables by default.
>> https://lists.apache.org/thread/9ct0p817qxqqdnv7nb35kghsfygjkqdf
>>
>> I'm not saying that we shouldn't default to V2, just drawing attention to
>> this previous conversation.
>>
>> Cheers,
>> Gabor
>>
>> On Wed, May 24, 2023 at 12:04 AM Anton Okolnychyi <
>> aokolnyc...@apple.com.invalid> wrote:
>>
>>> Hi folks,
>>>
>>> Would it be appropriate for us to consider changing the default table
>>> format version for new tables from v1 to v2?
>>>
>>> I don’t think defaulting to v2 tables means all readers have to support
>>> delete files. DELETE, UPDATE, MERGE operations will only produce delete
>>> files if configured explicitly.
>>>
>>> The primary reason I am starting this thread is to avoid our workarounds
>>> in v1 spec evolution and to enable snapshot ID inheritance. The latter is
>>> critical for the performance of rewriting manifests.
>>>
>>> Any thoughts?
>>>
>>> - Anton
>>
>>
>
