Oh, I missed the earlier discussion. Thanks for sharing it, Gabor!

I am approaching this from a slightly different perspective. Defaulting to v2 
does not mean supporting delete files. My primary concern is that our default 
behavior may be either confusing or inefficient. For instance, the always-null 
transforms used for v1 spec evolution are hard to explain to users. Not 
enabling snapshot ID inheritance means rewriting manifests in huge tables can 
take hours. Managed catalogs or teams that run forks have more control over 
tables and can make better choices, but I also worry about folks who are just 
starting with Iceberg and use built-in catalogs.

Can we think of potential issues with having a v2 table with no delete files vs 
a v1 table?

- Anton

> On May 24, 2023, at 10:43 PM, Szehon Ho <szehon.apa...@gmail.com> wrote:
> 
> Hi,
> 
> I'm +1 to making v2 the default, say after this release.
> 
> It seems most of the features brought up as concerns on Spark side in the 
> thread Gabor linked have been implemented (like position delete lifecycle).
> 
> But Anton's point is also good. Even if some delete file features are 
> missing, V2 is not only about delete files (which are not produced by default 
> in Spark, and Flink(?)), but also about the fixes for partition spec 
> evolution and snapshot ID inheritance. Hence it makes sense to me, from that 
> angle.
> 
> Thanks
> Szehon
> 
> On Wed, May 24, 2023 at 12:34 AM Gabor Kaszab 
> <gaborkas...@cloudera.com.invalid> wrote:
> Hey Anton,
> 
> Just adding a note that back around January the same topic was brought up on 
> this mailing list. The conclusion there was to use the 'table-default.' 
> catalog-level property to create V2 tables by default. 
> https://lists.apache.org/thread/9ct0p817qxqqdnv7nb35kghsfygjkqdf 
> 
> I'm not saying that we shouldn't default to V2, just drawing attention to 
> this previous conversation.
> 
> Cheers,
> Gabor
> 
> On Wed, May 24, 2023 at 12:04 AM Anton Okolnychyi 
> <aokolnyc...@apple.com.invalid> wrote:
> Hi folks,
> 
> Would it be appropriate for us to consider changing the default table format 
> version for new tables from v1 to v2?
> 
> I don’t think defaulting to v2 tables means all readers have to support 
> delete files. DELETE, UPDATE, and MERGE operations will only produce delete 
> files if configured explicitly.
> 
> The primary reason I am starting this thread is to avoid our workarounds for 
> v1 spec evolution and to get snapshot ID inheritance. The latter is critical 
> for the performance of rewriting manifests.
> 
> Any thoughts?
> 
> - Anton
