+1. Anton made a good case with the new perspective.

On Thu, May 25, 2023 at 2:29 PM Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote:
> Oh, I missed the earlier discussion. Thanks for sharing it, Gabor!
>
> I am approaching this from a slightly different perspective. Defaulting to
> v2 does not mean supporting delete files. My primary concern is that our
> default behavior may be either confusing or inefficient. For instance,
> using always null transforms in v1 spec evolution is hard to explain to
> users. Not enabling snapshot ID inheritance means rewriting manifests in
> huge tables can take hours. Managed catalogs or teams that run forks have
> more control over tables and can make better choices but I also worry about
> folks that just start with Iceberg and use built-in catalogs.
>
> Can we think of potential issues with having a v2 table with no delete
> files vs a v1 table?
>
> - Anton
>
> On May 24, 2023, at 10:43 PM, Szehon Ho <szehon.apa...@gmail.com> wrote:
>
> Hi,
>
> I'm +1 to making v2 the default, say after this release.
>
> It seems most of the features brought up as concerns on Spark side in the
> thread Gabor linked have been implemented (like position delete lifecycle).
>
> But Anton's point is also good. Even if some delete file features are
> missing, V2 is not only about delete files, which are not produced by
> default in Spark, and Flink(?), but rather the fixes for partition spec
> evolution / snapshot id inheritance. Hence it makes sense to me, from that
> angle.
>
> Thanks
> Szehon
>
> On Wed, May 24, 2023 at 12:34 AM Gabor Kaszab <gaborkas...@cloudera.com.invalid> wrote:
>
>> Hey Anton,
>>
>> Just adding a note that back around January the same topic was brought up
>> on this mail list. There the conclusion was to use the 'table-default.'
>> catalog level property to create V2 tables by default.
>> https://lists.apache.org/thread/9ct0p817qxqqdnv7nb35kghsfygjkqdf
>>
>> I'm not saying that we shouldn't default to V2 just drawing attention to
>> this previous conversation.
>>
>> Cheers,
>> Gabor
>>
>> On Wed, May 24, 2023 at 12:04 AM Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote:
>>
>>> Hi folks,
>>>
>>> Would it be appropriate for us to consider changing the default table
>>> format version for new tables from v1 to v2?
>>>
>>> I don’t think defaulting to v2 tables means all readers have to support
>>> delete files. DELETE, UPDATE, MERGE operations will only produce delete
>>> files if configured explicitly.
>>>
>>> The primary reason I am starting this thread is to avoid our workarounds
>>> in v1 spec evolution, and snapshot ID inheritance. The latter is critical
>>> for the performance of rewriting manifests.
>>>
>>> Any thoughts?
>>>
>>> - Anton