Thanks for the meeting recording! The "AI-generated chapter summaries" don't seem very readable. Can this be improved?
On Fri, Oct 27, 2023, at 05:25, Brian Olsen wrote:
> Hey Iceberg Nation,
>
> Everyone is welcome to attend syncs. Subscribe to this calendar <https://calendar.google.com/calendar/embed?src=3905d492f1b450ba0712f2ae6afa76eb757f13d85220cc03aa4527885adc5629%40group.calendar.google.com&ctz=Asia%2FShanghai> to receive a notification. Note: This meeting note is backdated as I forgot to post it here earlier.
>
> 2023-10-11 (Meeting Recording <https://youtu.be/euWtAKo_bV4> ⭕)
>
> • Highlights
> • 1.4.0 was released! (Thanks, Anton!)
> • v2 and zstd defaults
> • Advisory partition size in Spark
> • Skip local sort for unordered writes in Spark
> • Distributed planning in Spark
> • AzureFileIO
> • Multi-table commits through REST
> • Removed Spark 3.1
> • Python moved to the iceberg-python <https://github.com/apache/iceberg-python> repo (removed from main)
> • Flink alter table column support was added <https://github.com/apache/iceberg/pull/7628> (1.17 only), like adding a new column, changing column position (Thanks, Yanghao Lin)
> • Metastore catalog support for views was added (Thanks, Eduard!)
> • Close to write support in Python, supports v1 and v2 metadata (Thanks, Fokko!)
> • Rust added read support for manifest lists (Thanks, ZENOTME)
> • Spark: clean up FileIO resources on executors (Thanks, Anton!)
> • Discussion
> • PR commit methods – standardize on squash?
> • Iceberg docs refactor <https://github.com/apache/iceberg/pull/8659> (try me <https://github.com/bitsondatadev/iceberg/tree/new-docs/docs-new>)
> • Spec v3 changes:
> • New types
> • BLOB
> • BSON/JSON
> • Timestamp{tz}_{ns,ms} <https://docs.google.com/document/d/1bE1DcEGNzZAMiVJSZ0X1wElKLNkT9kRkk0hDlfkXzvU/edit> (not millis)
> • FLOAT16?
> • Default values
> • Type promotion
> • * to string (choose a format)
> • What are the use cases for changing the type?
> • int/long to string
> • float/timestamp - why?
> • Bool to string should be allowed
> • Long to timestamp (must be millis)
> • Multi-column transforms
> • Bucket v2
> • Geo?
> • Location/path requirements (recommendations)
> • Owned locations (discussion <https://lists.apache.org/thread/3fx8povnsq0f4g1xzj38snplr6d3ch1r>)
> • Delete vectors (discussion <https://lists.apache.org/thread/gr3g5rrr60fhvy0mrdj4s4w9x8c3v58g>)
> • Allowing relative paths
> • Partition stats spec and discussion in PR 7105 <https://github.com/apache/iceberg/pull/7105>.
> • Kafka Connect (discussion <https://lists.apache.org/thread/d9h22z2ydcpvjxp53yl6w96xoy3dp33h>)
>
> AI-generated chapter summaries:
>
> 0:00 <https://www.youtube.com/watch?v=euWtAKo_bV4&t=0s> Chapter 0 Introduction
>
> 5:14 <https://www.youtube.com/watch?v=euWtAKo_bV4&t=314s> Chapter 1 Highlights
> Ryan thanks Anton for releasing v1.4 with many bug fixes and changes, including defaulting to v2 format and Z standard for data compression. Azure file IO is now available, with native support for multi-table commits in Spark. pyIceberg project moved to a new repository and new Python support was added.
>
> 12:01 <https://www.youtube.com/watch?v=euWtAKo_bV4&t=721s> Chapter 2 PR commit methods and repository setup.
> Anton highlights recent improvements in Spark, including file cleanup and manifest file read support and plans to discuss spec v3 changes with the community. The group discusses PR commit methods, suggesting standardizing across repositories to use squash and merge by default, rather than merge commits. There was concern about enforcing linear history on the Java side, citing potential issues with rebase and time zones. One suggestion was bringing the issue of inconsistent commit messages to the community for resolution. A consensus is built around squashing commits to make them more meaningful and easier to understand.
>
> 19:30 <https://www.youtube.com/watch?v=euWtAKo_bV4&t=1170s> Chapter 3 Improving Iceberg Docs with a mono repo.
> Brian is refactoring the iceberg documentation to move it back into the main iceberg repo, simplifying maintenance and improving collaboration. He proposes to create a single documentation site containing the static site and for all versions of docs, solving problems with multiple sources and making releases easier. The plan is to merge an initial PR and build consensus, then replace the current ASF documentation branch and repoint it back to the main repo. We're creating a nightly branch for documentation changes, and maintaining it as an up-to-date snapshot. The readme file on the branch will have all the necessary information for building and understanding the project.
>
> 26:59 <https://www.youtube.com/watch?v=euWtAKo_bV4&t=1619s> Chapter 4 V3 spec changes for data storage.
> The team discusses v3 spec changes, including partition stats, which may not be included in v3 due to a lack of need for backward compatibility. If partition stats are required for v3, it would need to be decided and implemented separately from the main v3 discussion. Everyone should be aware that multi-column transforms are a v3-only change and are likely to break in v2. There are also some potential forward-breaking changes for Hadoop v3, including location path requirements and Delete vector proposal.
>
> 34:39 <https://www.youtube.com/watch?v=euWtAKo_bV4&t=2079s> Chapter 5 Metadata requirements for Iceberg V3

Xuanwo