Hey Iceberg Nation,

Everyone is welcome to attend syncs. Subscribe to this calendar <https://calendar.google.com/calendar/embed?src=3905d492f1b450ba0712f2ae6afa76eb757f13d85220cc03aa4527885adc5629%40group.calendar.google.com&ctz=Asia%2FShanghai> to receive a notification.

Note: This meeting note is backdated as I forgot to post it here earlier.

2023-10-11 (Meeting Recording <https://youtu.be/euWtAKo_bV4> ⭕)

- Highlights
  - 1.4.0 was released! (Thanks, Anton!)
    - v2 and zstd defaults
    - Advisory partition size in Spark
    - Skip local sort for unordered writes in Spark
    - Distributed planning in Spark
    - AzureFileIO
    - Multi-table commits through REST
    - Removed Spark 3.1
    - Python moved to the iceberg-python <https://github.com/apache/iceberg-python> repo (removed from main)
  - Flink alter table column support was added <https://github.com/apache/iceberg/pull/7628> (1.17 only), such as adding a new column or changing column position (Thanks, Yanghao Lin)
  - Metastore catalog support for views was added (Thanks, Eduard!)
  - Close to write support in Python, supports v1 and v2 metadata (Thanks, Fokko!)
  - Rust added read support for manifest lists (Thanks, ZENOTME)
  - Spark: clean up FileIO resources on executors (Thanks, Anton!)
- Discussion
  - PR commit methods: standardize on squash?
  - Iceberg docs refactor <https://github.com/apache/iceberg/pull/8659> (try me <https://github.com/bitsondatadev/iceberg/tree/new-docs/docs-new>)
  - Spec v3 changes:
    - New types
      - BLOB
      - BSON/JSON
      - Timestamp{tz}_{ns,ms} <https://docs.google.com/document/d/1bE1DcEGNzZAMiVJSZ0X1wElKLNkT9kRkk0hDlfkXzvU/edit> (not millis)
      - FLOAT16?
    - Default values
    - Type promotion
      - * to string (choose a format)
      - What are the use cases for changing the type?
      - int/long to string
      - float/timestamp - why?
      - Bool to string should be allowed
      - Long to timestamp (must be millis)
    - Multi-column transforms
    - Bucket v2
    - Geo?
    - Location/path requirements (recommendations)
    - Owned locations (discussion <https://lists.apache.org/thread/3fx8povnsq0f4g1xzj38snplr6d3ch1r>)
    - Delete vectors (discussion <https://lists.apache.org/thread/gr3g5rrr60fhvy0mrdj4s4w9x8c3v58g>)
    - Allowing relative paths
    - Partition stats spec and discussion in PR 7105 <https://github.com/apache/iceberg/pull/7105>
  - Kafka Connect (discussion <https://lists.apache.org/thread/d9h22z2ydcpvjxp53yl6w96xoy3dp33h>)

AI-generated chapter summaries:

0:00 <https://www.youtube.com/watch?v=euWtAKo_bV4&t=0s> Chapter 0 Introduction

5:14 <https://www.youtube.com/watch?v=euWtAKo_bV4&t=314s> Chapter 1 Highlights
Ryan thanks Anton for releasing v1.4 with many bug fixes and changes, including defaulting to the v2 format and to Zstandard (zstd) for data compression. AzureFileIO is now available, and multi-table commits are supported through REST. The PyIceberg project moved to a new repository, and new Python support was added.

12:01 <https://www.youtube.com/watch?v=euWtAKo_bV4&t=721s> Chapter 2 PR commit methods and repository setup
Anton highlights recent improvements in Spark, including file cleanup and manifest file read support, and plans to discuss spec v3 changes with the community. The group discusses PR commit methods and suggests standardizing across repositories on squash and merge by default, rather than merge commits. There was concern about enforcing linear history on the Java side, citing potential issues with rebase and time zones. One suggestion was to bring the issue of inconsistent commit messages to the community for resolution. Consensus formed around squashing commits to make them more meaningful and easier to understand.

19:30 <https://www.youtube.com/watch?v=euWtAKo_bV4&t=1170s> Chapter 3 Improving Iceberg Docs with a mono repo
Brian is refactoring the Iceberg documentation to move it back into the main Iceberg repo, simplifying maintenance and improving collaboration. He proposes creating a single documentation site containing both the static site and all versions of the docs, solving problems with multiple sources and making releases easier. The plan is to merge an initial PR and build consensus, then replace the current ASF documentation branch and repoint it back to the main repo.
We're creating a nightly branch for documentation changes and maintaining it as an up-to-date snapshot. The README file on the branch will have all the necessary information for building and understanding the project.

26:59 <https://www.youtube.com/watch?v=euWtAKo_bV4&t=1619s> Chapter 4 V3 spec changes for data storage
The team discusses v3 spec changes, including partition stats, which may not be included in v3 because adding them does not appear to require breaking backward compatibility. If partition stats are required for v3, that would need to be decided and implemented separately from the main v3 discussion. Everyone should be aware that multi-column transforms are a v3-only change and are likely to break in v2. There are also some potential forward-breaking changes for v3, including the location/path requirements and the delete vectors proposal.

34:39 <https://www.youtube.com/watch?v=euWtAKo_bV4&t=2079s> Chapter 5 Metadata requirements for Iceberg V3