Hey Iceberg Nation,

Everyone is welcome to attend syncs. Subscribe to this calendar <https://calendar.google.com/calendar/embed?src=3905d492f1b450ba0712f2ae6afa76eb757f13d85220cc03aa4527885adc5629%40group.calendar.google.com&ctz=Asia%2FShanghai> to receive a notification.

Note: This meeting note is backdated as I forgot to post it here earlier.

2023-08-30 (Meeting Recording <https://www.youtube.com/watch?v=5kfnBYU7SME> ⭕)
- Highlights
  - Java: Flink sink adds a custom partitioner to better distribute traffic for bucket-partitioned tables <https://github.com/apache/iceberg/pull/7161> (Thanks, Sergio!)
  - Java: AWS, GCP, and Azure bundles (Thanks, Bryan!)
  - Java: Azure FileIO (Thanks, Bryan!)
  - Java: Delete file optimizations in job planning (Thanks, Anton!)
  - Python: Moved to Pydantic v2 (Thanks, Fokko!)
  - Java: Fixed branches with empty tables (Thanks, ConeyLiu!)
  - Rust: Merged TableMetadata, including (de)serialization (Thanks, Jan!)
  - Go: Schema and types (Thanks, Matt!)
- Releases
  - PyIceberg 0.5.0 <https://lists.apache.org/thread/1oj7hcpp5ccc8tt1rjz7bj8yox4onlnc>
    - Blockers
      - Fixing schema evolution (#8374 <https://github.com/apache/iceberg/pull/8374>)
  - Java 1.4.0 <https://github.com/apache/iceberg/milestone/35>
    - Blockers
      - Fixing history with lazy snapshot loading
      - V2 tables by default
      - Spark distributed planning
      - Default to zstd
    - Timeline: End of next week
    - Milestone: https://github.com/apache/iceberg/milestone/35
- Discussion
  - Owned Table Location: https://docs.google.com/document/d/1pTJPQaHwyO0NFlLcHIrXq4gBazJmAyPnigmOPMbBRR0/edit?usp=sharing (Szehon)
  - Multi-arg transform: https://docs.google.com/document/d/1aDoZqRgvDOOUVAGhvKZbp5vFstjsAMY4EFCyjlxpaaw/edit?usp=sharing (advancedxy)
  - Iceberg Table Portability: <https://docs.google.com/document/d/1q-pWhI8A8_T_zbxKoZ7Lg02ybz87kHrNP9zbm9p0RQk/edit#heading=h.2zb2jb8vclvm>
  - Nanosecond timestamp & timestamptz: sufficient consensus, next steps?

AI-generated chapter summaries:

0:00 <https://www.youtube.com/watch?v=5kfnBYU7SME&t=0s> Chapter 1: The team discussed recent updates and improvements. These included the new custom partitioner for better data balancing, the bundles that make the cloud integrations easier to use, optimizations in the delete file job planning path, and progress in the Rust and Go communities.
10:11 <https://www.youtube.com/watch?v=5kfnBYU7SME&t=611s> Chapter 2: The team discussed issues and updates related to the Java 1.4 release. Topics included progress on resolving dependencies, the appointment of Anton as release manager, and features such as Spark distributed planning and zstd compression by default for new tables. They also discussed the need to fix lazy snapshot loading. Finally, they covered the challenges of table location ownership and how it could prevent accidental destruction of tables.

21:02 <https://www.youtube.com/watch?v=5kfnBYU7SME&t=1262s> Chapter 3: The team discussed orphaned files and the challenges of table and location ownership. They explored several approaches to the problem:
- checking whether a location already exists
- assigning table locations based on table names
- running a global orphan file cleanup at the administrator level

29:58 <https://www.youtube.com/watch?v=5kfnBYU7SME&t=1798s> Chapter 4: The discussion centered on the ownership and sharing of table locations in a catalog. The team weighed whether to track ownership through table properties or through a separate list of owned locations in the table metadata bundle.

39:12 <https://www.youtube.com/watch?v=5kfnBYU7SME&t=2352s> Chapter 5: Anton, Xianqing, and others discussed the proposal to support multi-argument transforms. They explored the challenges and benefits of the change. These included the need for expression API modifications and the possibility of carrying additional information in transforms for normalizing values.

49:44 <https://www.youtube.com/watch?v=5kfnBYU7SME&t=2984s> Chapter 6: Anton discussed the possibility of supporting additional files and range partitioning for better performance. The team agreed to revisit custom transforms and table portability in future discussions.
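For readers new to the location ownership topic from chapters 3 and 4, here is a rough illustration only. The sketch below shows the kind of overlap check a catalog might run before assigning a location to a new table, or before purging files under an existing one. The helper names (`overlaps`, `conflicting_tables`) and the in-memory map of owned locations are hypothetical. They are not part of Iceberg's API, and the Owned Table Location proposal may define ownership differently.

```python
from typing import Dict, List


def _normalize(location: str) -> str:
    # Hypothetical helper: treat "s3://bucket/db/t/" and "s3://bucket/db/t" as the same location.
    return location.rstrip("/")


def overlaps(a: str, b: str) -> bool:
    """True if the two locations are equal or one is nested under the other.

    Comparing against b + "/" (not plain startswith) keeps sibling paths such as
    ".../orders" and ".../orders_archive" from being treated as overlapping.
    """
    a, b = _normalize(a), _normalize(b)
    return a == b or a.startswith(b + "/") or b.startswith(a + "/")


def conflicting_tables(new_location: str, owned: Dict[str, str]) -> List[str]:
    """Names of tables (hypothetical owned-location map) that overlap new_location.

    A catalog could refuse to create a table, or to purge files, when this list
    is non-empty, so one table's cleanup cannot delete another table's files.
    """
    return sorted(name for name, loc in owned.items() if overlaps(new_location, loc))


if __name__ == "__main__":
    owned = {
        "db.orders": "s3://warehouse/db/orders",
        "db.customers": "s3://warehouse/db/customers/",
    }
    print(conflicting_tables("s3://warehouse/db/orders/v2", owned))       # ['db.orders']
    print(conflicting_tables("s3://warehouse/db", owned))                 # ['db.customers', 'db.orders']
    print(conflicting_tables("s3://warehouse/db/orders_archive", owned))  # []
```

The check here is purely prefix based. A real implementation would also have to handle details this sketch ignores, such as different URI schemes that point at the same storage, and locations shared on purpose.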