Hey Iceberg Nation! Here are the minutes and recording from our Iceberg Sync. As a reminder, anyone can join the discussion, so feel free to share the Iceberg-Sync.
*NOTE:* Because some folks have not been receiving invitations from the Iceberg Sync Google Group <https://groups.google.com/g/iceberg-sync>, we will move to sharing the link through a public calendar posted on the Apache Iceberg site <https://iceberg.apache.org/community/#iceberg-community-events>. Please check that you can find the longstanding Google meeting on the calendar before the next meeting. I'll start a separate thread on the dev list about this, so please pose any questions or concerns there.

The notes and the agenda are posted in the Iceberg Sync YouTube description. They are also maintained in the meeting minute notes <https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg/edit?usp=drive_web>, which are linked on the calendar invite.

Meeting Recording <https://www.youtube.com/watch?v=DhvRg0oEdwU> / Meeting Transcript (available in the video) <https://youtu.be/1lm4Wlpy2wU?t=28>

- Highlights
  - Fix for OOM caused by Avro decoder caching <https://github.com/apache/iceberg/pull/7791> (Coney Liu)
  - Multiple shuffle partitions per file <https://github.com/apache/iceberg/pull/7897> (Anton)
  - Python: Add positional deletes <https://github.com/apache/iceberg/pull/6775> (Fokko)
  - Python: Alter table in transactions <https://github.com/apache/iceberg/pull/6323> (Fokko)
  - View metadata implementation <https://github.com/apache/iceberg/pull/7759> (Eduard)
  - View support for InMemoryCatalog <https://github.com/apache/iceberg/pull/7880> (Eduard)
  - API for multi-table commits <https://github.com/apache/iceberg/pull/7569> (Eduard)
    - Catalog Transaction API is still in draft <https://github.com/apache/iceberg/pull/6948>
  - Flink: Split ordering based on sequence number <https://github.com/apache/iceberg/pull/7661> (Peter)
  - Adding new Iceberg Events Calendar <https://calendar.google.com/calendar/u/1?cid=NTkzYmIwMGJmZTQ1N2QzMTkxNDEzNTBkZDI0Yzk2NGYzOWJkYmQ5ZmQyNDMyODFhODYzMmEwMDk2M2EyMWQ4NkBncm91cC5jYWxlbmRhci5nb29nbGUuY29t>, including this sync (Brian)
    - Any PMC member who wants access to edit this calendar should reach out to me on Slack or the dev list. I will announce this on the dev list.
- Releases
  - Python 0.4 vote is out! (Please verify)
- Discussion
  - Bloom/cuckoo/other filters in manifest
  - Priority-based commit
  - Documentation efforts #documentation <https://slack.com/app_redirect?channel=C05BXHPEGTA&team=T025T8UC953> (Brian)
    - Proposed updates to the docs site <https://docs.google.com/document/d/1Y_PRv6p5oJaxg_68AUia_JHw8P4-AZIu3hP5IH2Cpsw/edit>

AI-generated chapter summaries:
- 0:00 Chapter 1: Daniel, Brian, and the team discussed updates and progress from the past few weeks. These included fixes for memory issues, improvements to shuffle partitions for file compaction, positional delete support in Python, and the implementation of view metadata. They also mentioned plans for engine integrations and the need for further discussion on multi-table transactions.
- 10:29 Chapter 2: The team discussed progress and next steps for implementing multi-table commits and for improving support for ordered reads from Iceberg tables. They also discussed plans to make the Iceberg community meetings more accessible by providing a public link instead of requiring a subscription to the dev list.
- 21:03 Chapter 3: The team discussed the new features and improvements in the 0.4 release, such as positional deletes, SQL-style filters, and performance enhancements. They also explored adding Bloom filters to the manifest file to improve point query performance.
- 31:54 Chapter 4: The discussion covered the potential use of Parquet Bloom filters to improve performance and reduce costs. They also discussed introducing priority-based commits and the challenges involved.
- 42:00 Chapter 5: The team discussed implementing a rollback feature in the catalog and snapshot producer. They also talked about prioritizing documentation updates, with plans to add examples and tutorials in the future.

Thanks! See you all at the next sync!