Hey Iceberg Nation,

Everyone is welcome to attend syncs. Subscribe to this calendar <https://calendar.google.com/calendar/embed?src=3905d492f1b450ba0712f2ae6afa76eb757f13d85220cc03aa4527885adc5629%40group.calendar.google.com&ctz=Asia%2FShanghai> to receive a notification.

Note: This meeting note is backdated as I forgot to post it here earlier.
2023-08-09 (Meeting Recording <https://youtu.be/N0X1lkjkRUM> ⭕ )

- Highlights
  - Gradle Version catalog support <https://github.com/apache/iceberg/pull/7694> was added (Thanks, Max!)
  - Pushing down system functions by V2 filters <https://github.com/apache/iceberg/pull/7886> (Thanks, Coney!)
  - Parquet: Cache codecs by name and level <https://github.com/apache/iceberg/pull/8182> (Thanks, Bryan!)
  - Adaptive split sizing in Spark <https://github.com/apache/iceberg/pull/7714> (Thanks, Anton!)
  - Optimizations to the DeleteFileIndex <https://github.com/apache/iceberg/pull/8157> (Thanks, Anton!)
  - Python: add pyarrow hdfs support <https://github.com/apache/iceberg/pull/7997> (Thanks, Luigi!)
  - Display Spark read metrics in Spark UI <https://github.com/apache/iceberg/pull/7447> (Thanks, Karuppayya!)
  - Support creating branch on an empty table <https://github.com/apache/iceberg/pull/8072> (Thanks, Coney!)
  - Awesome progress on the Rust implementation <https://github.com/apache/iceberg-rust/pulls> (Thanks, Jan, Renjie, Xuanwo)
- Releases
  - Update on the Iceberg 1.4.0 release
  - Let’s further discuss changing to v2 as the default in this release (Anton)
- Discussion
  - Discuss the location of client projects (rust, go, python). (Brian)
  - Documentation proposal. (Thread <https://lists.apache.org/thread/nq283g83p9cgx518jwo7dg85fm5gjrvv>, Doc <https://docs.google.com/document/d/1WJXzcwC6isfoywcLY2lZ9gZN6i0JU1I2SIZmCkumbZc/edit#heading=h.gli9mc2ghfz1>) (Brian)
  - https://github.com/apache/iceberg/issues/3547 (Anton)
  - V2 format as default for new tables (Anton)
  - Delete planning and column stats (Anton)
  - https://github.com/apache/iceberg/pull/8158 (Anton)
  - Write Avro map/list block sizes (Rusty)
  - Publishing Python Wheels

AI-generated chapter summaries:

0:00 <https://www.youtube.com/watch?v=N0X1lkjkRUM&t=0s> Chapter 1
Brian, Daniel, Eduard, and Anton discussed highlights and recent improvements, including the transition to a full-featured Gradle version catalog, fixes to Parquet codec caching, Spark read metrics in the Spark UI, optimizations to the DeleteFileIndex, and adaptive split sizing in Spark.

6:40 <https://www.youtube.com/watch?v=N0X1lkjkRUM&t=400s> Chapter 2
Anton, Daniel, Fokko, and Brian discussed further updates and improvements, including split planning in Spark, HDFS support, performance enhancements, and integration with Bolar and GCP. They also mentioned ongoing work and collaborations to further enhance the ecosystem.

11:43 <https://www.youtube.com/watch?v=N0X1lkjkRUM&t=703s> Chapter 3
The team discussed the progress of, and blockers for, the upcoming 1.4 release. They mentioned features and improvements that are under review and targeted for inclusion, such as distributed planning in Spark, view implementations, and multi-table commit.

17:22 <https://www.youtube.com/watch?v=N0X1lkjkRUM&t=1042s> Chapter 4
Brian, Anton, Daniel, and Fokko discussed whether to keep Rust in a separate repository or merge it into the main repository. They weighed the benefits of consistent code structure and documentation against the potential challenges of versioning and CI/CD complexity.

23:46 <https://www.youtube.com/watch?v=N0X1lkjkRUM&t=1426s> Chapter 5
Brian, Fokko, Anton, and others discussed the pros and cons of using a monorepo for documentation and code. They considered the complexity of managing multiple versions, the visibility of changes, and the need for a unified approach. Brian proposed a solution involving a binary file to hide versioned documentation, and they agreed to experiment and gather more input before making a final decision.

33:59 <https://www.youtube.com/watch?v=N0X1lkjkRUM&t=2039s> Chapter 6
The team discussed the need to unify the apache.org site and the docs site, as well as the challenges of keeping them consistent. They also explored the possibility of adding a feature to specify the Iceberg sort order in the table creation statement in Spark.

44:59 <https://www.youtube.com/watch?v=N0X1lkjkRUM&t=2699s> Chapter 7
Daniel, Anton, and Bryan discussed inefficiencies in delete planning and execution, as well as the possibility of using Zstandard compression by default for new tables. They shared their findings and ideas, with the intention of collaborating on solutions to improve performance.

50:14 <https://www.youtube.com/watch?v=N0X1lkjkRUM&t=3014s> Chapter 8
Daniel, Anton, Brian, and Fokko discussed support for Zstandard in different engines, speeding up the reading of manifest files in Python, and implementing a Cython-based Avro deserializer. They also mentioned the need to use a different class in the Avro jar to optimize the Python side and the implications of publishing Python binary wheels.

55:21 <https://www.youtube.com/watch?v=N0X1lkjkRUM&t=3321s> Chapter 9
The team discussed the implementation of skipping and deserialization in the Java and Python read paths. They also talked about the concerns and options regarding packaging the code as wheels or using a Python C compiler.