Meeting Minutes from 2023-08-09 Iceberg Sync

Brian Olsen Thu, 26 Oct 2023 14:25:48 -0700

Hey Iceberg Nation,
Everyone is welcome to attend syncs. Subscribe to this calendar
<https://calendar.google.com/calendar/embed?src=3905d492f1b450ba0712f2ae6afa76eb757f13d85220cc03aa4527885adc5629%40group.calendar.google.com&ctz=Asia%2FShanghai>
to receive a notification. Note: This meeting note is backdated as I forgot
to post it here earlier.



2023-08-09 (Meeting Recording <https://youtu.be/N0X1lkjkRUM> ⭕ )


   - Highlights
      - Gradle Version catalog support
        <https://github.com/apache/iceberg/pull/7694> was added (Thanks, Max!)
      - Pushing down system functions by V2 filters
        <https://github.com/apache/iceberg/pull/7886> (Thanks, Coney!)
      - Parquet: Cache codecs by name and level
        <https://github.com/apache/iceberg/pull/8182> (Thanks, Bryan!)
      - Adaptive split sizing in Spark
        <https://github.com/apache/iceberg/pull/7714> (Thanks, Anton!)
      - Optimizations to the DeleteFileIndex
        <https://github.com/apache/iceberg/pull/8157> (Thanks, Anton!)
      - Python: add pyarrow hdfs support
        <https://github.com/apache/iceberg/pull/7997> (Thanks, Luigi!)
      - Display Spark read metrics in Spark UI
        <https://github.com/apache/iceberg/pull/7447> (Thanks, Karuppayya!)
      - Support creating branch on an empty table
        <https://github.com/apache/iceberg/pull/8072> (Thanks, Coney!)
      - Awesome progress on the Rust implementation
        <https://github.com/apache/iceberg-rust/pulls> (Thanks, Jan, Renjie,
        Xuanwo)
   - Releases
      - Update on the Iceberg 1.4.0 release
         - Let’s further discuss changing to v2 as the default in this
           release (Anton)
   - Discussion
      - Discuss the location of client projects (rust, go, python). (Brian)
      - Documentation proposal. (Thread
        <https://lists.apache.org/thread/nq283g83p9cgx518jwo7dg85fm5gjrvv>,
        Doc
        <https://docs.google.com/document/d/1WJXzcwC6isfoywcLY2lZ9gZN6i0JU1I2SIZmCkumbZc/edit#heading=h.gli9mc2ghfz1>)
        (Brian)
      - https://github.com/apache/iceberg/issues/3547 (Anton)
      - V2 format as default for new tables (Anton)
      - Delete planning and column stats (Anton)
      - https://github.com/apache/iceberg/pull/8158 (Anton)
      - Write Avro map/list block sizes (Rusty)
      - Publishing Python Wheels


AI-generated chapter summaries:

0:00 <https://www.youtube.com/watch?v=N0X1lkjkRUM&t=0s> Chapter 1: Brian,
Daniel, Eduard, and Anton discussed highlights and improvements, including
the transition to a full-featured version catalog, fixing issues with
Parquet caching, adding support for Spark read metrics in the Spark UI,
optimizing the delete file index, and implementing adaptive split sizing in
Spark.

6:40 <https://www.youtube.com/watch?v=N0X1lkjkRUM&t=400s> Chapter 2: Anton,
Daniel, Fokko, and Brian discussed updates and improvements to the Spark
framework, including split planning, ACFS support, performance
enhancements, and integration with Bolar and GCP. They also mentioned
ongoing work and collaborations to further enhance the ecosystem.

11:43 <https://www.youtube.com/watch?v=N0X1lkjkRUM&t=703s> Chapter 3: The
team discussed the progress and blockers for the upcoming release (1.4).
They mentioned features and improvements that were being reviewed and
targeted for inclusion, such as distributed planning in Spark, view
implementations, and multi-table commit.

17:22 <https://www.youtube.com/watch?v=N0X1lkjkRUM&t=1042s> Chapter 4:
Brian, Anton, Daniel, and Fokko discussed whether to keep Rust separate or
merge it with the main repository. They considered the benefits of
consistency in code structure and documentation, as well as the potential
challenges of versioning and CI/CD complexity.

23:46 <https://www.youtube.com/watch?v=N0X1lkjkRUM&t=1426s> Chapter 5:
Brian, Fokko, Anton, and others discussed the pros and cons of using a
monorepo for documentation and code. They considered the complexity of
managing multiple versions, the visibility of changes, and the need for a
unified approach. Brian proposed a solution involving a binary file to hide
versioned documentation, and they agreed to experiment and gather more
input before making a final decision.

33:59 <https://www.youtube.com/watch?v=N0X1lkjkRUM&t=2039s> Chapter 6: The
team discussed the need to unify the Apache org site and the docs site, as
well as the challenges of keeping them consistent. They also explored the
possibility of adding a feature to specify the Iceberg sort order in the
table creation statement in Spark.

44:59 <https://www.youtube.com/watch?v=N0X1lkjkRUM&t=2699s> Chapter 7:
Daniel, Anton, and Bryan discussed inefficiencies in the delete planning
and execution process, as well as the possibility of using Zstandard by
default for new tables. They shared their findings and ideas for
improvement, with the intention of collaborating on solutions to enhance
performance.

50:14 <https://www.youtube.com/watch?v=N0X1lkjkRUM&t=3014s> Chapter 8:
Daniel, Anton, Brian, and Fokko discussed topics including support for
Zstandard in different engines, improving the speed of reading manifest
files in Python, and implementing a Cython-based Avro deserializer. They
also mentioned the need to use a different class in the Avro jar to
optimize the Python side and the implications of publishing Python binary
wheels.

55:21 <https://www.youtube.com/watch?v=N0X1lkjkRUM&t=3321s> Chapter 9: The
team discussed the implementation of skipping and deserialization in the
Java and Python read paths. They also talked about the concerns and options
regarding packaging the code as wheels or using a Python C compiler.
