Meeting Minutes from 2023-08-30 Iceberg Sync

Brian Olsen Thu, 26 Oct 2023 14:25:36 -0700

Hey Iceberg Nation,
Everyone is welcome to attend syncs. Subscribe to this calendar
<https://calendar.google.com/calendar/embed?src=3905d492f1b450ba0712f2ae6afa76eb757f13d85220cc03aa4527885adc5629%40group.calendar.google.com&ctz=Asia%2FShanghai>
to receive a notification. Note: This meeting note is backdated as I forgot
to post it here earlier.
2023-08-30 (Meeting Recording <https://www.youtube.com/watch?v=5kfnBYU7SME>)


   - Highlights
      - Java: Flink sink adds custom partitioner to better distribute traffic
        for bucket partitioned tables
        <https://github.com/apache/iceberg/pull/7161> (Thanks, Sergio!)
      - Java: AWS, GCP, and Azure bundles (Thanks, Bryan!)
      - Java: Azure FileIO (Thanks, Bryan!)
      - Java: Delete file in job planning optimizations (Thanks, Anton!)
      - Python: Moved to Pydantic v2 (Thanks, Fokko!)
      - Java: Fixed branches with empty tables (Thanks, ConeyLiu!)
      - Rust: Merged TableMetadata (including (de)serialization) (Thanks,
        Jan!)
      - Go: Schema and types (Thanks, Matt!)

   - Releases
      - PyIceberg 0.5.0
        <https://lists.apache.org/thread/1oj7hcpp5ccc8tt1rjz7bj8yox4onlnc>
         - Blockers
            - Fixing schema evolution (#8374
              <https://github.com/apache/iceberg/pull/8374>)
      - Java 1.4.0 <https://github.com/apache/iceberg/milestone/35>
         - Blockers
            - Fixing history with lazy snapshot loading
            - V2 tables by default
            - Spark distributed planning
            - Default to zstd
         - Timeline: End of next week
            - Milestone: https://github.com/apache/iceberg/milestone/35

   - Discussion
      - Owned Table Location
        https://docs.google.com/document/d/1pTJPQaHwyO0NFlLcHIrXq4gBazJmAyPnigmOPMbBRR0/edit?usp=sharing
        (Szehon)
      - Multi-arg transform:
        https://docs.google.com/document/d/1aDoZqRgvDOOUVAGhvKZbp5vFstjsAMY4EFCyjlxpaaw/edit?usp=sharing
        (advancedxy)
      - Iceberg Table Portability
        <https://docs.google.com/document/d/1q-pWhI8A8_T_zbxKoZ7Lg02ybz87kHrNP9zbm9p0RQk/edit#heading=h.2zb2jb8vclvm>
      - Nanosecond timestamp & timestamptz - sufficient consensus, next steps?

AI-generated chapter summaries:

0:00 <https://www.youtube.com/watch?v=5kfnBYU7SME&t=0s> Chapter 1
The team discussed recent updates and improvements, including a custom
partitioner for better data balancing, bundles that make cloud services
easier to use, optimizations in the delete file job planning path, and
progress in the Rust and Go communities.

10:11 <https://www.youtube.com/watch?v=5kfnBYU7SME&t=611s> Chapter 2
The team discussed the Java 1.4 release, including progress on resolving
dependencies, the appointment of Anton as the release manager, and the
inclusion of features such as distributed planning in Spark and
defaulting new tables to zstd. They also discussed the need to fix lazy
snapshot loading and the challenges of table location ownership to
prevent accidental destruction of tables.

21:02 <https://www.youtube.com/watch?v=5kfnBYU7SME&t=1262s> Chapter 3
The team discussed orphaned files and the challenges of table and
location ownership. They explored different modes and approaches to
address the problem, including checking for location existence, assigning
table locations based on table names, and implementing a global orphan
file cleanup at the administrator level.

29:58 <https://www.youtube.com/watch?v=5kfnBYU7SME&t=1798s> Chapter 4
The discussion covered the ownership and sharing of table locations in a
catalog. The team considered different approaches and debated whether to
use table properties or a separate list of owned locations in the table
metadata bundle.

39:12 <https://www.youtube.com/watch?v=5kfnBYU7SME&t=2352s> Chapter 5
Anton, Xianqing, and others discussed the proposal to include
multi-argument transforms in the documentation. They explored the
challenges and benefits of this change, including the need for expression
API modifications and the possibility of adding additional information
for normalization values in transforms.

49:44 <https://www.youtube.com/watch?v=5kfnBYU7SME&t=2984s> Chapter 6
Anton discussed the possibility of supporting additional files and range
partitioning for better performance. The team agreed to consider
implementing custom transforms and table portability in future
discussions.
