Meeting Minutes from 2023-06-28 Iceberg Sync

Brian Olsen Sun, 02 Jul 2023 04:33:30 -0700

Hey Iceberg Nation!

Here are the minutes and recording from our Iceberg Sync. As a reminder,
anyone can join the discussion, so feel free to share the Iceberg Sync with others.


*NOTE:* Because some folks have not been receiving invitations from the
Iceberg Sync Google Group <https://groups.google.com/g/iceberg-sync>, we
will move to sharing the link through a public calendar posted on the
Apache Iceberg site <https://iceberg.apache.org/community/#iceberg-community-events>.
Please check that you can find the longstanding Google meeting on the
calendar before the next meeting. I'll start a separate thread on the dev
list about this; please post any questions or concerns there.

The notes and the agenda are posted in the Iceberg Sync YouTube
description and are also maintained in the meeting minutes doc
<https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg/edit?usp=drive_web>,
which is linked on the calendar invite.

Meeting Recording <https://www.youtube.com/watch?v=DhvRg0oEdwU>
Meeting Transcript: available in the video <https://youtu.be/1lm4Wlpy2wU?t=28>


   - Highlights
      - Fix for OOM caused by Avro decoder caching
        <https://github.com/apache/iceberg/pull/7791> (Coney Liu)
      - Multiple shuffle partitions per file
        <https://github.com/apache/iceberg/pull/7897> (Anton)
      - Python: Add positional deletes
        <https://github.com/apache/iceberg/pull/6775> (Fokko)
      - Python: Alter table in transactions
        <https://github.com/apache/iceberg/pull/6323> (Fokko)
      - View metadata implementation
        <https://github.com/apache/iceberg/pull/7759> (Eduard)
      - View support for InMemoryCatalog
        <https://github.com/apache/iceberg/pull/7880> (Eduard)
      - API for multi-table commits
        <https://github.com/apache/iceberg/pull/7569> (Eduard)
         - Catalog Transaction API is still in draft
           <https://github.com/apache/iceberg/pull/6948>
      - Flink: Split ordering based on sequence number
        <https://github.com/apache/iceberg/pull/7661> (Peter)
      - New Iceberg Events Calendar
        <https://calendar.google.com/calendar/u/1?cid=NTkzYmIwMGJmZTQ1N2QzMTkxNDEzNTBkZDI0Yzk2NGYzOWJkYmQ5ZmQyNDMyODFhODYzMmEwMDk2M2EyMWQ4NkBncm91cC5jYWxlbmRhci5nb29nbGUuY29t>,
        including this sync (Brian)
         - Any PMC member who wants edit access to this calendar should reach
           out to me on Slack or the dev list. I will announce this on the dev list.

   - Releases
      - Python 0.4 vote is out! (Please verify)

   - Discussion
      - Bloom/cuckoo/other filters in manifests
      - Priority-based commit
      - Documentation efforts #documentation
        <https://slack.com/app_redirect?channel=C05BXHPEGTA&team=T025T8UC953>
        (Brian)
         - Proposed updates to the docs site
           <https://docs.google.com/document/d/1Y_PRv6p5oJaxg_68AUia_JHw8P4-AZIu3hP5IH2Cpsw/edit>



AI-generated chapter summaries:

   - 0:00 Chapter 1 Daniel, Brian, and the team discussed various updates
   and progress made in the past few weeks, including fixes for memory issues,
   improvements in shuffle partitions for file compaction, positional delete
   support in Python, and the implementation of view metadata. They also
   mentioned plans for engine integrations and the need for further
   discussions on multi-table transactions.
   - 10:29 Chapter 2 The team discussed the progress and next steps for
   implementing multi-table commits and improving support for ordered reads
   from Iceberg tables. They also discussed plans to make the Iceberg
   community meetings more accessible by providing a public link instead of
   requiring a subscription to the dev list.
   - 21:03 Chapter 3 The team discussed the new features and improvements
   included in the Python 0.4 release, such as positional deletes, SQL-style
   filters, and performance enhancements. They also explored the possibility
   of adding Bloom filters to the manifest file to improve point query performance.
   - 31:54 Chapter 4 The discussion revolved around the potential use of
   Parquet Bloom filters to improve performance and reduce costs. They also
   discussed the possibility of introducing priority-based commit and the
   challenges associated with it.
   - 42:00 Chapter 5 The team discussed the possibility of implementing a
   rollback feature in the catalog and snapshot producer. They also talked
   about prioritizing documentation updates and plans to add examples and
   tutorials in the future.


Thanks! See you all at the next sync!
