Hi Iceberg Community,
Here are the minutes and recording from our Iceberg Sync.

Always remember, anyone can join the discussion, so feel free to share the
Iceberg-Sync <https://groups.google.com/g/iceberg-sync> Google group with
anyone seeking an invite.
The notes and agenda are posted in the Iceberg Sync doc
<https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg/edit?usp=drive_web>,
which is also attached to the meeting invitation. It's an excellent place to
add items as you see fit so we can discuss them at the next community sync.


Meeting Recording
<https://drive.google.com/file/d/11HoFAxbT_x49F-Qd2pchLgvA3nFRsmvM/view?usp=sharing>
/ Meeting Transcript
<https://docs.google.com/document/d/1NwBMQewZWXelo6MYKpHRwRocSXSapChK_9vnNlSOL0Q/edit?usp=sharing>

   - Highlights
      - The procedure for rewriting position delete files is in (Thanks,
        Szehon!)
      - Removed a sort from the MERGE cardinality check (Thanks, Anton!)
      - Mitigated FileIO closing problems (Thanks, Eduard!)
      - Python added SigV4 support for the REST catalog (Thanks, Dan!)
      - Python added support for converting Parquet schemas (Thanks, Rushan!)
   - Releases
      - Python 0.4 (proposed, positional delete support)
      - Java 1.3.x (Release manager: Anton)
         - Want Spark 3.4 for the next EMR release
         - Spark storage-partitioned join fixes
         - Flink 1.17 Support
         - Spark 3.4 Support
            - Timestamp without time zone? (nice-to-have)
         - Run Flink without Hadoop
         - https://github.com/apache/iceberg/issues/7623 is a release blocker
            - Already avoided in the metadata pushdown code
         - Parquet 1.13? (1.13.1 is needed and is in a vote)
            - Ryan or Dan should vote
   - Discussion
      - Splitting update into delete and insert in MoR operations
      - Session-specific target split sizes in Spark (see the sketch after
        the notes)
         - https://github.com/apache/iceberg/pull/7430
         - Should we pick split sizes automatically?
         - Overrides may still be needed for power users
         - Are hints to set options possible in Spark?
      - Adding min sequence numbers
         - https://github.com/apache/iceberg/pull/5760
      - Partition stats spec (whether to write a single file or multiple
        files)
         - https://github.com/apache/iceberg/pull/7105
      - Should table metadata deletion also obey the gc.enabled flag? (see
        the example after the notes)
         - https://github.com/apache/iceberg/pull/7576

Thanks everyone!
