Re: Meeting Minutes from 2023-10-11 Iceberg Sync

Brian Olsen Fri, 27 Oct 2023 02:29:52 -0700

The spacing was lost after sending the email. If you click the YouTube link,
the video is split into chapters and the summaries are spaced out. They are
more legible there.


I’ll make sure to add bullet points moving forward.

On Thu, Oct 26, 2023 at 11:00 PM Xuanwo <xua...@apache.org> wrote:

> Thanks for the meeting recording!
>
> The "AI-generated chapter summaries" don't seem very readable. Can this be
> improved?
>
> On Fri, Oct 27, 2023, at 05:25, Brian Olsen wrote:
>
> Hey Iceberg Nation,
> Everyone is welcome to attend syncs. Subscribe to this calendar
> <https://calendar.google.com/calendar/embed?src=3905d492f1b450ba0712f2ae6afa76eb757f13d85220cc03aa4527885adc5629%40group.calendar.google.com&ctz=Asia%2FShanghai>
> to receive a notification. Note: This meeting note is backdated as I forgot
> to post it here earlier.
>
> 2023-10-11 (Meeting Recording <https://youtu.be/euWtAKo_bV4> ⭕)
>
>    - Highlights
>       - 1.4.0 was released! (Thanks, Anton!)
>          - v2 and zstd defaults
>          - Advisory partition size in Spark
>          - Skip local sort for unordered writes in Spark
>          - Distributed planning in Spark
>          - AzureFileIO
>          - Multi-table commits through REST
>          - Removed Spark 3.1
>       - Python moved to the iceberg-python
>         <https://github.com/apache/iceberg-python> repo (removed from main)
>       - Flink alter table column support was added
>         <https://github.com/apache/iceberg/pull/7628> (Flink 1.17 only), such
>         as adding a new column or changing a column's position (Thanks,
>         Yanghao Lin)
>       - Metastore catalog support for views was added (Thanks, Eduard!)
>       - Python is close to having write support; it supports v1 and v2
>         metadata (Thanks, Fokko!)
>       - Rust added read support for manifest lists (Thanks, ZENOTME)
>       - Spark: clean up FileIO resources on executors (Thanks, Anton!)
>    - Discussion
>       - PR commit methods – standardize on squash?
>       - Iceberg docs refactor <https://github.com/apache/iceberg/pull/8659>
>         (try me
>         <https://github.com/bitsondatadev/iceberg/tree/new-docs/docs-new>)
>       - Spec v3 changes:
>          - New types
>             - BLOB
>             - BSON/JSON
>             - Timestamp{tz}_{ns,ms}
>               <https://docs.google.com/document/d/1bE1DcEGNzZAMiVJSZ0X1wElKLNkT9kRkk0hDlfkXzvU/edit>
>               (not millis)
>             - FLOAT16?
>          - Default values
>          - Type promotion
>             - * to string (choose a format)
>                - What are the use cases for changing the type?
>                - int/long to string
>                - float/timestamp - why?
>                - Bool to string should be allowed
>             - Long to timestamp (must be millis)
>          - Multi-column transforms
>             - Bucket v2
>             - Geo?
>          - Location/path requirements (recommendations)
>          - Owned locations (discussion
>            <https://lists.apache.org/thread/3fx8povnsq0f4g1xzj38snplr6d3ch1r>)
>          - Delete vectors (discussion
>            <https://lists.apache.org/thread/gr3g5rrr60fhvy0mrdj4s4w9x8c3v58g>)
>          - Allowing relative paths
>       - Partition stats spec and discussion in PR 7105
>         <https://github.com/apache/iceberg/pull/7105>.
>
> Kafka Connect (discussion
> <https://lists.apache.org/thread/d9h22z2ydcpvjxp53yl6w96xoy3dp33h>)
>
>
> AI-generated chapter summaries:
>
> 0:00 <https://www.youtube.com/watch?v=euWtAKo_bV4&t=0s> Chapter 0: Introduction
>
> 5:14 <https://www.youtube.com/watch?v=euWtAKo_bV4&t=314s> Chapter 1:
> Highlights. Ryan thanks Anton for releasing v1.4 with many bug fixes and
> changes, including defaulting to the v2 format and Zstandard for data
> compression. AzureFileIO is now available, along with multi-table commits
> through REST. The PyIceberg project moved to a new repository, and new
> Python support was added.
>
> 12:01 <https://www.youtube.com/watch?v=euWtAKo_bV4&t=721s> Chapter 2: PR
> commit methods and repository setup. Anton highlights recent improvements,
> including FileIO cleanup in Spark and manifest list read support, and plans
> to discuss spec v3 changes with the community. The group discusses PR commit
> methods and suggests standardizing across repositories on squash and merge
> by default rather than merge commits. There was concern about enforcing
> linear history on the Java side, citing potential issues with rebase and
> time zones. One suggestion was to bring the issue of inconsistent commit
> messages to the community for resolution. Consensus formed around squashing
> commits to make them more meaningful and easier to understand.
>
> 19:30 <https://www.youtube.com/watch?v=euWtAKo_bV4&t=1170s> Chapter 3:
> Improving Iceberg docs with a mono repo. Brian is refactoring the Iceberg
> documentation to move it back into the main Iceberg repo, simplifying
> maintenance and improving collaboration. He proposes a single documentation
> site that contains the static site and the docs for all versions, which
> solves problems with multiple sources and makes releases easier. The plan is
> to merge an initial PR and build consensus, then replace the current ASF
> documentation branch and repoint it to the main repo. A nightly branch will
> be created for documentation changes and maintained as an up-to-date
> snapshot. The README on the branch will contain all the information needed
> to build and understand the project.
>
> 26:59 <https://www.youtube.com/watch?v=euWtAKo_bV4&t=1619s> Chapter 4: V3
> spec changes for data storage. The team discusses v3 spec changes,
> including partition stats, which may not be included in v3 because they may
> not require a backward-incompatible change. If partition stats are required
> for v3, that would need to be decided and implemented separately from the
> main v3 discussion. Everyone should be aware that multi-column transforms
> are a v3-only change and are likely to break v2 compatibility. There are
> also some potential forward-breaking changes for v3, including location/path
> requirements and the delete vectors proposal.
>
> 34:39 <https://www.youtube.com/watch?v=euWtAKo_bV4&t=2079s> Chapter 5:
> Metadata requirements for Iceberg V3
>
> Xuanwo
>
>
