Re: Meeting Minutes from 2023-10-11 Iceberg Sync

Xuanwo Thu, 26 Oct 2023 21:00:18 -0700

Thanks for the meeting recording!

The "AI-generated chapter summaries" don't seem very readable. Can this be 
improved?


On Fri, Oct 27, 2023, at 05:25, Brian Olsen wrote:
> Hey Iceberg Nation, 
> Everyone is welcome to attend syncs. Subscribe to this calendar 
> <https://calendar.google.com/calendar/embed?src=3905d492f1b450ba0712f2ae6afa76eb757f13d85220cc03aa4527885adc5629%40group.calendar.google.com&ctz=Asia%2FShanghai>
>  to receive a notification. Note: This meeting note is backdated as I forgot 
> to post it here earlier.
> 
> 2023-10-11(Meeting Recording <https://youtu.be/euWtAKo_bV4> ⭕ ) 
> 
>  • Highlights
> 
>    • 1.4.0 was released! (Thanks, Anton!)
> 
>      • v2 and zstd defaults
> 
>      • Advisory partition size in Spark
> 
>      • Skip local sort for unordered writes in Spark
> 
>      • Distributed planning in Spark
> 
>      • AzureFileIO
> 
>      • Multi-table commits through REST
> 
>      • Removed Spark 3.1
> 
>    • Python moved to the iceberg-python 
> <https://github.com/apache/iceberg-python> repo (removed from main)
> 
>    • Flink alter table column support was added 
> <https://github.com/apache/iceberg/pull/7628> (1.17 only), like adding a new 
> column, changing column position (Thanks, Yanghao Lin)
> 
>    • Metastore catalog support for views was added (Thanks, Eduard!)
> 
>    • Close to write support in Python, supports v1 and v2 metadata (Thanks, 
> Fokko!)
> 
>    • Rust added read support for manifest lists (Thanks, ZENOTME)
> 
>    • Spark: clean up FileIO resources on executors (Thanks, Anton!)
> 
>  • Discussion
> 
>    • PR commit methods – standardize on squash?
> 
>    • Iceberg docs refactor <https://github.com/apache/iceberg/pull/8659> (try 
> me <https://github.com/bitsondatadev/iceberg/tree/new-docs/docs-new>)
> 
>    • Spec v3 changes:
> 
>      • New types
> 
>        • BLOB
> 
>        • BSON/JSON
> 
>        • Timestamp{tz}_{ns,ms} 
> <https://docs.google.com/document/d/1bE1DcEGNzZAMiVJSZ0X1wElKLNkT9kRkk0hDlfkXzvU/edit>
>  (not millis)
> 
>        • FLOAT16?
> 
>      • Default values
> 
>      • Type promotion
> 
>        • * to string (choose a format)
> 
>          • What are the use cases for changing the type?
> 
>          • int/long to string
> 
>          • float/timestamp - why?
> 
>          • Bool to string should be allowed
> 
>        • Long to timestamp (must be millis)
> 
>      • Multi-column transforms
> 
>        • Bucket v2
> 
>        • Geo?
> 
>      • Location/path requirements (recommendations)
> 
>      • Owned locations (discussion 
> <https://lists.apache.org/thread/3fx8povnsq0f4g1xzj38snplr6d3ch1r>)
> 
>      • Delete vectors (discussion 
> <https://lists.apache.org/thread/gr3g5rrr60fhvy0mrdj4s4w9x8c3v58g>)
> 
>      • Allowing relative paths
> 
>    • Partition stats spec and discussion in PR 7105 
> <https://github.com/apache/iceberg/pull/7105>.
> 
>    • Kafka Connect (discussion 
> <https://lists.apache.org/thread/d9h22z2ydcpvjxp53yl6w96xoy3dp33h>)
> 
> 
> AI-generated chapter summaries:
> 
> 0:00 <https://www.youtube.com/watch?v=euWtAKo_bV4&t=0s> Chapter 0 
> Introduction
> 
> 5:14 <https://www.youtube.com/watch?v=euWtAKo_bV4&t=314s> Chapter 1 
> Highlights. Ryan thanks Anton for releasing v1.4 with many bug fixes and 
> changes, including defaulting to v2 format and Z standard for data 
> compression. Azure file IO is now available, with native support for 
> multi-table commits in Spark. pyIceberg project moved to a new repository and 
> new Python support was added.
> 
> 12:01 <https://www.youtube.com/watch?v=euWtAKo_bV4&t=721s> Chapter 2 PR 
> commit methods and repository setup. Anton highlights recent improvements in 
> Spark, including file cleanup and manifest file read support and plans to 
> discuss spec v3 changes with the community. The group discusses PR commit 
> methods, suggesting standardizing across repositories to use squash and merge 
> by default, rather than merge commits. There was concern about enforcing 
> linear history on the Java side, citing potential issues with rebase and time 
> zones. One suggestion was bringing the issue of inconsistent commit messages 
> to the community for resolution. A consensus is built around squashing 
> commits to make them more meaningful and easier to understand.
> 
> 19:30 <https://www.youtube.com/watch?v=euWtAKo_bV4&t=1170s> Chapter 3 
> Improving Iceberg Docs with a mono repo. Brian is refactoring the iceberg 
> documentation to move it back into the main iceberg repo, simplifying 
> maintenance and improving collaboration. He proposes to create a single 
> documentation site containing the static site and for all versions of docs, 
> solving problems with multiple sources and making releases easier. The plan 
> is to merge an initial PR and build consensus, then replace the current ASF 
> documentation branch and repoint it back to the main repo. We're creating a 
> nightly branch for documentation changes, and maintaining it as an up-to-date 
> snapshot. The readme file on the branch will have all the necessary 
> information for building and understanding the project.
> 
> 26:59 <https://www.youtube.com/watch?v=euWtAKo_bV4&t=1619s> Chapter 4 V3 
> spec changes for data storage. The team discusses v3 spec changes, including 
> partition stats, which may not be included in v3 due to a lack of need for 
> backward compatibility. If partition stats are required for v3, it would need 
> to be decided and implemented separately from the main v3 discussion. 
> Everyone should be aware that multi-column transforms are a v3-only change 
> and are likely to break in v2. There are also some potential forward-breaking 
> changes for Hadoop v3, including location path requirements and Delete vector 
> proposal.
> 
> 34:39 <https://www.youtube.com/watch?v=euWtAKo_bV4&t=2079s> Chapter 5 
> Metadata requirements for Iceberg V3
Xuanwo
