Maybe mention some ongoing Flink tasks and improvements:

- Flink Range distribution for Sinks
- Flink Source V2 improvements and V1 deprecation to prepare for Flink 2.0
- Flink Sink V2 implementation to prepare for Flink 2.0
- Flink Table Maintenance (ongoing)
Thanks for preparing this, Ryan!

Peter

On Tue, Sep 10, 2024, 23:51 Matt Topol <zotthewiz...@gmail.com> wrote:

> There's one additional point to add for the Go implementation: we
> implemented file scan planning. It returns the list of file scan tasks
> needed for a given table, partitions, and filter expression.
>
> --Matt
>
> On Tue, Sep 10, 2024, 5:43 PM rdb...@gmail.com <rdb...@gmail.com> wrote:
>
>> Hi everyone,
>>
>> It’s time for another ASF board report! Here’s my current draft. Please
>> reply if you think there is something that I should add or change. Thanks!
>>
>> Ryan
>>
>> Description:
>>
>> Apache Iceberg is a table format for huge analytic datasets that is
>> designed for high performance and ease of use.
>>
>> Project Status:
>>
>> Current project status: Ongoing
>> Issues for the board: None
>>
>> Membership Data:
>>
>> Apache Iceberg was founded 2020-05-19 (4 years ago)
>> There are currently 31 committers and 21 PMC members in this project.
>> The Committer-to-PMC ratio is roughly 4:3.
>>
>> Community changes, past quarter:
>>
>> - Amogh Jahagirdar was added to the PMC on 2024-08-12
>> - Eduard Tudenhoefner was added to the PMC on 2024-08-12
>> - Honah J. was added to the PMC on 2024-07-22
>> - Renjie Liu was added to the PMC on 2024-07-22
>> - Peter Vary was added to the PMC on 2024-08-12
>> - Piotr Findeisen was added as committer on 2024-07-24
>> - Kevin Liu was added as committer on 2024-07-24
>> - Sung Yun was added as committer on 2024-07-24
>> - Hao Ding was added as committer on 2024-07-23
>>
>> Project Activity:
>>
>> Releases:
>>
>> - Java 1.6.1 was released on 2024-08-28
>> - Rust 0.3.0 was released on 2024-08-20
>> - PyIceberg 0.7.1 was released on 2024-08-18
>> - PyIceberg 0.7.0 was released on 2024-07-30
>> - Java 1.6.0 was released on 2024-07-23
>>
>> Table format:
>>
>> - Work for v3 is picking up
>> - Committed timestamp_ns implementation
>> - Ongoing discussion/proposal for improvements to row-level deletes
>> - Ongoing discussion/proposal for row-level metadata for change
>>   tracking
>> - Discussion about adding a variant type and where to maintain its spec
>>   (Parquet)
>> - Making progress on geometry types
>> - Clarified transform requirements to add transforms as needed (to
>>   support geo)
>> - Discovered issues affecting new type promotion cases, reduced scope
>>
>> REST protocol specification:
>>
>> - Added server-side scan planning
>> - Support for removing partition specs
>> - Support for endpoint discovery for future additions
>> - Clarified failure requirements for unknown actions or validations
>>
>> Java:
>>
>> - Added classes for v3 table writes
>> - Fixed rewrites in tables with 1000+ columns
>> - Added Kafka Connect runtime bundle
>> - Support for Flink 1.20
>> - Added range distribution support in Flink
>> - Dropped support for Java 8
>>
>> PyIceberg:
>>
>> - Discussed adding a dependency on iceberg-rust for native extensions
>> - Write support for time and identity transforms
>> - Parallelized large writes
>> - Support for deletes using filter predicates
>> - Staged table creation for atomic CTAS
>> - Support manifest merging on write
>> - Better integration with PyArrow to produce lazy readers from scans
>> - New API to add existing Parquet files
>> - Support custom catalogs
>>
>> Rust:
>>
>> - Established subproject pyiceberg_core to support PyIceberg
>> - Implemented OAuth for catalog REST client
>> - Added Parquet writer and reader capabilities with support for data
>>   projection
>> - Introduced memory catalog and memory file IO support
>> - Initialized SQL Catalog
>> - Added support for GCS storage and AWS session tokens
>> - Implemented concurrent table scans and data file fetching
>> - Enhanced predicate builders and expression evaluators
>> - Added support for timestamp columns in row filters
>>
>> Go:
>>
>> - Implemented expressions and expression visitors
>>
>> Community Health:
>>
>> Several new committers and PMC members were added this quarter, which is
>> a good indicator of community health. There was also a significant number
>> of threads on the mailing list about setting expectations for contributors
>> and clearly documenting how the community operates. New guidelines for
>> merging PRs have been added to the website, and the community is also
>> discussing guidelines for how contributors can become committers. This
>> builds on work from last quarter that clarified the process for design
>> discussions.
>>
>> Many of the topics under discussion were raised because of the acquisition
>> that was noted in the last board report. The community has been working to
>> address the concerns raised, which fall primarily into 3 areas:
>>
>> - How decisions are made about designs and commits (now clarified)
>> - How contributors become committers and PMC members (under discussion)
>> - How the community operates when people cannot reach consensus
>>
>> The last concern has historically not been a problem; people have so far
>> chosen to “disagree and commit” when a large majority in the community has
>> a different opinion. However, the first instance of people being unable to
>> reach consensus was encountered near the end of the quarter. The community
>> and PMC need to discuss how to make progress on the issue.