One additional point for the Go implementation: we implemented file scan planning. It returns the list of file scan tasks needed for a given table, partitions, and filter expression.
--Matt

On Tue, Sep 10, 2024, 5:43 PM rdb...@gmail.com <rdb...@gmail.com> wrote:

> Hi everyone,
>
> It’s time for another ASF board report! Here’s my current draft. Please
> reply if you think there is something that I should add or change. Thanks!
>
> Ryan
>
> Description:
>
> Apache Iceberg is a table format for huge analytic datasets that is
> designed for high performance and ease of use.
>
> Project Status:
>
> Current project status: Ongoing
> Issues for the board: None
>
> Membership Data:
>
> Apache Iceberg was founded 2020-05-19 (4 years ago)
> There are currently 31 committers and 21 PMC members in this project.
> The Committer-to-PMC ratio is roughly 4:3.
>
> Community changes, past quarter:
>
> - Amogh Jahagirdar was added to the PMC on 2024-08-12
> - Eduard Tudenhoefner was added to the PMC on 2024-08-12
> - Honah J. was added to the PMC on 2024-07-22
> - Renjie Liu was added to the PMC on 2024-07-22
> - Peter Vary was added to the PMC on 2024-08-12
> - Piotr Findeisen was added as committer on 2024-07-24
> - Kevin Liu was added as committer on 2024-07-24
> - Sung Yun was added as committer on 2024-07-24
> - Hao Ding was added as committer on 2024-07-23
>
> Project Activity:
>
> Releases:
>
> - Java 1.6.1 was released on 2024-08-28
> - Rust 0.3.0 was released on 2024-08-20
> - PyIceberg 0.7.1 was released on 2024-08-18
> - PyIceberg 0.7.0 was released on 2024-07-30
> - Java 1.6.0 was released on 2024-07-23
>
> Table format:
>
> - Work for v3 is picking up
> - Committed timestamp_ns implementation
> - Ongoing discussion/proposal for improvements to row-level deletes
> - Ongoing discussion/proposal for row-level metadata for change tracking
> - Discussion for adding variant type and where to maintain the spec (Parquet)
> - Making progress on geometry types
> - Clarified transform requirements to add transforms as needed (to support geo)
> - Discovered issues affecting new type promotion cases, reduced scope
>
> REST protocol specification:
>
> - Added server-side scan planning
> - Support for removing partition specs
> - Support for endpoint discovery for future additions
> - Clarified failure requirements for unknown actions or validations
>
> Java:
>
> - Added classes for v3 table writes
> - Fixed rewrites in tables with 1000+ columns
> - Added Kafka Connect runtime bundle
> - Support for Flink 1.20
> - Added range distribution support in Flink
> - Dropped support for Java 8
>
> PyIceberg:
>
> - Discussed adding a dependency on iceberg-rust for native extensions
> - Write support for time and identity transforms
> - Parallelized large writes
> - Support for deletes using filter predicates
> - Staged table creation for atomic CTAS
> - Support manifest merging on write
> - Better integration with PyArrow to produce lazy readers from scans
> - New API to add existing Parquet files
> - Support custom catalogs
>
> Rust:
>
> - Established subproject pyiceberg_core to support PyIceberg
> - Implemented OAuth for catalog REST client
> - Added Parquet writer and reader capabilities with support for data projection.
> - Introduced memory catalog and memory file IO support
> - Initialized SQL Catalog
> - Added support for GCS storage and AWS session tokens
> - Implemented concurrent table scans and data file fetching
> - Enhanced predicate builders and expression evaluators
> - Added support for timestamp columns in row filters
>
> Go:
>
> - Implemented expressions and expression visitors
>
> Community Health:
>
> Several new committers and PMC members were added this quarter, which is a
> good indicator for community health. There was also a significant number of
> threads on the mailing list about setting expectations for contributors and
> clearly document how the community operates. New guidelines for merging PRs
> have been added to the website and the community is also discussing
> guidelines for how contributors can become committers. This builds on work
> from last quarter that clarified the process for design discussions.
>
> Many of the topics under discussion were raised because of the acquisition
> that was noted in the last board report. The community has been working to
> address the concerns raised, which are primarily in 3 areas:
>
> - How decisions are made about designs and commits (now clarified)
> - How contributors become committers and PMC members (under discussion)
> - How the community operates when people cannot reach consensus
>
> The last concern has historically not been a problem; people have so far
> chosen to “disagree and commit” when a large majority in the community has
> a different opinion. However, the first instance of this was encountered
> near the end of the quarter. The community and PMC need to discuss how to
> make progress on the issue.