Hi everyone,

It’s time for another ASF board report! Here’s my current draft. Please
reply if you think there is something that I should add or change. Thanks!

Ryan

Description:

Apache Iceberg is a table format for huge analytic datasets that is designed
for high performance and ease of use.

Project Status:

Current project status: Ongoing
Issues for the board: None

Membership Data:

Apache Iceberg was founded 2020-05-19 (4 years ago)
There are currently 31 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is roughly 3:2.

Community changes, past quarter:

   - Amogh Jahagirdar was added to the PMC on 2024-08-12
   - Eduard Tudenhoefner was added to the PMC on 2024-08-12
   - Honah J. was added to the PMC on 2024-07-22
   - Renjie Liu was added to the PMC on 2024-07-22
   - Peter Vary was added to the PMC on 2024-08-12
   - Piotr Findeisen was added as committer on 2024-07-24
   - Kevin Liu was added as committer on 2024-07-24
   - Sung Yun was added as committer on 2024-07-24
   - Hao Ding was added as committer on 2024-07-23

Project Activity:

Releases:

   - Java 1.6.1 was released on 2024-08-28
   - Rust 0.3.0 was released on 2024-08-20
   - PyIceberg 0.7.1 was released on 2024-08-18
   - PyIceberg 0.7.0 was released on 2024-07-30
   - Java 1.6.0 was released on 2024-07-23

Table format:

   - Work for v3 is picking up
   - Committed timestamp_ns implementation
   - Ongoing discussion/proposal for improvements to row-level deletes
   - Ongoing discussion/proposal for row-level metadata for change tracking
   - Discussion for adding variant type and where to maintain the spec
   (Parquet)
   - Making progress on geometry types
   - Clarified transform requirements to add transforms as needed (to
   support geo)
   - Discovered issues affecting new type promotion cases, reduced scope

REST protocol specification:

   - Added server-side scan planning
   - Support for removing partition specs
   - Support for endpoint discovery for future additions
   - Clarified failure requirements for unknown actions or validations

Java:

   - Added classes for v3 table writes
   - Fixed rewrites in tables with 1000+ columns
   - Added Kafka Connect runtime bundle
   - Support for Flink 1.20
   - Added range distribution support in Flink
   - Dropped support for Java 8

PyIceberg:

   - Discussed adding a dependency on iceberg-rust for native extensions
   - Write support for time and identity transforms
   - Parallelized large writes
   - Support for deletes using filter predicates
   - Staged table creation for atomic CTAS
   - Support manifest merging on write
   - Better integration with PyArrow to produce lazy readers from scans
   - New API to add existing Parquet files
   - Support custom catalogs

Rust:

   - Established subproject pyiceberg_core to support PyIceberg
   - Implemented OAuth for catalog REST client
   - Added Parquet writer and reader capabilities with support for data
   projection
   - Introduced memory catalog and memory file IO support
   - Initialized SQL Catalog
   - Added support for GCS storage and AWS session tokens
   - Implemented concurrent table scans and data file fetching
   - Enhanced predicate builders and expression evaluators
   - Added support for timestamp columns in row filters

Go:

   - Implemented expressions and expression visitors

Community Health:

Several new committers and PMC members were added this quarter, which is a
good indicator for community health. There was also a significant number of
threads on the mailing list about setting expectations for contributors and
clearly documenting how the community operates. New guidelines for merging
PRs have been added to the website, and the community is also discussing
guidelines for how contributors can become committers. This builds on work
from last quarter that clarified the process for design discussions.

Many of the topics under discussion were raised because of the acquisition
that was noted in the last board report. The community has been working to
address the concerns raised, which are primarily in three areas:

   - How decisions are made about designs and commits (now clarified)
   - How contributors become committers and PMC members (under discussion)
   - How the community operates when people cannot reach consensus

The last concern has historically not been a problem; people have so far
chosen to “disagree and commit” when a large majority in the community has
a different opinion. However, the first instance of this problem was
encountered near the end of the quarter. The community and PMC need to
discuss how to make progress on the issue.
