> Flink Range distribution for Sinks

It is already included in Ryan's draft.

> Flink Source V2 improvements and V1 deprecation to prepare for Flink 2.0

This is still ongoing. There is a blocking issue with FileIOParser on HadoopFileIO:
https://github.com/apache/iceberg/pull/10926

On Tue, Sep 10, 2024 at 10:06 PM Péter Váry <peter.vary.apa...@gmail.com> wrote:

> Maybe mention some Flink ongoing tasks, improvements:
> - Flink Range distribution for Sinks
> - Flink Source V2 improvements and V1 deprecation to prepare for Flink 2.0
> - Flink Sink V2 implementation to prepare for Flink 2.0
> - Flink Table Maintenance (ongoing)
>
> Thanks for preparing this, Ryan!
> Peter
>
>
> On Tue, Sep 10, 2024, 23:51 Matt Topol <zotthewiz...@gmail.com> wrote:
>
>> There's one additional point to add for the Go implementation: we
>> implemented file scan planning. It returns the list of file scan tasks
>> needed for a given table, partitions and filter expression.
>>
>> --Matt
>>
>> On Tue, Sep 10, 2024, 5:43 PM rdb...@gmail.com <rdb...@gmail.com> wrote:
>>
>>> Hi everyone,
>>>
>>> It’s time for another ASF board report! Here’s my current draft. Please
>>> reply if you think there is something that I should add or change. Thanks!
>>>
>>> Ryan
>>> Description:
>>>
>>> Apache Iceberg is a table format for huge analytic datasets that is designed
>>> for high performance and ease of use.
>>> Project Status:
>>>
>>> Current project status: Ongoing
>>> Issues for the board: None
>>> Membership Data:
>>>
>>> Apache Iceberg was founded 2020-05-19 (4 years ago)
>>> There are currently 31 committers and 21 PMC members in this project.
>>> The Committer-to-PMC ratio is roughly 4:3.
>>>
>>> Community changes, past quarter:
>>>
>>> - Amogh Jahagirdar was added to the PMC on 2024-08-12
>>> - Eduard Tudenhoefner was added to the PMC on 2024-08-12
>>> - Honah J. was added to the PMC on 2024-07-22
>>> - Renjie Liu was added to the PMC on 2024-07-22
>>> - Peter Vary was added to the PMC on 2024-08-12
>>> - Piotr Findeisen was added as committer on 2024-07-24
>>> - Kevin Liu was added as committer on 2024-07-24
>>> - Sung Yun was added as committer on 2024-07-24
>>> - Hao Ding was added as committer on 2024-07-23
>>>
>>> Project Activity:
>>>
>>> Releases:
>>>
>>> - Java 1.6.1 was released on 2024-08-28
>>> - Rust 0.3.0 was released on 2024-08-20
>>> - PyIceberg 0.7.1 was released on 2024-08-18
>>> - PyIceberg 0.7.0 was released on 2024-07-30
>>> - Java 1.6.0 was released on 2024-07-23
>>>
>>> Table format:
>>>
>>> - Work for v3 is picking up
>>> - Committed timestamp_ns implementation
>>> - Ongoing discussion/proposal for improvements to row-level deletes
>>> - Ongoing discussion/proposal for row-level metadata for change tracking
>>> - Discussion for adding variant type and where to maintain the spec (Parquet)
>>> - Making progress on geometry types
>>> - Clarified transform requirements to add transforms as needed (to support geo)
>>> - Discovered issues affecting new type promotion cases, reduced scope
>>>
>>> REST protocol specification:
>>>
>>> - Added server-side scan planning
>>> - Support for removing partition specs
>>> - Support for endpoint discovery for future additions
>>> - Clarified failure requirements for unknown actions or validations
>>>
>>> Java:
>>>
>>> - Added classes for v3 table writes
>>> - Fixed rewrites in tables with 1000+ columns
>>> - Added Kafka Connect runtime bundle
>>> - Support for Flink 1.20
>>> - Added range distribution support in Flink
>>> - Dropped support for Java 8
>>>
>>> PyIceberg:
>>>
>>> - Discussed adding a dependency on iceberg-rust for native extensions
>>> - Write support for time and identity transforms
>>> - Parallelized large writes
>>> - Support for deletes using filter predicates
>>> - Staged table creation for atomic CTAS
>>> - Support manifest merging on write
>>> - Better integration with PyArrow to produce lazy readers from scans
>>> - New API to add existing Parquet files
>>> - Support custom catalogs
>>>
>>> Rust:
>>>
>>> - Established subproject pyiceberg_core to support PyIceberg
>>> - Implemented OAuth for catalog REST client
>>> - Added Parquet writer and reader capabilities with support for data projection
>>> - Introduced memory catalog and memory file IO support
>>> - Initialized SQL Catalog
>>> - Added support for GCS storage and AWS session tokens
>>> - Implemented concurrent table scans and data file fetching
>>> - Enhanced predicate builders and expression evaluators
>>> - Added support for timestamp columns in row filters
>>>
>>> Go:
>>>
>>> - Implemented expressions and expression visitors
>>>
>>> Community Health:
>>>
>>> Several new committers and PMC members were added this quarter, which is a good
>>> indicator for community health. There was also a significant number of threads
>>> on the mailing list about setting expectations for contributors and clearly
>>> documenting how the community operates. New guidelines for merging PRs have been
>>> added to the website and the community is also discussing guidelines for how
>>> contributors can become committers. This builds on work from last quarter that
>>> clarified the process for design discussions.
>>>
>>> Many of the topics under discussion were raised because of the acquisition that
>>> was noted in the last board report. The community has been working to address
>>> the concerns raised, which are primarily in 3 areas:
>>>
>>> - How decisions are made about designs and commits (now clarified)
>>> - How contributors become committers and PMC members (under discussion)
>>> - How the community operates when people cannot reach consensus
>>>
>>> The last concern has historically not been a problem; people have so far
>>> chosen to “disagree and commit” when a large majority in the community has
>>> a different opinion. However, the first instance of this was encountered near
>>> the end of the quarter. The community and PMC need to discuss how to make
>>> progress on the issue.
>>>
>>