Hi everyone,

This is Jack Ye from AWS, and I will be the release manager for Iceberg 0.11.0. The purpose of this email is to start preparing for this release.

Overview

We have talked with the groups working on different areas of the codebase, and the current plan is to cut the initial release branch by the end of 01/20/2021 PST. Starting now, we will prioritize code reviews as defined in the next two sections. Any help with code reviews would be greatly appreciated!

If you think your PR is required or good to have (and close to done) for this release train, please reply to this email thread so that we can evaluate its content and status. The information will also be tracked at https://github.com/apache/iceberg/milestone/12. If you have any questions about the release, feel free to contact me by email or Slack.

Required Pull Requests

Here is a list of PRs that are currently considered required but are not yet merged, as of 1/13/2021:

Core:
1. Fix date and timestamp transforms (https://github.com/apache/iceberg/pull/1981)
2. Handle NaN as min/max stats in evaluators (https://github.com/apache/iceberg/pull/2069)
3. Update record_count behavior, include in manifest reader (https://github.com/apache/iceberg/pull/1820)

Hive:
1. Support case-insensitive Hive queries (https://github.com/apache/iceberg/pull/2053)
2. Fix join issues when CBO is enabled (https://github.com/apache/iceberg/pull/2052)

Flink:
1. Support streaming reader (https://github.com/apache/iceberg/pull/1793)
2. Support filter pushdown (https://github.com/apache/iceberg/pull/1893)
3. Add rewrite file operator after Iceberg committer (https://github.com/apache/iceberg/pull/1669)
4. Support sink when Flink checkpointing is disabled (https://github.com/apache/iceberg/pull/1515)

Nessie:
1. Fix property for custom catalog in Flink (https://github.com/apache/iceberg/pull/2031)
2. Add timestamp to table definition in Nessie catalog (https://github.com/apache/iceberg/pull/1825)

Docs:
1. Fix bug in AWS doc that HTTP client package is not included in bundle (https://github.com/apache/iceberg/pull/2072)
2. Add initial documentation for Iceberg stored procedures (https://github.com/apache/iceberg/pull/2067)

Good-to-have Pull Requests

Here is a list of PRs that people consider good to have and possible to merge in the current release train, as of 1/13/2021:

Core:
1. Allow binary truncation length to be zero to handle evaluators that encounter empty string values (https://github.com/apache/iceberg/pull/2081)
2. Add contains_nan to field_summary (https://github.com/apache/iceberg/pull/1872)
3. Implement NaN counts in ORC (https://github.com/apache/iceberg/pull/1790)

Hive:
1. Fix Deserializer to use source deserializer instead of the Iceberg ones (https://github.com/apache/iceberg/pull/2078)
2. Implement INSERT INTO for Iceberg-backed Hive tables using the new HiveIcebergRecordWriter (https://github.com/apache/iceberg/pull/2038)
3. Allow auto conversion of Hive types when the CREATE TABLE statement contains an unsupported type (https://github.com/apache/iceberg/pull/2054)
4. Add ObjectInspector implementations for UUID, Fixed and Time types (https://github.com/apache/iceberg/pull/2077)

Merge & Update:
1. Spark MERGE INTO support (copy-on-write implementation) (https://github.com/apache/iceberg/pull/1947)
2. Add the cardinality check to detect ambiguous target rows for MERGE INTO (https://github.com/apache/iceberg/pull/2021)
3. Implement logic to group and sort rows before writing them for MERGE INTO (https://github.com/apache/iceberg/pull/2022)

Thank you,
Jack Ye