One more thing: I think it would be great to have the Parquet bloom filter feature (contributed by www.iq.com, one of the largest video websites in China) supported in Iceberg 0.13.0:
1. https://github.com/apache/iceberg/pull/2643
2. https://github.com/apache/iceberg/pull/2642

On Thu, Sep 9, 2021 at 9:36 AM OpenInx <open...@gmail.com> wrote:

> Thanks for the summary, Ryan!
>
> I would like to add the following items to the roadmap for 0.13.0:
>
> *Flink Integration*
>
> 1. Upgrade the Flink version from 1.12.1 to 1.13.2 (https://github.com/apache/iceberg/pull/2629).
>
> Flink 1.12.1 has a bug when reading nested data types (Map/List) in Flink SQL (see: https://github.com/apache/iceberg/pull/3081#pullrequestreview-747934199), which the newly released 1.13.2 has resolved.
>
> 2. Support for creating an Iceberg table with 'connector'='type' in Flink SQL (https://github.com/apache/iceberg/pull/2666).
>
> The PR has been merged, but a Flink connector document is still open for review (https://github.com/apache/iceberg/pull/3085).
>
> 3. Add a streaming upsert option for the Flink write sink (https://github.com/apache/iceberg/pull/2863).
>
> This is an essential PR for Flink upsert streams writing to an Iceberg sink table. For more background, please see https://github.com/apache/iceberg/pull/1996#issue-546072705.
>
> *Ecosystem/Vendor integration*
>
> 1. Aliyun OSS/DLF integration (https://github.com/apache/iceberg/pull/2230).
>
> This is very important work that has been suspended for a long time. The good news is that Xingbo Wu <https://github.com/xingbowu> now has enough bandwidth to move it forward. I think we can finish this work successfully if we have enough reviewing bandwidth.
>
> 2. Dell ECS integration.
>
> We had a great discussion (https://github.com/apache/iceberg/pull/2807) about integrating private vendor storage/catalogs into the Apache Iceberg repo, but I'm not sure it's suitable to add it to the 0.13.0 roadmap before we reach agreement on the unit/integration/release tests for private vendor integrations.
>
> > Dan also suggested using github projects to track the progress of each feature.
>
> +1! We should make better use of GitHub issues to manage the progress and blockers of our roadmap, so that everyone can stay in sync with the latest status and keep the roadmap moving forward.
>
> On Thu, Sep 9, 2021 at 7:58 AM Ryan Blue <b...@tabular.io> wrote:
>
>> Hi everyone,
>>
>> The notes for the Iceberg community sync last week are now updated in the agenda/notes doc <https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg/edit#heading=h.2umwfxbo0iwo>. If you have anything to add, feel free to let me know or add comments to the doc.
>>
>> We mainly discussed what projects we want to add to a roadmap and how to track them. I'll be sending out a discussion thread with the roadmap projects that we came up with so we can finalize it and add to it. Dan also suggested using github projects to track the progress of each feature.
>>
>> If you'd like to attend the syncs, you can add yourself to the iceberg-sync google group <https://groups.google.com/g/iceberg-sync> to receive the invites. Everyone is welcome to attend!
>>
>> Here are the notes if you prefer this over going to the doc:
>>
>> 1 September 2021
>>
>> - Highlights
>> - 0.12.0 release is out (Thanks, Carl!)
>> - Metadata tables are updated for v2 (Thanks, Anton!)
>> - Stored procedure to add and dedup files (Thanks, Szehon!)
>> - Releases
>> - 0.13.0 release timeline
>> - Jack will be RM
>> - Targeting late Oct or early Nov
>> - 0.12.1
>> - Reads hanging <https://github.com/apache/iceberg/issues/3055> - need to find someone. Maybe Russell?
>> - Parquet 1.12.0 bug <https://github.com/apache/iceberg/issues/2962> - Thanks, Kyle!
>> - Roadmap discussion
>> - Tracking
>> - Dan: Github projects?
>> - Ryan: Markdown file on the site?
>> - Roadmap scope, items
>> - Snapshot tagging and branching - Jack, Ryan (reviews)
>> - Encryption - Gidon, Jack, Yufei
>> - Merge-on-read plans in Spark - Anton, Ryan (reviews)
>> - New writers
>> - Delete compaction - Junjie, Puneet
>> - Python - probably publish a separate roadmap
>> - Separate google group <https://groups.google.com/g/iceberg-python-sync?hl=en>
>> - Views - Anjali, John
>> - Secondary indexes - Miao, Guy, Jack (some reviews)
>> - File-level
>> - Rollup
>> - Spark streaming - Sreeram, Kyle, Anton (reviews)
>> - CDC use case
>> - Limit support to process large snapshots
>> - CDC with Iceberg source
>> - [v3] Relative paths - Anurag, Yufei
>> - [v3] Z-ordering - Russell
>> - [v3] Default values in schemas - Owen
>> - Format v2 support in Trino - Jack
>> - Multi-catalog support for Trino, ongoing for PrestoDB - Jack
>> - Bucketed joins in Trino - Samarth has a working prototype
>> - Versioned docs
>> - Encryption PR / Design Doc - Gidon Gershinsky
>> - Quick update
>> - PRs with elements of the design
>> - Sent a minimal google doc focused on MVP
>> - Gidon to propose a time for encryption sync
>> - View spec
>> - First rev of the spec has feedback
>> - Major question: SQL dialect
>> - Do we have agreement to go ahead with the spec?
>> - Do we need more time?
>> - Carl: Spark would require dialect, version, and some config properties, so the spec is not sufficient
>> - Ryan: The proposal includes places to store all of those
>> - Carl: Views form a graph so is Iceberg an appropriate storage?
>> - Anjali: Views across engine are not supported and metastores are not working, adding this to Iceberg at least makes it possible to share SQL, if not more in the future
>> - Dan: Views are stored in different ways, which made it impossible to implement -- we tried before building the common view library at Netflix
>> - Carl: Isn’t the representation just SQL? The spec punts on how to store the representation. No specifics
>> - Carl: What has this enabled at Netflix?
>> - Anjali: Simple common SQL works across engines
>> - Ryan: And there is enough information to do view translation later in either Iceberg or in engines
>> - Ran out of time
>>
>> --
>> Ryan Blue
>> Tabular