Re: Iceberg community sync notes for 1 September 2021

Ryan Blue Fri, 10 Sep 2021 12:57:10 -0700

Thanks for adding these items, @OpenInx! I'll add them to the roadmap
discussion.


One thing I want to note is that I don't think that the roadmap is going to
be tied to releases. Since the start, we've not delayed Iceberg releases to
get specific features in. Instead, we've preferred to get releases done and
release more often. I think that's what we want to continue doing and it
fits with the suggestion we've had from several places to do time-based
releases (as long as more often is fine!).

When we talk about specific features in the roadmap, I think we should
mainly track what work is needed, who is working on it, and who is
reviewing it. It's always really hard to judge when a feature will make it
into a release when that depends on the people building it and reviewers
having time.

Ryan

On Wed, Sep 8, 2021 at 6:36 PM OpenInx <open...@gmail.com> wrote:

> Thanks for the summary, Ryan!
>
> I would like to add the following items to the roadmap for 0.13.0:
>
> *Flink Integration*
>
> 1.  Upgrade the flink version from 1.12.1 to 1.13.2 (
> https://github.com/apache/iceberg/pull/2629).
>
> There is a bug in Flink 1.12.1 when reading nested data types
> (Map/List) in Flink SQL (see:
> https://github.com/apache/iceberg/pull/3081#pullrequestreview-747934199),
> which the newly released 1.13.2 resolves.
>
> 2.  Support for creating an iceberg table with 'connector'='type' in flink
> SQL (https://github.com/apache/iceberg/pull/2666).
>
> The PR has been merged, but a Flink connector document is still open for
> review (https://github.com/apache/iceberg/pull/3085).
>
> 3.  Add streaming upsert option for flink write sink. (
> https://github.com/apache/iceberg/pull/2863)
>
> This is an essential PR for Flink upsert streams writing to an Iceberg
> sink table. For more background, please see
> https://github.com/apache/iceberg/pull/1996#issue-546072705.
>
> *Ecosystem/Vendor integration.*
>
> 1.  Aliyun OSS/DLF integration. (
> https://github.com/apache/iceberg/pull/2230)
>
> This is very important work that has been suspended for a long time. The
> good news is that Xingbo Wu <https://github.com/xingbowu> now has enough
> bandwidth to move this forward. I think we can successfully finish
> this work if we have enough reviewing bandwidth.
>
> 2. Dell ECS integration.
>
> We have had a great discussion (https://github.com/apache/iceberg/pull/2807)
> about integrating private vendor storage/catalogs into the Apache Iceberg
> repo, but I'm not sure it's suitable to add it to the 0.13.0 roadmap before
> we reach agreement on the unit/integration/release tests for private
> vendor integrations.
>
>
> > Dan also suggested using github projects to track the progress of each
> feature.
>
> +1! We should make better use of GitHub issues to manage the progress
> and blockers of our roadmap, so that everyone can stay in sync with the
> latest status and keep the roadmap moving forward.
>
>
> On Thu, Sep 9, 2021 at 7:58 AM Ryan Blue <b...@tabular.io> wrote:
>
>> Hi everyone,
>>
>> The notes for the Iceberg community sync last week are now updated in the
>> agenda/notes doc
>> <https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg/edit#heading=h.2umwfxbo0iwo>.
>> If you have anything to add, feel free to let me know or add comments to
>> the doc.
>>
>> We mainly discussed what projects we want to add to a roadmap and how to
>> track them. I'll be sending out a discussion thread with the roadmap
>> projects that we came up with so we can finalize it and add to it. Dan also
>> suggested using github projects to track the progress of each feature.
>>
>> If you'd like to attend the syncs, you can add yourself to the iceberg-sync
>> google group <https://groups.google.com/g/iceberg-sync> to receive the
>> invites. Everyone is welcome to attend!
>>
>> Here are the notes if you prefer this over going to the doc:
>>
>> 1 September 2021
>>
>>    - Highlights
>>       - 0.12.0 release is out (Thanks, Carl!)
>>       - Metadata tables are updated for v2 (Thanks, Anton!)
>>       - Stored procedure to add and dedup files (Thanks, Szehon!)
>>    - Releases
>>       - 0.13.0 release timeline
>>          - Jack will be RM
>>          - Targeting late Oct or early Nov
>>       - 0.12.1
>>          - Reads hanging <https://github.com/apache/iceberg/issues/3055> -
>>            need to find someone. Maybe Russell?
>>          - Parquet 1.12.0 bug
>>            <https://github.com/apache/iceberg/issues/2962> - Thanks, Kyle!
>>    - Roadmap discussion
>>       - Tracking
>>          - Dan: Github projects?
>>          - Ryan: Markdown file on the site?
>>       - Roadmap scope, items
>>          - Snapshot tagging and branching - Jack, Ryan (reviews)
>>          - Encryption - Gidon, Jack, Yufei
>>          - Merge-on-read plans in Spark - Anton, Ryan (reviews)
>>             - New writers
>>          - Delete compaction - Junjie, Puneet
>>          - Python - probably publish a separate roadmap
>>             - Separate google group
>>               <https://groups.google.com/g/iceberg-python-sync?hl=en>
>>          - Views - Anjali, John
>>          - Secondary indexes - Miao, Guy, Jack (some reviews)
>>             - File-level
>>             - Rollup
>>          - Spark streaming - Sreeram, Kyle, Anton (reviews)
>>             - CDC use case
>>             - Limit support to process large snapshots
>>             - CDC with Iceberg source
>>          - [v3] Relative paths - Anurag, Yufei
>>          - [v3] Z-ordering - Russell
>>          - [v3] Default values in schemas - Owen
>>          - Format v2 support in Trino - Jack
>>          - Multi-catalog support for Trino, ongoing for PrestoDB - Jack
>>          - Bucketed joins in Trino - Samarth has a working prototype
>>          - Versioned docs
>>    - Encryption PR / Design Doc - Gidon Gershinsky
>>       - Quick update
>>       - PRs with elements of the design
>>       - Sent a minimal google doc focused on MVP
>>       - Gidon to propose a time for encryption sync
>>    - View spec
>>       - First rev of the spec has feedback
>>       - Major question: SQL dialect
>>          - Do we have agreement to go ahead with the spec?
>>          - Do we need more time?
>>       - Carl: Spark would require dialect, version, and some config
>>         properties, so the spec is not sufficient
>>       - Ryan: The proposal includes places to store all of those
>>       - Carl: Views form a graph so is Iceberg an appropriate storage?
>>       - Anjali: Views across engine are not supported and metastores are
>>         not working, adding this to Iceberg at least makes it possible to share
>>         SQL, if not more in the future
>>       - Dan: Views are stored in different ways, which made it impossible
>>         to implement -- we tried before building the common view library at
>>         Netflix
>>       - Carl: Isn’t the representation just SQL? The spec punts on how to
>>         store the representation. No specifics
>>       - Carl: What has this enabled at Netflix?
>>       - Anjali: Simple common SQL works across engines
>>       - Ryan: And there is enough information to do view translation later
>>         in either Iceberg or in engines
>>    - Ran out of time
>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>

-- 
Ryan Blue
Tabular
