Re: Iceberg community sync notes for 1 September 2021

OpenInx Wed, 08 Sep 2021 20:33:05 -0700

One more thing: I think it would be great to have the Parquet bloom filter
feature (contributed by www.iq.com, one of the largest video websites in
China) supported in Iceberg 0.13.0:


1. https://github.com/apache/iceberg/pull/2643
2. https://github.com/apache/iceberg/pull/2642

On Thu, Sep 9, 2021 at 9:36 AM OpenInx wrote:

> Thanks for the summary, Ryan!
>
> I would like to add the following items to the roadmap for 0.13.0:
>
> *Flink Integration*
>
> 1. Upgrade the Flink version from 1.12.1 to 1.13.2
> (https://github.com/apache/iceberg/pull/2629).
>
> Flink 1.12.1 has a bug when reading nested data types (Map/List) in Flink
> SQL (see:
> https://github.com/apache/iceberg/pull/3081#pullrequestreview-747934199),
> and the newly released 1.13.2 resolves it.
>
> 2. Support for creating an Iceberg table with 'connector'='iceberg' in Flink
> SQL (https://github.com/apache/iceberg/pull/2666).
>
> The PR has been merged, but the accompanying Flink connector document is
> still open for review (https://github.com/apache/iceberg/pull/3085).
>
> 3. Add a streaming upsert option for the Flink write sink
> (https://github.com/apache/iceberg/pull/2863).
>
> This is an essential PR for writing Flink upsert streams to an Iceberg sink
> table. For more background, please see
> https://github.com/apache/iceberg/pull/1996#issue-546072705.
>
> *Ecosystem/Vendor Integration*
>
> 1. Aliyun OSS/DLF integration
> (https://github.com/apache/iceberg/pull/2230)
>
> This is a very important piece of work that has been stalled for a long
> time. The good news is that Xingbo Wu <https://github.com/xingbowu> now has
> enough bandwidth to move it forward. I think we can finish this work
> successfully if we have enough review bandwidth.
>
> 2. Dell ECS integration.
>
> We've had a great discussion (https://github.com/apache/iceberg/pull/2807)
> about integrating private vendor storage/catalogs into the Apache Iceberg
> repo, but I'm not sure it's suitable to add this to the 0.13.0 roadmap
> before we reach agreement on the unit/integration/release tests for private
> vendor integrations.
>
>
> > Dan also suggested using github projects to track the progress of each
> > feature.
>
> +1! We should make better use of GitHub issues to track the progress and
> blockers of our roadmap, so that everyone can stay up to date on the latest
> status and we can keep the roadmap moving forward.
>
>
> On Thu, Sep 9, 2021 at 7:58 AM Ryan Blue wrote:
>
>> Hi everyone,
>>
>> The notes for last week's Iceberg community sync have been added to the
>> agenda/notes doc
>> <https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg/edit#heading=h.2umwfxbo0iwo>.
>> If you have anything to add, feel free to let me know or add comments to
>> the doc.
>>
>> We mainly discussed what projects we want to add to a roadmap and how to
>> track them. I'll be sending out a discussion thread with the roadmap
>> projects that we came up with so we can finalize it and add to it. Dan also
>> suggested using github projects to track the progress of each feature.
>>
>> If you'd like to attend the syncs, you can add yourself to the iceberg-sync
>> google group <https://groups.google.com/g/iceberg-sync> to receive the
>> invites. Everyone is welcome to attend!
>>
>> Here are the notes if you prefer this over going to the doc:
>>
>> 1 September 2021
>>
>>    - Highlights
>>       - 0.12.0 release is out (Thanks, Carl!)
>>       - Metadata tables are updated for v2 (Thanks, Anton!)
>>       - Stored procedure to add and dedup files (Thanks, Szehon!)
>>
>>    - Releases
>>       - 0.13.0 release timeline
>>          - Jack will be RM
>>          - Targeting late Oct or early Nov
>>       - 0.12.1
>>          - Reads hanging <https://github.com/apache/iceberg/issues/3055> -
>>            need to find someone. Maybe Russell?
>>          - Parquet 1.12.0 bug
>>            <https://github.com/apache/iceberg/issues/2962> - Thanks, Kyle!
>>
>>    - Roadmap discussion
>>       - Tracking
>>          - Dan: GitHub projects?
>>          - Ryan: Markdown file on the site?
>>       - Roadmap scope, items
>>          - Snapshot tagging and branching - Jack, Ryan (reviews)
>>          - Encryption - Gidon, Jack, Yufei
>>          - Merge-on-read plans in Spark - Anton, Ryan (reviews)
>>             - New writers
>>          - Delete compaction - Junjie, Puneet
>>          - Python - probably publish a separate roadmap
>>             - Separate Google group
>>               <https://groups.google.com/g/iceberg-python-sync?hl=en>
>>          - Views - Anjali, John
>>          - Secondary indexes - Miao, Guy, Jack (some reviews)
>>             - File-level
>>             - Rollup
>>          - Spark streaming - Sreeram, Kyle, Anton (reviews)
>>             - CDC use case
>>             - Limit support to process large snapshots
>>             - CDC with Iceberg source
>>          - [v3] Relative paths - Anurag, Yufei
>>          - [v3] Z-ordering - Russell
>>          - [v3] Default values in schemas - Owen
>>          - Format v2 support in Trino - Jack
>>          - Multi-catalog support for Trino, ongoing for PrestoDB - Jack
>>          - Bucketed joins in Trino - Samarth has a working prototype
>>          - Versioned docs
>>
>>    - Encryption PR / Design Doc - Gidon Gershinsky
>>       - Quick update
>>       - PRs with elements of the design
>>       - Sent a minimal Google doc focused on MVP
>>       - Gidon to propose a time for encryption sync
>>
>>    - View spec
>>       - First rev of the spec has feedback
>>       - Major question: SQL dialect
>>          - Do we have agreement to go ahead with the spec?
>>          - Do we need more time?
>>       - Carl: Spark would require dialect, version, and some config
>>         properties, so the spec is not sufficient
>>       - Ryan: The proposal includes places to store all of those
>>       - Carl: Views form a graph, so is Iceberg appropriate storage for them?
>>       - Anjali: Views across engines are not supported and metastores are
>>         not working; adding this to Iceberg at least makes it possible to
>>         share SQL, if not more in the future
>>       - Dan: Views are stored in different ways, which made it impossible
>>         to implement -- we tried before building the common view library
>>         at Netflix
>>       - Carl: Isn’t the representation just SQL? The spec punts on how to
>>         store the representation. No specifics
>>       - Carl: What has this enabled at Netflix?
>>       - Anjali: Simple common SQL works across engines
>>       - Ryan: And there is enough information to do view translation
>>         later, in either Iceberg or the engines
>>
>>    - Ran out of time
>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>
