Re: [DISCUSS] Iceberg roadmap

Ryan Blue Sun, 12 Sep 2021 09:04:43 -0700

Szehon, I'm not sure that I'd consider either of those for the roadmap.
While they are important to work on, they seem more like individual PRs
than high-level projects that might span PRs. This is definitely a good
thing to consider, though. Should we go more granular than the list that
we've compiled?


On Fri, Sep 10, 2021 at 5:07 PM Szehon Ho <[email protected]> wrote:

> Hi
>
> I also missed the last sync, and wanted to add two things if possible.
>
> Thanks,
> Szehon
>
> Priority 2:
>
>    - Core: Predicate pushdown for remaining Metadata tables [medium]
>    - Core/Spark: Support serializable isolation for ReplacePartitions /
>    Insert Overwrite [medium]
>
>
> On Fri, Sep 10, 2021 at 4:40 PM Steven Wu <[email protected]> wrote:
>
>> I would like to add an item
>>
>> Priority 2:
>> Flink: FLIP-27 based Iceberg source [large]
>>
>> On Fri, Sep 10, 2021 at 2:38 PM Ryan Blue <[email protected]> wrote:
>>
>>> Hi everyone,
>>>
>>> At the last sync meeting, we brought up publishing a community roadmap
>>> and brainstormed the many features and initiatives that the community is
>>> working on. In this thread, I want to make sure that we have a good list of
>>> what people are thinking about and I think we should try to categorize the
>>> projects by size and general priority. When we reach a rough agreement,
>>> I’ll write this up and post it on the ASF site along with links to some
>>> projects in GitHub.
>>>
>>> My rationale for attempting to prioritize projects is that if we try to
>>> do too many things, we will make slower progress across everything rather
>>> than getting a few important items done. I know that priorities don’t align
>>> very cleanly in practice, but it is hopefully worth trying. To come up with
>>> a priority, I’m trying to keep top priority items to a minimum by including
>>> only one from each group (Spark, Flink, Python, etc.). The remaining items
>>> are split between priority 2 and 3. Priority 3 is not urgent, including
>>> things that can be plugged in (like other IO libraries), docs, etc.
>>> Everything else is priority 2.
>>>
>>> That something isn’t priority 1 doesn’t mean it isn’t important or
>>> progressing, just that it isn’t the current focus. I think of it this way:
>>> if someone has extra time to review something, what should be next? That’s
>>> top priority.
>>>
>>> Here’s my rough categorization. If you disagree, please speak up:
>>>
>>>    - If you think that something should be top priority, what gets
>>>    moved to priority 2?
>>>    - Should the priority for a project in 2 or 3 change?
>>>    - Is the S/M/L size of a project wrong?
>>>
>>> Top priority, 1:
>>>
>>>    - API: Iceberg 1.0 [medium]
>>>    - Spark: Merge-on-read plans [large]
>>>    - Maintenance: Delete file compaction [medium]
>>>    - Flink: Upgrade to 1.13.2 (document compatibility) [medium]
>>>    - Python: Pythonic refactor [medium]
>>>
>>> Priority 2:
>>>
>>>    - ORC: Support delete files stored as ORC [small]
>>>    - Spark: DSv2 streaming improvements [small]
>>>    - Flink: Inline file compaction [small]
>>>    - Flink: Support UPSERT [small]
>>>    - Views: Spec [medium]
>>>    - Spec: Z-ordering / Space-filling curves [medium]
>>>    - Spec: Snapshot tagging and branching [small]
>>>    - Spec: Secondary indexes [large]
>>>    - Spec v3: Encryption [large]
>>>    - Spec v3: Relative paths [large]
>>>    - Spec v3: Default field values [medium]
>>>
>>> Priority 3:
>>>
>>>    - Docs: versioned docs [medium]
>>>    - IO: Support Aliyun OSS/DLF [medium]
>>>    - IO: Support Dell ECS [medium]
>>>
>>> External:
>>>
>>>    - Trino: Bucketed joins [small]
>>>    - Trino: Row-level delete support [medium]
>>>    - Trino: Merge-on-read plans [medium]
>>>    - Trino: Multi-catalog support [small]
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>

-- 
Ryan Blue
Tabular
