Re: [DISCUSS] Iceberg roadmap

Gidon Gershinsky Mon, 13 Sep 2021 07:13:58 -0700

Hi Ryan,

I just wonder whether encryption should be in the Spec v3 category. We have the
key_metadata fields in both the data_file and manifest_file structs, which
might be sufficient for reasonable basic encryption support.
But I certainly agree this is an L-sized project.


Cheers, Gidon


On Sat, Sep 11, 2021 at 12:38 AM Ryan Blue <[email protected]> wrote:

> Hi everyone,
>
> At the last sync meeting, we brought up publishing a community roadmap and
> brainstormed the many features and initiatives that the community is
> working on. In this thread, I want to make sure that we have a good list of
> what people are thinking about and I think we should try to categorize the
> projects by size and general priority. When we reach a rough agreement,
> I’ll write this up and post it on the ASF site along with links to some
> projects in GitHub.
>
> My rationale for attempting to prioritize projects is that if we try to do
> too many things, we will make slower progress across everything rather than
> getting a few important items done. I know that priorities don’t align very
> cleanly in practice, but it is hopefully worth trying. To come up with a
> priority, I’m trying to keep top priority items to a minimum by including
> only one from each group (Spark, Flink, Python, etc.). The remaining items
> are split between priority 2 and 3. Priority 3 is not urgent, including
> things that can be plugged in (like other IO libraries), docs, etc.
> Everything else is priority 2.
>
> That something isn’t priority 1 doesn’t mean it isn’t important or
> progressing, just that it isn’t the current focus. I think of it this way:
> if someone has extra time to review something, what should be next? That’s
> top priority.
>
> Here’s my rough categorization. If you disagree, please speak up:
>
>    - If you think that something should be top priority, what gets moved
>    to priority 2?
>    - Should the priority for a project in 2 or 3 change?
>    - Is the S/M/L size of a project wrong?
>
> Top priority, 1:
>
>    - API: Iceberg 1.0 [medium]
>    - Spark: Merge-on-read plans [large]
>    - Maintenance: Delete file compaction [medium]
>    - Flink: Upgrade to 1.13.2 (document compatibility) [medium]
>    - Python: Pythonic refactor [medium]
>
> Priority 2:
>
>    - ORC: Support delete files stored as ORC [small]
>    - Spark: DSv2 streaming improvements [small]
>    - Flink: Inline file compaction [small]
>    - Flink: Support UPSERT [small]
>    - Views: Spec [medium]
>    - Spec: Z-ordering / Space-filling curves [medium]
>    - Spec: Snapshot tagging and branching [small]
>    - Spec: Secondary indexes [large]
>    - Spec v3: Encryption [large]
>    - Spec v3: Relative paths [large]
>    - Spec v3: Default field values [medium]
>
> Priority 3:
>
>    - Docs: versioned docs [medium]
>    - IO: Support Aliyun OSS/DLF [medium]
>    - IO: Support Dell ECS [medium]
>
> External:
>
>    - Trino: Bucketed joins [small]
>    - Trino: Row-level delete support [medium]
>    - Trino: Merge-on-read plans [medium]
>    - Trino: Multi-catalog support [small]
>
> --
> Ryan Blue
> Tabular
>
