It seems like we have reached some consensus around the projects listed here. I have created corresponding GitHub projects for each: https://github.com/apache/iceberg/projects
Related design docs are also linked there.

Best,
Jack Ye

On Sun, Sep 19, 2021 at 11:18 PM Eduard Tudenhoefner <edu...@dremio.com> wrote:

> Would it make sense to have a section on the website where we collect all
> the links to the design docs/specs as that would be easier to find than
> searching for things on the ML?
>
> I was thinking about something like for each component:
> * link to the ML discussion
> * link to the actual Spec/Design Doc
>
> Thoughts?
>
> On Fri, Sep 10, 2021 at 11:38 PM Ryan Blue <b...@tabular.io> wrote:
>
>> Hi everyone,
>>
>> At the last sync meeting, we brought up publishing a community roadmap
>> and brainstormed the many features and initiatives that the community is
>> working on. In this thread, I want to make sure that we have a good list of
>> what people are thinking about and I think we should try to categorize the
>> projects by size and general priority. When we reach a rough agreement,
>> I’ll write this up and post it on the ASF site along with links to some
>> projects in Github.
>>
>> My rationale for attempting to prioritize projects is that if we try to
>> do too many things, it will be slower progress across everything rather
>> than getting a few important items done. I know that priorities don’t align
>> very cleanly in practice, but it is hopefully worth trying. To come up with
>> a priority, I’m trying to keep top priority items to a minimum by including
>> only one from each group (Spark, Flink, Python, etc.). The remaining items
>> are split between priority 2 and 3. Priority 3 is not urgent, including
>> things that can be plugged in (like other IO libraries), docs, etc.
>> Everything else is priority 2.
>>
>> That something isn’t priority 1 doesn’t mean it isn’t important or
>> progressing, just that it isn’t the current focus. I think of it this way:
>> if someone has extra time to review something, what should be next? That’s
>> top priority.
>>
>> Here’s my rough categorization. If you disagree, please speak up:
>>
>> - If you think that something should be top priority, what gets moved
>>   to priority 2?
>> - Should the priority for a project in 2 or 3 change?
>> - Is the S/M/L size of a project wrong?
>>
>> Top priority, 1:
>>
>> - API: Iceberg 1.0 [medium]
>> - Spark: Merge-on-read plans [large]
>> - Maintenance: Delete file compaction [medium]
>> - Flink: Upgrade to 1.13.2 (document compatibility) [medium]
>> - Python: Pythonic refactor [medium]
>>
>> Priority 2:
>>
>> - ORC: Support delete files stored as ORC [small]
>> - Spark: DSv2 streaming improvements [small]
>> - Flink: Inline file compaction [small]
>> - Flink: Support UPSERT [small]
>> - Views: Spec [medium]
>> - Spec: Z-ordering / Space-filling curves [medium]
>> - Spec: Snapshot tagging and branching [small]
>> - Spec: Secondary indexes [large]
>> - Spec v3: Encryption [large]
>> - Spec v3: Relative paths [large]
>> - Spec v3: Default field values [medium]
>>
>> Priority 3:
>>
>> - Docs: versioned docs [medium]
>> - IO: Support Aliyun OSS/DLF [medium]
>> - IO: Support Dell ECS [medium]
>>
>> External:
>>
>> - Trino: Bucketed joins [small]
>> - Trino: Row-level delete support [medium]
>> - Trino: Merge-on-read plans [medium]
>> - Trino: Multi-catalog support [small]
>>
>> --
>> Ryan Blue
>> Tabular