Szehon, I'm not sure that I'd consider either of those for the roadmap. While they are important to work on, they seem more like individual PRs than high-level projects that might span multiple PRs. This is definitely a good thing to consider, though. Should we go more granular than the list that we've compiled?
On Fri, Sep 10, 2021 at 5:07 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
> Hi
>
> I also missed the last sync, and wanted to add two things if possible.
>
> Thanks,
> Szehon
>
> Priority 2:
>
> - Core: Predicate pushdown for remaining Metadata tables [medium]
> - Core/Spark: Support serializable isolation for ReplacePartitions /
> Insert Overwrite [medium]
>
>
> On Fri, Sep 10, 2021 at 4:40 PM Steven Wu <stevenz...@gmail.com> wrote:
>
>> I would like to add an item
>>
>> Priority 2:
>> Flink: FLIP-27 based Iceberg source [large]
>>
>> On Fri, Sep 10, 2021 at 2:38 PM Ryan Blue <b...@tabular.io> wrote:
>>
>>> Hi everyone,
>>>
>>> At the last sync meeting, we brought up publishing a community roadmap
>>> and brainstormed the many features and initiatives that the community is
>>> working on. In this thread, I want to make sure that we have a good list of
>>> what people are thinking about and I think we should try to categorize the
>>> projects by size and general priority. When we reach a rough agreement,
>>> I’ll write this up and post it on the ASF site along with links to some
>>> projects in GitHub.
>>>
>>> My rationale for attempting to prioritize projects is that if we try to
>>> do too many things, it will be slower progress across everything rather
>>> than getting a few important items done. I know that priorities don’t align
>>> very cleanly in practice, but it is hopefully worth trying. To come up with
>>> a priority, I’m trying to keep top priority items to a minimum by including
>>> only one from each group (Spark, Flink, Python, etc.). The remaining items
>>> are split between priority 2 and 3. Priority 3 is not urgent, including
>>> things that can be plugged in (like other IO libraries), docs, etc.
>>> Everything else is priority 2.
>>>
>>> That something isn’t priority 1 doesn’t mean it isn’t important or
>>> progressing, just that it isn’t the current focus. I think of it this way:
>>> if someone has extra time to review something, what should be next? That’s
>>> top priority.
>>>
>>> Here’s my rough categorization. If you disagree, please speak up:
>>>
>>> - If you think that something should be top priority, what gets
>>> moved to priority 2?
>>> - Should the priority for a project in 2 or 3 change?
>>> - Is the S/M/L size of a project wrong?
>>>
>>> Top priority, 1:
>>>
>>> - API: Iceberg 1.0 [medium]
>>> - Spark: Merge-on-read plans [large]
>>> - Maintenance: Delete file compaction [medium]
>>> - Flink: Upgrade to 1.13.2 (document compatibility) [medium]
>>> - Python: Pythonic refactor [medium]
>>>
>>> Priority 2:
>>>
>>> - ORC: Support delete files stored as ORC [small]
>>> - Spark: DSv2 streaming improvements [small]
>>> - Flink: Inline file compaction [small]
>>> - Flink: Support UPSERT [small]
>>> - Views: Spec [medium]
>>> - Spec: Z-ordering / Space-filling curves [medium]
>>> - Spec: Snapshot tagging and branching [small]
>>> - Spec: Secondary indexes [large]
>>> - Spec v3: Encryption [large]
>>> - Spec v3: Relative paths [large]
>>> - Spec v3: Default field values [medium]
>>>
>>> Priority 3:
>>>
>>> - Docs: versioned docs [medium]
>>> - IO: Support Aliyun OSS/DLF [medium]
>>> - IO: Support Dell ECS [medium]
>>>
>>> External:
>>>
>>> - Trino: Bucketed joins [small]
>>> - Trino: Row-level delete support [medium]
>>> - Trino: Merge-on-read plans [medium]
>>> - Trino: Multi-catalog support [small]
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>

--
Ryan Blue
Tabular