That sounds great, thanks for taking that on Jack!

On Wed, Sep 15, 2021 at 3:51 PM Jack Ye <yezhao...@gmail.com> wrote:

> For external Trino and PrestoDB tasks, I am thinking about creating one
> Github project for Trino and another one for PrestoDB to manage all tasks
> under them, adding links of issues and PRs in the other communities to
> track progress. This is mostly to improve visibility so that people who are
> interested can see what is going on in those 2 places.
>
> -Jack Ye
>
> On Wed, Sep 15, 2021 at 2:14 PM Ryan Blue <b...@tabular.io> wrote:
>
>> Gidon, I think that the v3 part of encryption is actually documenting how
>> it works and adding it to the spec. Right now we have hooks for building
>> some encryption around it, but almost no requirements in the spec for how
>> to use it across implementations. This is fine while we're working on
>> defining encryption, but we eventually want to update the spec.
>>
>> Jack, I'm happy to add the external PrestoDB items to the roadmap. I'm
>> just not quite sure what to do here since we aren't tracking them in the
>> Iceberg community ourselves. I listed those as external so that we can
>> publish links to where those are tracked in other communities. We can add
>> as many of these as we want.
>>
>> Anton, I agree. The goal here is to identify the top priority items to
>> help direct review effort. We want everything to continue progressing, but
>> I think it's good to identify where we as a community want to focus review
>> time.
>>
>> Sounds like one area of uncertainty is FLIP-27 vs Flink 1.13.2. Can
>> someone summarize the status of Flink and what we need? I don't think I
>> understand it well enough to suggest which one takes priority.
>>
>> Ryan
>>
>> On Mon, Sep 13, 2021 at 7:54 PM Anton Okolnychyi
>> <aokolnyc...@apple.com.invalid> wrote:
>>
>>> The discussed roadmap makes sense to me. I think it is important to
>>> agree on what we should do first as the review pool is limited. There are
>>> more and more large items that are half done or half discussed. I think we
>>> better focus on finishing them quickly and then move to something else as
>>> opposed to making very minor progress on a number of issues.
>>>
>>> To be clear, it is not like other things are not important or we should
>>> stop their development. It is more about making sure certain high-priority
>>> features for most folks in the community get enough attention.
>>>
>>> - Anton
>>>
>>> On 13 Sep 2021, at 12:19, Jack Ye <yezhao...@gmail.com> wrote:
>>>
>>> I'd like to also propose adding the following in the external section:
>>> 1. the PrestoDB equivalent for each item listed for Trino. I am not sure
>>> what's the best way to track them, but I feel it's better to list and track
>>> them separately. I have talked with related people currently maintaining
>>> the PrestoDB Iceberg connector (mostly in Twitter), and they would like to
>>> take a different route from Trino to fully remove Hive dependencies in the
>>> connector. This means the 2 connectors will likely diverge in
>>> implementation in the near future.
>>> 2. adding a medium item for Trino and PrestoDB Avro support
>>> 3. adding a small item for Trino and PrestoDB full system table support
>>> (the system table schema in them are diverging from core, and missing a few
>>> latest system tables)
>>>
>>> For the items listed with "Spec" and "Spec v3", what are the key
>>> differences? I thought we are treating any new spec changes after the
>>> format v2 vote as v3.
>>>
>>> Best,
>>> Jack Ye
>>>
>>> On Mon, Sep 13, 2021 at 7:13 AM Gidon Gershinsky <gg5...@gmail.com>
>>> wrote:
>>>
>>>> Hi Ryan,
>>>>
>>>> I just wonder if the encryption should be a Spec v3 category. We have
>>>> the key_metadata fields in both data_file and manifest_file structs, which
>>>> might be sufficient for a reasonable basic encryption support.
>>>> But I certainly agree this is an L-sized project.
>>>>
>>>> Cheers, Gidon
>>>>
>>>>
>>>> On Sat, Sep 11, 2021 at 12:38 AM Ryan Blue <b...@tabular.io> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> At the last sync meeting, we brought up publishing a community roadmap
>>>>> and brainstormed the many features and initiatives that the community is
>>>>> working on. In this thread, I want to make sure that we have a good list
>>>>> of what people are thinking about and I think we should try to categorize
>>>>> the projects by size and general priority. When we reach a rough agreement,
>>>>> I’ll write this up and post it on the ASF site along with links to some
>>>>> projects in Github.
>>>>>
>>>>> My rationale for attempting to prioritize projects is that if we try
>>>>> to do too many things, it will be slower progress across everything rather
>>>>> than getting a few important items done. I know that priorities don’t
>>>>> align very cleanly in practice, but it is hopefully worth trying. To come
>>>>> up with a priority, I’m trying to keep top priority items to a minimum by
>>>>> including only one from each group (Spark, Flink, Python, etc.). The
>>>>> remaining items are split between priority 2 and 3. Priority 3 is not
>>>>> urgent, including things that can be plugged in (like other IO libraries),
>>>>> docs, etc. Everything else is priority 2.
>>>>>
>>>>> That something isn’t priority 1 doesn’t mean it isn’t important or
>>>>> progressing, just that it isn’t the current focus. I think of it this way:
>>>>> if someone has extra time to review something, what should be next? That’s
>>>>> top priority.
>>>>>
>>>>> Here’s my rough categorization. If you disagree, please speak up:
>>>>>
>>>>> - If you think that something should be top priority, what gets
>>>>>   moved to priority 2?
>>>>> - Should the priority for a project in 2 or 3 change?
>>>>> - Is the S/M/L size of a project wrong?
>>>>>
>>>>> Top priority, 1:
>>>>>
>>>>> - API: Iceberg 1.0 [medium]
>>>>> - Spark: Merge-on-read plans [large]
>>>>> - Maintenance: Delete file compaction [medium]
>>>>> - Flink: Upgrade to 1.13.2 (document compatibility) [medium]
>>>>> - Python: Pythonic refactor [medium]
>>>>>
>>>>> Priority 2:
>>>>>
>>>>> - ORC: Support delete files stored as ORC [small]
>>>>> - Spark: DSv2 streaming improvements [small]
>>>>> - Flink: Inline file compaction [small]
>>>>> - Flink: Support UPSERT [small]
>>>>> - Views: Spec [medium]
>>>>> - Spec: Z-ordering / Space-filling curves [medium]
>>>>> - Spec: Snapshot tagging and branching [small]
>>>>> - Spec: Secondary indexes [large]
>>>>> - Spec v3: Encryption [large]
>>>>> - Spec v3: Relative paths [large]
>>>>> - Spec v3: Default field values [medium]
>>>>>
>>>>> Priority 3:
>>>>>
>>>>> - Docs: versioned docs [medium]
>>>>> - IO: Support Aliyun OSS/DLF [medium]
>>>>> - IO: Support Dell ECS [medium]
>>>>>
>>>>> External:
>>>>>
>>>>> - Trino: Bucketed joins [small]
>>>>> - Trino: Row-level delete support [medium]
>>>>> - Trino: Merge-on-read plans [medium]
>>>>> - Trino: Multi-catalog support [small]
>>>>>
>>>>> --
>>>>> Ryan Blue
>>>>> Tabular
>>>>>
>>>>
>>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>

--
Ryan Blue
Tabular