Update: Now that the Dell EMC EcsFileIO has been merged into the official Apache Iceberg repo, I think it's okay to close this project on the roadmap: https://github.com/apache/iceberg/projects/22
Thanks.

On Wed, Nov 10, 2021 at 10:22 AM Zhao Chun <zh...@apache.org> wrote:

> Thanks Ryan.
> We will keep a close eye on what is happening in the Iceberg community and seek help when necessary.
>
> Thanks,
> Zhao Chun
>
> On Wed, Nov 10, 2021 at 8:54 AM Ryan Blue <b...@tabular.io> wrote:
>
>> Thanks, Zhao. I think those are great ways to work together. Let us know how we can help you make StarRocks successful with Iceberg as its data format. We're always happy to help people understand how Iceberg works and improve our docs on how to use it.
>>
>> Ryan
>>
>> On Mon, Nov 8, 2021 at 8:17 PM Zhao Chun <zh...@apache.org> wrote:
>>
>>> I feel that Ryan's response exemplifies the generosity of an Apache project creator, a quality that has touched and benefited us. We look forward to contributing further to the Apache project in the future.
>>> As for the need for an issue to track progress, I don't think we need one for now. At the moment the main development work is done in the StarRocks repository.
>>> As for further cooperation in the future, I think there are several aspects.
>>> 1. StarRocks will be trying to support Iceberg. I think this will help StarRocks to re-examine how it integrates with the lakehouse system, and we will be happy to feed back to the Apache Iceberg community the issues and benefits we encounter during the integration process. This will also validate the versatility of the Iceberg project in supporting more query engines. I think this work will benefit both projects.
>>> 2. In the future, we will share some of our best practices for Iceberg and StarRocks integration in a blog or talk. If the Apache Iceberg project feels that these blogs or talks would be beneficial to the Apache Iceberg community, please consider linking our subsequent blogs or talks to the Apache Iceberg website blog. The Iceberg community can, of course, choose not to link them if they feel it is inappropriate.
>>> 3. We expect to contribute to the Apache Iceberg community under the Apache License V2.
>>>
>>> Thanks,
>>> Zhao Chun
>>>
>>> On Tue, Nov 9, 2021 at 3:05 AM Ryan Blue <b...@tabular.io> wrote:
>>>
>>>> I think it is great to see another processing engine adding support for Apache Iceberg, and I do look forward to collaborating with the StarRocks community in the future.
>>>>
>>>> I'm not entirely sure what that collaboration would look like just yet though. For most processing engines, it is people joining the Apache Iceberg community. No matter what the license of the downstream project, we always welcome more people contributing here!
>>>>
>>>> As for opening a project in our tracker, I'm not sure it makes sense to do that just yet. As far as I know there aren't any issues to track there. And would the StarRocks community find it helpful?
>>>>
>>>> On Mon, Nov 8, 2021 at 12:14 AM Zhao Chun <buaa.zh...@gmail.com> wrote:
>>>>
>>>>> Thanks to @OpenInx for mentioning StarRocks in the Iceberg community.
>>>>>
>>>>> I'm from the StarRocks community.
>>>>>
>>>>> StarRocks is based on the Apache Doris project. It has been in development internally for almost two years and is currently used by hundreds of companies. It was just opened 2 months ago.
>>>>>
>>>>> Iceberg is a great project that makes analysis of huge datasets more convenient. The StarRocks community is planning to support the Iceberg engine. This will provide StarRocks users with the ability to analyze data in Iceberg.
>>>>>
>>>>> Regarding the license, StarRocks' ELv2 will not affect our contribution to the Iceberg community under the Apache License V2.
>>>>>
>>>>> We are also looking forward to receiving help from the Iceberg community and will be contributing back to the Iceberg community.
>>>>>
>>>>> Thanks,
>>>>> Zhao Chun
>>>>>
>>>>> On Mon, Nov 8, 2021 at 2:53 PM Kyle Bendickson <k...@tabular.io> wrote:
>>>>>
>>>>>> +1 around concerns with the Elastic license.
>>>>>>
>>>>>> Also, more importantly, how important is integration with either of these tools to the Iceberg community and contributors?
>>>>>>
>>>>>> The Elastic license makes a bit more sense for elasticsearch, as it was an existing project for quite some time. I won’t reiterate the details of that situation, but it’s odd to see a fork of a new, active project using the Elastic license in my opinion.
>>>>>>
>>>>>> StarRocks admits that at least 40% of its code comes from the Apache Doris project.
>>>>>>
>>>>>> That said, StarRocks claims to not require other dependencies. It seems StarRocks supports query federation with a few tools so as not to have to import the data and query those systems directly. So I’m not sure what Iceberg support would look like beyond additional query federation. What benefit does this provide?
>>>>>>
>>>>>> If we determined that integration with one of these tools was something the community valued, could a connector be built to target the Apache Doris project and then StarRocks could fork that code if they liked?
>>>>>>
>>>>>> - Kyle Bendickson
>>>>>> GitHub @kbendick
>>>>>>
>>>>>> On Sun, Nov 7, 2021 at 9:24 PM Reo Lei <leinuo...@gmail.com> wrote:
>>>>>>
>>>>>>> +1, I have the same concern about the incompatible license.
>>>>>>>
>>>>>>> On Mon, Nov 8, 2021 at 11:48 AM Jacques Nadeau <jacquesnad...@gmail.com> wrote:
>>>>>>>
>>>>>>>> A few additional observations about StarRocks...
>>>>>>>>
>>>>>>>> - As far as I can tell, StarRocks has an ASF incompatible license (Elastic License 2.0).
>>>>>>>> - It appears to be a hard fork of Apache Doris, a project still in the incubator (and looks like it probably is destructive to the Doris project)
>>>>>>>> - The project has only existed for ~2 months.
>>>>>>>>
>>>>>>>> On Sun, Nov 7, 2021 at 7:34 PM OpenInx <open...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Any thoughts for adding StarRocks integration to the roadmap?
>>>>>>>>>
>>>>>>>>> I think the guys from StarRocks community can provide more background and inputs.
>>>>>>>>>
>>>>>>>>> On Thu, Nov 4, 2021 at 5:59 PM OpenInx <open...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Update:
>>>>>>>>>>
>>>>>>>>>> StarRocks[1] is a next-gen sub-second MPP database for full analysis scenarios, including multi-dimensional analytics, real-time analytics and ad-hoc query. Their team is planning to integrate iceberg tables as StarRocks external tables in the next month [2], so that people could connect the data lake and StarRocks warehouse in the same engine. The excellent performance of StarRocks will also help accelerate the analysis and access of the iceberg table, I think this is a great thing for both the iceberg community and the StarRocks community. I think we can add an extra project about StarRocks integration work in the apache iceberg roadmap [3]?
>>>>>>>>>>
>>>>>>>>>> [1]. https://github.com/StarRocks/starrocks
>>>>>>>>>> [2]. https://github.com/StarRocks/starrocks/issues/1030
>>>>>>>>>> [3]. https://github.com/apache/iceberg/projects
>>>>>>>>>>
>>>>>>>>>> On Mon, Nov 1, 2021 at 11:52 PM Ryan Blue <b...@tabular.io> wrote:
>>>>>>>>>>
>>>>>>>>>>> I closed the upgrade project and marked the FLIP-27 project priority 1. Thanks for all the work to get this done!
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Oct 31, 2021 at 8:10 PM OpenInx <open...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Update:
>>>>>>>>>>>>
>>>>>>>>>>>> I think the project [Flink: Upgrade to 1.13.2][1] in RoadMap can be closed now, because all of the issues have been addressed.
>>>>>>>>>>>>
>>>>>>>>>>>> [1]. https://github.com/apache/iceberg/projects/12
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Sep 21, 2021 at 6:17 PM Eduard Tudenhoefner <edu...@dremio.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I created a Roadmap section in https://github.com/apache/iceberg/pull/3163 that links to the planning boards that Jack created. I figured it makes sense if we link available Design Docs directly on those Boards (as was already done), because then the Design docs are closer to the set of related issues.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Sep 20, 2021 at 10:02 PM Ryan Blue <b...@tabular.io> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks, Jack!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Eduard, I think that's a good idea. We should have a roadmap page as well that links to the projects that Jack just created.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Sep 20, 2021 at 12:57 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It seems like we have reached some consensus around the projects listed here. I have created corresponding Github projects for each: https://github.com/apache/iceberg/projects
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Related design docs are also linked there.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Jack Ye
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Sep 19, 2021 at 11:18 PM Eduard Tudenhoefner <edu...@dremio.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Would it make sense to have a section on the website where we collect all the links to the design docs/specs as that would be easier to find than searching for things on the ML?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I was thinking about something like for each component:
>>>>>>>>>>>>>>>> * link to the ML discussion
>>>>>>>>>>>>>>>> * link to the actual Spec/Design Doc
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Sep 10, 2021 at 11:38 PM Ryan Blue <b...@tabular.io> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> At the last sync meeting, we brought up publishing a community roadmap and brainstormed the many features and initiatives that the community is working on. In this thread, I want to make sure that we have a good list of what people are thinking about and I think we should try to categorize the projects by size and general priority. When we reach a rough agreement, I’ll write this up and post it on the ASF site along with links to some projects in Github.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My rationale for attempting to prioritize projects is that if we try to do too many things, it will be slower progress across everything rather than getting a few important items done. I know that priorities don’t align very cleanly in practice, but it is hopefully worth trying.
>>>>>>>>>>>>>>>>> To come up with a priority, I’m trying to keep top priority items to a minimum by including only one from each group (Spark, Flink, Python, etc.). The remaining items are split between priority 2 and 3. Priority 3 is not urgent, including things that can be plugged in (like other IO libraries), docs, etc. Everything else is priority 2.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> That something isn’t priority 1 doesn’t mean it isn’t important or progressing, just that it isn’t the current focus. I think of it this way: if someone has extra time to review something, what should be next? That’s top priority.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Here’s my rough categorization. If you disagree, please speak up:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - If you think that something should be top priority, what gets moved to priority 2?
>>>>>>>>>>>>>>>>> - Should the priority for a project in 2 or 3 change?
>>>>>>>>>>>>>>>>> - Is the S/M/L size of a project wrong?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Top priority, 1:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - API: Iceberg 1.0 [medium]
>>>>>>>>>>>>>>>>> - Spark: Merge-on-read plans [large]
>>>>>>>>>>>>>>>>> - Maintenance: Delete file compaction [medium]
>>>>>>>>>>>>>>>>> - Flink: Upgrade to 1.13.2 (document compatibility) [medium]
>>>>>>>>>>>>>>>>> - Python: Pythonic refactor [medium]
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Priority 2:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - ORC: Support delete files stored as ORC [small]
>>>>>>>>>>>>>>>>> - Spark: DSv2 streaming improvements [small]
>>>>>>>>>>>>>>>>> - Flink: Inline file compaction [small]
>>>>>>>>>>>>>>>>> - Flink: Support UPSERT [small]
>>>>>>>>>>>>>>>>> - Views: Spec [medium]
>>>>>>>>>>>>>>>>> - Spec: Z-ordering / Space-filling curves [medium]
>>>>>>>>>>>>>>>>> - Spec: Snapshot tagging and branching [small]
>>>>>>>>>>>>>>>>> - Spec: Secondary indexes [large]
>>>>>>>>>>>>>>>>> - Spec v3: Encryption [large]
>>>>>>>>>>>>>>>>> - Spec v3: Relative paths [large]
>>>>>>>>>>>>>>>>> - Spec v3: Default field values [medium]
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Priority 3:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - Docs: versioned docs [medium]
>>>>>>>>>>>>>>>>> - IO: Support Aliyun OSS/DLF [medium]
>>>>>>>>>>>>>>>>> - IO: Support Dell ECS [medium]
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> External:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - Trino: Bucketed joins [small]
>>>>>>>>>>>>>>>>> - Trino: Row-level delete support [medium]
>>>>>>>>>>>>>>>>> - Trino: Merge-on-read plans [medium]
>>>>>>>>>>>>>>>>> - Trino: Multi-catalog support [small]
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>>>>>> Tabular
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>>> Tabular
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Ryan Blue
>>>>>>>>>>> Tabular
>>>>>>>>>>>
>>>>>>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Tabular
>>>>
>>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>