Thanks, Ryan. We will keep a close eye on what is happening in the Iceberg community and ask for help when necessary.

Thanks,
Zhao Chun

On Wed, Nov 10, 2021 at 8:54 AM Ryan Blue <b...@tabular.io> wrote:

> Thanks, Zhao. I think those are great ways to work together. Let us know how we can help you make StarRocks successful with Iceberg as its data format. We're always happy to help people understand how Iceberg works and improve our docs on how to use it.
>
> Ryan
>
> On Mon, Nov 8, 2021 at 8:17 PM Zhao Chun <zh...@apache.org> wrote:
>
>> I feel that Ryan's response exemplifies the generosity of an Apache project creator, a quality that has touched and benefited us. We look forward to contributing further to the Apache project in the future.
>>
>> As for the need for an issue to track progress, I don't think so for now. At the moment the main development work is done in the StarRocks repository.
>>
>> As for further cooperation in the future, I think there are several aspects.
>>
>> 1. StarRocks will be trying to support Iceberg. I think this will help StarRocks to re-examine how it integrates with the lakehouse system, and we will be happy to feed back to the Apache Iceberg community the issues and benefits we encounter during the integration process. This will also validate the versatility of the Iceberg project to support more query engines. I think this project will benefit both projects.
>>
>> 2. In the future, we will share some of our best practices for Iceberg and StarRocks integration in a blog or talk. If the Apache Iceberg project feels that these blogs or talks would be beneficial to the Apache Iceberg community, please consider linking our subsequent blogs or talks to the Apache Iceberg website blog. The Iceberg community can, of course, not link if they feel it is inappropriate.
>>
>> 3. We expect to contribute to the Apache Iceberg community under the Apache License V2.
>>
>> Thanks,
>> Zhao Chun
>>
>> On Tue, Nov 9, 2021 at 3:05 AM Ryan Blue <b...@tabular.io> wrote:
>>
>>> I think it is great to see another processing engine adding support for Apache Iceberg, and I do look forward to collaborating with the StarRocks community in the future.
>>>
>>> I'm not entirely sure what that collaboration would look like just yet though. For most processing engines, it is people joining the Apache Iceberg community. No matter what the license of the downstream project, we always welcome more people contributing here!
>>>
>>> As for opening a project in our tracker, I'm not sure it makes sense to do that just yet. As far as I know there aren't any issues to track there. And would the StarRocks community find it helpful?
>>>
>>> On Mon, Nov 8, 2021 at 12:14 AM Zhao Chun <buaa.zh...@gmail.com> wrote:
>>>
>>>> Thanks to @OpenInx for mentioning StarRocks in the Iceberg community.
>>>>
>>>> I'm from the StarRocks community.
>>>>
>>>> StarRocks is based on the Apache Doris project. It has been in development internally for almost two years and is currently used by hundreds of companies. It was just opened 2 months ago.
>>>>
>>>> Iceberg is a great project that makes analysis of huge datasets more convenient. The StarRocks community is planning to support the Iceberg engine. This will provide StarRocks users with the ability to analyze data in Iceberg.
>>>>
>>>> Regarding the license, StarRocks' ELv2 will not affect our contribution to the Iceberg community under the Apache License V2.
>>>>
>>>> We are also looking forward to receiving help from the Iceberg community and will be contributing back to the Iceberg community.
>>>>
>>>> Thanks,
>>>> Zhao Chun
>>>>
>>>> On Mon, Nov 8, 2021 at 2:53 PM Kyle Bendickson <k...@tabular.io> wrote:
>>>>
>>>>> +1 around concerns with the Elastic license.
>>>>>
>>>>> Also, more importantly, how important is integration with either of these tools to the Iceberg community and contributors?
>>>>>
>>>>> The Elastic license makes a bit more sense for elasticsearch, as it was an existing project for quite some time. I won’t reiterate the details of that situation, but it’s odd to see a fork of a new, active project using the Elastic license in my opinion.
>>>>>
>>>>> StarRocks admits that they’re at least 40% of code from the Apache Doris project.
>>>>>
>>>>> That said, StarRocks claims to not require other dependencies. It seems StarRocks supports query federation with a few tools so as not to have to import the data and query those systems directly. So I’m not sure what Iceberg support would look like beyond additional query federation. What benefit does this provide?
>>>>>
>>>>> If we determined that integration with one of these tools was something the community valued, could a connector be built to target the Apache Doris project and then StarRocks could fork that code if they liked?
>>>>>
>>>>> - Kyle Bendickson
>>>>> GitHub @kbendick
>>>>>
>>>>> On Sun, Nov 7, 2021 at 9:24 PM Reo Lei <leinuo...@gmail.com> wrote:
>>>>>
>>>>>> +1, I have the same concern for the incompatible license.
>>>>>>
>>>>>> On Mon, Nov 8, 2021 at 11:48 AM Jacques Nadeau <jacquesnad...@gmail.com> wrote:
>>>>>>
>>>>>>> A few additional observations about StarRocks...
>>>>>>>
>>>>>>> - As far as I can tell, StarRocks has an ASF incompatible license (Elastic License 2.0).
>>>>>>> - It appears to be a hard fork of Apache Doris, a project still in the incubator (and looks like it probably is destructive to the Doris project)
>>>>>>> - The project has only existed for ~2 months.
>>>>>>>
>>>>>>> On Sun, Nov 7, 2021 at 7:34 PM OpenInx <open...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Any thoughts for adding StarRocks integration to the roadmap ?
>>>>>>>>
>>>>>>>> I think the guys from StarRocks community can provide more background and inputs.
>>>>>>>>
>>>>>>>> On Thu, Nov 4, 2021 at 5:59 PM OpenInx <open...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Update:
>>>>>>>>>
>>>>>>>>> StarRocks[1] is a next-gen sub-second MPP database for full analysis scenarios, including multi-dimensional analytics, real-time analytics and ad-hoc query. Their team is planning to integrate iceberg tables as StarRocks external tables in the next month [2], so that people could connect the data lake and StarRocks warehouse in the same engine. The excellent performance of StarRocks will also help accelerate the analysis and access of the iceberg table, I think this is a great thing for both the iceberg community and the StarRocks community. I think we can add an extra project about StarRocks integration work in the apache iceberg roadmap [3] ?
>>>>>>>>>
>>>>>>>>> [1]. https://github.com/StarRocks/starrocks
>>>>>>>>> [2]. https://github.com/StarRocks/starrocks/issues/1030
>>>>>>>>> [3]. https://github.com/apache/iceberg/projects
>>>>>>>>>
>>>>>>>>> On Mon, Nov 1, 2021 at 11:52 PM Ryan Blue <b...@tabular.io> wrote:
>>>>>>>>>
>>>>>>>>>> I closed the upgrade project and marked the FLIP-27 project priority 1. Thanks for all the work to get this done!
>>>>>>>>>>
>>>>>>>>>> On Sun, Oct 31, 2021 at 8:10 PM OpenInx <open...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Update:
>>>>>>>>>>>
>>>>>>>>>>> I think the project [Flink: Upgrade to 1.13.2][1] in RoadMap can be closed now, because all of the issues have been addressed.
>>>>>>>>>>>
>>>>>>>>>>> [1]. https://github.com/apache/iceberg/projects/12
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Sep 21, 2021 at 6:17 PM Eduard Tudenhoefner <edu...@dremio.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I created a Roadmap section in https://github.com/apache/iceberg/pull/3163 that links to the planning boards that Jack created. I figured it makes sense if we link available Design Docs directly on those Boards (as was already done), because then the Design docs are closer to the set of related issues.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Sep 20, 2021 at 10:02 PM Ryan Blue <b...@tabular.io> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks, Jack!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Eduard, I think that's a good idea. We should have a roadmap page as well that links to the projects that Jack just created.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Sep 20, 2021 at 12:57 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> It seems like we have reached some consensus around the projects listed here. I have created corresponding Github projects for each: https://github.com/apache/iceberg/projects
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Related design docs are also linked there.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Jack Ye
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Sep 19, 2021 at 11:18 PM Eduard Tudenhoefner <edu...@dremio.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Would it make sense to have a section on the website where we collect all the links to the design docs/specs as that would be easier to find than searching for things on the ML?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I was thinking about something like for each component:
>>>>>>>>>>>>>>> * link to the ML discussion
>>>>>>>>>>>>>>> * link to the actual Spec/Design Doc
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Sep 10, 2021 at 11:38 PM Ryan Blue <b...@tabular.io> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> At the last sync meeting, we brought up publishing a community roadmap and brainstormed the many features and initiatives that the community is working on. In this thread, I want to make sure that we have a good list of what people are thinking about and I think we should try to categorize the projects by size and general priority. When we reach a rough agreement, I’ll write this up and post it on the ASF site along with links to some projects in Github.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My rationale for attempting to prioritize projects is that if we try to do too many things, it will be slower progress across everything rather than getting a few important items done. I know that priorities don’t align very cleanly in practice, but it is hopefully worth trying. To come up with a priority, I’m trying to keep top priority items to a minimum by including only one from each group (Spark, Flink, Python, etc.). The remaining items are split between priority 2 and 3. Priority 3 is not urgent, including things that can be plugged in (like other IO libraries), docs, etc. Everything else is priority 2.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That something isn’t priority 1 doesn’t mean it isn’t important or progressing, just that it isn’t the current focus. I think of it this way: if someone has extra time to review something, what should be next? That’s top priority.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Here’s my rough categorization. If you disagree, please speak up:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - If you think that something should be top priority, what gets moved to priority 2?
>>>>>>>>>>>>>>>> - Should the priority for a project in 2 or 3 change?
>>>>>>>>>>>>>>>> - Is the S/M/L size of a project wrong?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Top priority, 1:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - API: Iceberg 1.0 [medium]
>>>>>>>>>>>>>>>> - Spark: Merge-on-read plans [large]
>>>>>>>>>>>>>>>> - Maintenance: Delete file compaction [medium]
>>>>>>>>>>>>>>>> - Flink: Upgrade to 1.13.2 (document compatibility) [medium]
>>>>>>>>>>>>>>>> - Python: Pythonic refactor [medium]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Priority 2:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - ORC: Support delete files stored as ORC [small]
>>>>>>>>>>>>>>>> - Spark: DSv2 streaming improvements [small]
>>>>>>>>>>>>>>>> - Flink: Inline file compaction [small]
>>>>>>>>>>>>>>>> - Flink: Support UPSERT [small]
>>>>>>>>>>>>>>>> - Views: Spec [medium]
>>>>>>>>>>>>>>>> - Spec: Z-ordering / Space-filling curves [medium]
>>>>>>>>>>>>>>>> - Spec: Snapshot tagging and branching [small]
>>>>>>>>>>>>>>>> - Spec: Secondary indexes [large]
>>>>>>>>>>>>>>>> - Spec v3: Encryption [large]
>>>>>>>>>>>>>>>> - Spec v3: Relative paths [large]
>>>>>>>>>>>>>>>> - Spec v3: Default field values [medium]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Priority 3:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - Docs: versioned docs [medium]
>>>>>>>>>>>>>>>> - IO: Support Aliyun OSS/DLF [medium]
>>>>>>>>>>>>>>>> - IO: Support Dell ECS [medium]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> External:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - Trino: Bucketed joins [small]
>>>>>>>>>>>>>>>> - Trino: Row-level delete support [medium]
>>>>>>>>>>>>>>>> - Trino: Merge-on-read plans [medium]
>>>>>>>>>>>>>>>> - Trino: Multi-catalog support [small]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>>>>> Tabular
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>> Tabular
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Ryan Blue
>>>>>>>>>> Tabular
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>
> --
> Ryan Blue
> Tabular