Thanks Ryan.
We will keep a close eye on what is happening in the Iceberg community and
seek help when necessary.

Thanks,
Zhao Chun


On Wed, Nov 10, 2021 at 8:54 AM Ryan Blue <b...@tabular.io> wrote:

> Thanks, Zhao. I think those are great ways to work together. Let us know
> how we can help you make StarRocks successful with Iceberg as its data
> format. We're always happy to help people understand how Iceberg works and
> improve our docs on how to use it.
>
> Ryan
>
> On Mon, Nov 8, 2021 at 8:17 PM Zhao Chun <zh...@apache.org> wrote:
>
>> I feel that Ryan's response exemplifies the generosity of an Apache
>> project creator,
>> a quality that has touched and benefited us. We look forward to
>> contributing
>> further to the Apache project in the future.
>> As for the need for an issue to track progress, I don't think one is needed for now.
>> At the moment the main development work is done in the StarRocks
>> repository.
>> As for further cooperation in the future, I think there are several
>> aspects.
>> 1. StarRocks will be trying to support Iceberg.
>> I think this will help StarRocks to re-examine how it integrates with the
>> lakehouse system
>> and we will be happy to feed back to the Apache Iceberg community the
>> issues and benefits
>> we encounter during the integration process.
>> This will also validate the versatility of the Iceberg project to support
>> more query engines.
>> I think this project will benefit both projects.
>> 2. In the future, we will share some of our best practices for Iceberg
>> and StarRocks integration in a blog or talk.
>> If the Apache Iceberg project feels that these blogs or talks would be
>> beneficial to the Apache Iceberg community,
>> please consider linking to our subsequent blogs or talks from the Apache
>> Iceberg website blog.
>> The Iceberg community can, of course, not link if they feel it is
>> inappropriate.
>> 3. We expect to contribute to the Apache Iceberg community under the
>> Apache License V2.
>>
>> Thanks,
>> Zhao Chun
>>
>>
>> On Tue, Nov 9, 2021 at 3:05 AM Ryan Blue <b...@tabular.io> wrote:
>>
>>> I think it is great to see another processing engine adding support for
>>> Apache Iceberg, and I do look forward to collaborating with the StarRocks
>>> community in the future.
>>>
>>> I'm not entirely sure what that collaboration would look like just yet
>>> though. For most processing engines, it is people joining the Apache
>>> Iceberg community. No matter what the license of the downstream project, we
>>> always welcome more people contributing here!
>>>
>>> As for opening a project in our tracker, I'm not sure it makes sense to
>>> do that just yet. As far as I know there aren't any issues to track there.
>>> And would the StarRocks community find it helpful?
>>>
>>> On Mon, Nov 8, 2021 at 12:14 AM Zhao Chun <buaa.zh...@gmail.com> wrote:
>>>
>>>> Thanks to @OpenInx for mentioning StarRocks in the iceberg community.
>>>>
>>>> I'm from the StarRocks community.
>>>>
>>>> StarRocks is based on the Apache Doris project.
>>>> It has been in development internally for almost two years and is
>>>> currently used by hundreds of companies.
>>>> It was open-sourced just 2 months ago.
>>>>
>>>> Iceberg is a great project that makes analysis of huge datasets more
>>>> convenient.
>>>> The StarRocks community is planning to support Iceberg.
>>>> This will provide StarRocks users with the ability to analyze data in
>>>> Iceberg.
>>>>
>>>> Regarding the license, StarRocks' ELv2 will not affect our contribution
>>>> to the iceberg community under the Apache License V2.
>>>>
>>>> We are also looking forward to receiving help from the iceberg
>>>> community and will be contributing back to the iceberg community.
>>>>
>>>> Thanks,
>>>> Zhao Chun
>>>>
>>>>
>>>> On Mon, Nov 8, 2021 at 2:53 PM Kyle Bendickson <k...@tabular.io> wrote:
>>>>
>>>>> +1 around concerns with the Elastic license.
>>>>>
>>>>> Also, more importantly, how important is integration with either of
>>>>> these tools to the Iceberg community and contributors?
>>>>>
>>>>> The Elastic license makes a bit more sense for elasticsearch, as it
>>>>> was an existing project for quite some time. I won’t reiterate the details
>>>>> of that situation, but it’s odd to see a fork of a new, active project
>>>>> using the Elastic license in my opinion.
>>>>>
>>>>> StarRocks admits that at least 40% of its code comes from the Apache
>>>>> Doris project.
>>>>>
>>>>> That said, StarRocks claims to not require other dependencies. It
>>>>> seems StarRocks supports query federation with a few tools so as not to
>>>>> have to import the data and query those systems directly. So I’m not sure
>>>>> what Iceberg support would look like beyond additional query federation.
>>>>> What benefit does this provide?
>>>>>
>>>>> If we determined that integration with one of these tools was
>>>>> something the community valued, could a connector be built to target the
>>>>> Apache Doris project and then StarRocks could fork that code if they liked?
>>>>>
>>>>> - Kyle Bendickson
>>>>> GitHub @kbendick
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Nov 7, 2021 at 9:24 PM Reo Lei <leinuo...@gmail.com> wrote:
>>>>>
>>>>>> +1, I have the same concern for the incompatible license.
>>>>>>
>>>>>> On Mon, Nov 8, 2021 at 11:48 AM Jacques Nadeau <jacquesnad...@gmail.com> wrote:
>>>>>>
>>>>>>> A few additional observations about StarRocks...
>>>>>>>
>>>>>>> - As far as I can tell, StarRocks has an ASF incompatible license
>>>>>>> (Elastic License 2.0).
>>>>>>> - It appears to be a hard fork of Apache Doris, a project still in
>>>>>>> the incubator (and looks like it probably is destructive to the Doris
>>>>>>> project)
>>>>>>> - The project has only existed for ~2 months.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Nov 7, 2021 at 7:34 PM OpenInx <open...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Any thoughts on adding StarRocks integration to the roadmap?
>>>>>>>>
>>>>>>>> I think the folks from the StarRocks community can provide more
>>>>>>>> background and input.
>>>>>>>>
>>>>>>>> On Thu, Nov 4, 2021 at 5:59 PM OpenInx <open...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Update:
>>>>>>>>>
>>>>>>>>> StarRocks [1] is a next-gen sub-second MPP database for full
>>>>>>>>> analysis scenarios, including multi-dimensional analytics, real-time
>>>>>>>>> analytics, and ad-hoc queries. Their team is planning to integrate
>>>>>>>>> Iceberg tables as StarRocks external tables in the next month [2], so
>>>>>>>>> that people can connect the data lake and the StarRocks warehouse in
>>>>>>>>> the same engine. The excellent performance of StarRocks will also help
>>>>>>>>> accelerate analysis of and access to Iceberg tables. I think this is a
>>>>>>>>> great thing for both the Iceberg community and the StarRocks community.
>>>>>>>>> Could we add an extra project for the StarRocks integration work to the
>>>>>>>>> Apache Iceberg roadmap [3]?
>>>>>>>>>
>>>>>>>>> [1].  https://github.com/StarRocks/starrocks
>>>>>>>>> [2].  https://github.com/StarRocks/starrocks/issues/1030
>>>>>>>>> [3].  https://github.com/apache/iceberg/projects
>>>>>>>>>
>>>>>>>>> On Mon, Nov 1, 2021 at 11:52 PM Ryan Blue <b...@tabular.io> wrote:
>>>>>>>>>
>>>>>>>>>> I closed the upgrade project and marked the FLIP-27 project
>>>>>>>>>> priority 1. Thanks for all the work to get this done!
>>>>>>>>>>
>>>>>>>>>> On Sun, Oct 31, 2021 at 8:10 PM OpenInx <open...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Update:
>>>>>>>>>>>
>>>>>>>>>>> I think the project [Flink: Upgrade to 1.13.2][1] in the roadmap
>>>>>>>>>>> can be closed now, because all of the issues have been addressed.
>>>>>>>>>>>
>>>>>>>>>>> [1]. https://github.com/apache/iceberg/projects/12
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Sep 21, 2021 at 6:17 PM Eduard Tudenhoefner <
>>>>>>>>>>> edu...@dremio.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I created a Roadmap section in
>>>>>>>>>>>> https://github.com/apache/iceberg/pull/3163 that links to the
>>>>>>>>>>>> planning boards that Jack created. I figured it makes sense if we
>>>>>>>>>>>> link available design docs directly on those boards (as was already
>>>>>>>>>>>> done), because then the design docs are closer to the set of related
>>>>>>>>>>>> issues.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Sep 20, 2021 at 10:02 PM Ryan Blue <b...@tabular.io>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks, Jack!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Eduard, I think that's a good idea. We should have a roadmap
>>>>>>>>>>>>> page as well that links to the projects that Jack just created.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Sep 20, 2021 at 12:57 PM Jack Ye <yezhao...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> It seems like we have reached some consensus around the
>>>>>>>>>>>>>> projects listed here. I have created corresponding GitHub
>>>>>>>>>>>>>> projects for each: https://github.com/apache/iceberg/projects
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Related design docs are also linked there.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Jack Ye
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Sep 19, 2021 at 11:18 PM Eduard Tudenhoefner <
>>>>>>>>>>>>>> edu...@dremio.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Would it make sense to have a section on the website where
>>>>>>>>>>>>>>> we collect all the links to the design docs/specs, as that would
>>>>>>>>>>>>>>> be easier to find than searching for things on the ML?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I was thinking about something like for each component:
>>>>>>>>>>>>>>> * link to the ML discussion
>>>>>>>>>>>>>>> * link to the actual Spec/Design Doc
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Sep 10, 2021 at 11:38 PM Ryan Blue <b...@tabular.io>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> At the last sync meeting, we brought up publishing a
>>>>>>>>>>>>>>>> community roadmap and brainstormed the many features and
>>>>>>>>>>>>>>>> initiatives that the community is working on. In this thread, I
>>>>>>>>>>>>>>>> want to make sure that we have a good list of what people are
>>>>>>>>>>>>>>>> thinking about and I think we should try to categorize the
>>>>>>>>>>>>>>>> projects by size and general priority. When we reach a rough
>>>>>>>>>>>>>>>> agreement, I’ll write this up and post it on the ASF site along
>>>>>>>>>>>>>>>> with links to some projects in GitHub.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My rationale for attempting to prioritize projects is that
>>>>>>>>>>>>>>>> if we try to do too many things, it will be slower progress
>>>>>>>>>>>>>>>> across everything rather than getting a few important items
>>>>>>>>>>>>>>>> done. I know that priorities don’t align very cleanly in
>>>>>>>>>>>>>>>> practice, but it is hopefully worth trying. To come up with a
>>>>>>>>>>>>>>>> priority, I’m trying to keep top priority items to a minimum by
>>>>>>>>>>>>>>>> including only one from each group (Spark, Flink, Python, etc.).
>>>>>>>>>>>>>>>> The remaining items are split between priority 2 and 3. Priority
>>>>>>>>>>>>>>>> 3 is not urgent, including things that can be plugged in (like
>>>>>>>>>>>>>>>> other IO libraries), docs, etc. Everything else is priority 2.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That something isn’t priority 1 doesn’t mean it isn’t
>>>>>>>>>>>>>>>> important or progressing, just that it isn’t the current focus.
>>>>>>>>>>>>>>>> I think of it this way: if someone has extra time to review
>>>>>>>>>>>>>>>> something, what should be next? That’s top priority.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Here’s my rough categorization. If you disagree, please
>>>>>>>>>>>>>>>> speak up:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    - If you think that something should be top priority,
>>>>>>>>>>>>>>>>    what gets moved to priority 2?
>>>>>>>>>>>>>>>>    - Should the priority for a project in 2 or 3 change?
>>>>>>>>>>>>>>>>    - Is the S/M/L size of a project wrong?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Top priority, 1:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    - API: Iceberg 1.0 [medium]
>>>>>>>>>>>>>>>>    - Spark: Merge-on-read plans [large]
>>>>>>>>>>>>>>>>    - Maintenance: Delete file compaction [medium]
>>>>>>>>>>>>>>>>    - Flink: Upgrade to 1.13.2 (document compatibility) [medium]
>>>>>>>>>>>>>>>>    - Python: Pythonic refactor [medium]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Priority 2:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    - ORC: Support delete files stored as ORC [small]
>>>>>>>>>>>>>>>>    - Spark: DSv2 streaming improvements [small]
>>>>>>>>>>>>>>>>    - Flink: Inline file compaction [small]
>>>>>>>>>>>>>>>>    - Flink: Support UPSERT [small]
>>>>>>>>>>>>>>>>    - Views: Spec [medium]
>>>>>>>>>>>>>>>>    - Spec: Z-ordering / Space-filling curves [medium]
>>>>>>>>>>>>>>>>    - Spec: Snapshot tagging and branching [small]
>>>>>>>>>>>>>>>>    - Spec: Secondary indexes [large]
>>>>>>>>>>>>>>>>    - Spec v3: Encryption [large]
>>>>>>>>>>>>>>>>    - Spec v3: Relative paths [large]
>>>>>>>>>>>>>>>>    - Spec v3: Default field values [medium]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Priority 3:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    - Docs: versioned docs [medium]
>>>>>>>>>>>>>>>>    - IO: Support Aliyun OSS/DLF [medium]
>>>>>>>>>>>>>>>>    - IO: Support Dell ECS [medium]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> External:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    - Trino: Bucketed joins [small]
>>>>>>>>>>>>>>>>    - Trino: Row-level delete support [medium]
>>>>>>>>>>>>>>>>    - Trino: Merge-on-read plans [medium]
>>>>>>>>>>>>>>>>    - Trino: Multi-catalog support [small]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>>>>> Tabular
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>> Tabular
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Ryan Blue
>>>>>>>>>> Tabular
>>>>>>>>>>
>>>>>>>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>
>
> --
> Ryan Blue
> Tabular
>
