I think it is great to see another processing engine adding support for
Apache Iceberg, and I do look forward to collaborating with the StarRocks
community in the future.

I'm not entirely sure what that collaboration would look like just yet
though. For most processing engines, it is people joining the Apache
Iceberg community. No matter what the license of the downstream project, we
always welcome more people contributing here!

As for opening a project in our tracker, I'm not sure it makes sense to do
that just yet. As far as I know there aren't any issues to track there. And
would the StarRocks community find it helpful?

On Mon, Nov 8, 2021 at 12:14 AM Zhao Chun <buaa.zh...@gmail.com> wrote:

> Thanks to @OpenInx for mentioning StarRocks in the iceberg community.
>
> I'm from the StarRocks community.
>
> StarRocks is based on the Apache Doris project.
> It has been in development internally for almost two years and is
> currently used by hundreds of companies.
> It was open-sourced just 2 months ago.
>
> Iceberg is a great project that makes analysis of huge datasets more
> convenient.
> The StarRocks community is planning to support the iceberg engine.
> This will provide StarRocks users with the ability to analyze data in
> iceberg.
>
> Regarding the license, StarRocks' ELv2 will not affect our contribution to
> the iceberg community under the Apache License V2.
>
> We are also looking forward to receiving help from the iceberg community
> and will be contributing back to the iceberg community.
>
> Thanks,
> Zhao Chun
>
>
> On Mon, Nov 8, 2021 at 2:53 PM Kyle Bendickson <k...@tabular.io> wrote:
>
>> +1 around concerns with the Elastic license.
>>
>> Also, more importantly, how important is integration with either of these
>> tools to the Iceberg community and contributors?
>>
>> The Elastic license makes a bit more sense for elasticsearch, as it was
>> an existing project for quite some time. I won’t reiterate the details of
>> that situation, but it’s odd to see a fork of a new, active project using
>> the Elastic license in my opinion.
>>
>> StarRocks admits that at least 40% of its code comes from the Apache Doris
>> project.
>>
>> That said, StarRocks claims to not require other dependencies. It seems
>> StarRocks supports query federation with a few tools so as not to have to
>> import the data and query those systems directly. So I’m not sure what
>> Iceberg support would look like beyond additional query federation. What
>> benefit does this provide?
>>
>> If we determined that integration with one of these tools was something
>> the community valued, could a connector be built to target the Apache Doris
>> project and then StarRocks could fork that code if they liked?
>>
>> - Kyle Bendickson
>> GitHub @kbendick
>>
>>
>>
>> On Sun, Nov 7, 2021 at 9:24 PM Reo Lei <leinuo...@gmail.com> wrote:
>>
>>> +1, I have the same concern for the incompatible license.
>>>
>>> On Mon, Nov 8, 2021 at 11:48 AM Jacques Nadeau <jacquesnad...@gmail.com> wrote:
>>>
>>>> A few additional observations about StarRocks...
>>>>
>>>> - As far as I can tell, StarRocks has an ASF incompatible license
>>>> (Elastic License 2.0).
>>>> - It appears to be a hard fork of Apache Doris, a project still in the
>>>> incubator (and looks like it probably is destructive to the Doris project)
>>>> - The project has only existed for ~2 months.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, Nov 7, 2021 at 7:34 PM OpenInx <open...@gmail.com> wrote:
>>>>
>>>>> Any thoughts on adding StarRocks integration to the roadmap?
>>>>>
>>>>> I think the guys from StarRocks community can provide more background
>>>>> and inputs.
>>>>>
>>>>> On Thu, Nov 4, 2021 at 5:59 PM OpenInx <open...@gmail.com> wrote:
>>>>>
>>>>>> Update:
>>>>>>
>>>>>> StarRocks[1] is a next-gen sub-second MPP database for full analysis
>>>>>> scenarios, including multi-dimensional analytics, real-time analytics and
>>>>>> ad-hoc query.  Their team is planning to integrate iceberg tables as
>>>>>> StarRocks external tables in the next month [2], so that people could
>>>>>> connect the data lake and StarRocks warehouse in the same engine.
>>>>>> The excellent performance of StarRocks will also help accelerate
>>>>>> analysis of and access to iceberg tables. I think this is a great
>>>>>> thing for both the iceberg community and the StarRocks community.
>>>>>> I think we can add an extra project for the StarRocks integration
>>>>>> work to the Apache Iceberg roadmap [3]?
>>>>>>
>>>>>> [1].  https://github.com/StarRocks/starrocks
>>>>>> [2].  https://github.com/StarRocks/starrocks/issues/1030
>>>>>> [3].  https://github.com/apache/iceberg/projects
>>>>>>
>>>>>> On Mon, Nov 1, 2021 at 11:52 PM Ryan Blue <b...@tabular.io> wrote:
>>>>>>
>>>>>>> I closed the upgrade project and marked the FLIP-27 project priority
>>>>>>> 1. Thanks for all the work to get this done!
>>>>>>>
>>>>>>> On Sun, Oct 31, 2021 at 8:10 PM OpenInx <open...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Update:
>>>>>>>>
>>>>>>>> I think the project  [Flink: Upgrade to 1.13.2][1] in RoadMap can
>>>>>>>> be closed now, because all of the issues have been addressed.
>>>>>>>>
>>>>>>>> [1]. https://github.com/apache/iceberg/projects/12
>>>>>>>>
>>>>>>>> On Tue, Sep 21, 2021 at 6:17 PM Eduard Tudenhoefner <
>>>>>>>> edu...@dremio.com> wrote:
>>>>>>>>
>>>>>>>>> I created a Roadmap section in
>>>>>>>>> https://github.com/apache/iceberg/pull/3163 that links to the
>>>>>>>>> planning boards that Jack created. I figured it makes sense if we link
>>>>>>>>> available Design Docs directly on those Boards (as was already done),
>>>>>>>>> because then the Design docs are closer to the set of related issues.
>>>>>>>>>
>>>>>>>>> On Mon, Sep 20, 2021 at 10:02 PM Ryan Blue <b...@tabular.io>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks, Jack!
>>>>>>>>>>
>>>>>>>>>> Eduard, I think that's a good idea. We should have a roadmap page
>>>>>>>>>> as well that links to the projects that Jack just created.
>>>>>>>>>>
>>>>>>>>>> On Mon, Sep 20, 2021 at 12:57 PM Jack Ye <yezhao...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> It seems like we have reached some consensus around the projects
>>>>>>>>>>> listed here. I have created corresponding Github projects for each:
>>>>>>>>>>> https://github.com/apache/iceberg/projects
>>>>>>>>>>>
>>>>>>>>>>> Related design docs are also linked there.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Jack Ye
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Sep 19, 2021 at 11:18 PM Eduard Tudenhoefner <
>>>>>>>>>>> edu...@dremio.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Would it make sense to have a section on the website where we
>>>>>>>>>>>> collect all the links to the design docs/specs, as that would be
>>>>>>>>>>>> easier to find than searching for things on the ML?
>>>>>>>>>>>>
>>>>>>>>>>>> I was thinking about something like for each component:
>>>>>>>>>>>> * link to the ML discussion
>>>>>>>>>>>> * link to the actual Spec/Design Doc
>>>>>>>>>>>>
>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Sep 10, 2021 at 11:38 PM Ryan Blue <b...@tabular.io>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>
>>>>>>>>>>>>> At the last sync meeting, we brought up publishing a community
>>>>>>>>>>>>> roadmap and brainstormed the many features and initiatives that
>>>>>>>>>>>>> the community is working on. In this thread, I want to make sure
>>>>>>>>>>>>> that we have a good list of what people are thinking about, and I
>>>>>>>>>>>>> think we should try to categorize the projects by size and general
>>>>>>>>>>>>> priority. When we reach a rough agreement, I’ll write this up and
>>>>>>>>>>>>> post it on the ASF site along with links to some projects in
>>>>>>>>>>>>> Github.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My rationale for attempting to prioritize projects is that if we
>>>>>>>>>>>>> try to do too many things, it will be slower progress across
>>>>>>>>>>>>> everything rather than getting a few important items done. I know
>>>>>>>>>>>>> that priorities don’t align very cleanly in practice, but it is
>>>>>>>>>>>>> hopefully worth trying. To come up with a priority, I’m trying to
>>>>>>>>>>>>> keep top priority items to a minimum by including only one from
>>>>>>>>>>>>> each group (Spark, Flink, Python, etc.). The remaining items are
>>>>>>>>>>>>> split between priority 2 and 3. Priority 3 is not urgent,
>>>>>>>>>>>>> including things that can be plugged in (like other IO
>>>>>>>>>>>>> libraries), docs, etc. Everything else is priority 2.
>>>>>>>>>>>>>
>>>>>>>>>>>>> That something isn’t priority 1 doesn’t mean it isn’t important
>>>>>>>>>>>>> or progressing, just that it isn’t the current focus. I think of
>>>>>>>>>>>>> it this way: if someone has extra time to review something, what
>>>>>>>>>>>>> should be next? That’s top priority.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here’s my rough categorization. If you disagree, please speak
>>>>>>>>>>>>> up:
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - If you think that something should be top priority, what
>>>>>>>>>>>>>    gets moved to priority 2?
>>>>>>>>>>>>>    - Should the priority for a project in 2 or 3 change?
>>>>>>>>>>>>>    - Is the S/M/L size of a project wrong?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Top priority, 1:
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - API: Iceberg 1.0 [medium]
>>>>>>>>>>>>>    - Spark: Merge-on-read plans [large]
>>>>>>>>>>>>>    - Maintenance: Delete file compaction [medium]
>>>>>>>>>>>>>    - Flink: Upgrade to 1.13.2 (document compatibility) [medium]
>>>>>>>>>>>>>    - Python: Pythonic refactor [medium]
>>>>>>>>>>>>>
>>>>>>>>>>>>> Priority 2:
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - ORC: Support delete files stored as ORC [small]
>>>>>>>>>>>>>    - Spark: DSv2 streaming improvements [small]
>>>>>>>>>>>>>    - Flink: Inline file compaction [small]
>>>>>>>>>>>>>    - Flink: Support UPSERT [small]
>>>>>>>>>>>>>    - Views: Spec [medium]
>>>>>>>>>>>>>    - Spec: Z-ordering / Space-filling curves [medium]
>>>>>>>>>>>>>    - Spec: Snapshot tagging and branching [small]
>>>>>>>>>>>>>    - Spec: Secondary indexes [large]
>>>>>>>>>>>>>    - Spec v3: Encryption [large]
>>>>>>>>>>>>>    - Spec v3: Relative paths [large]
>>>>>>>>>>>>>    - Spec v3: Default field values [medium]
>>>>>>>>>>>>>
>>>>>>>>>>>>> Priority 3:
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - Docs: versioned docs [medium]
>>>>>>>>>>>>>    - IO: Support Aliyun OSS/DLF [medium]
>>>>>>>>>>>>>    - IO: Support Dell ECS [medium]
>>>>>>>>>>>>>
>>>>>>>>>>>>> External:
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - Trino: Bucketed joins [small]
>>>>>>>>>>>>>    - Trino: Row-level delete support [medium]
>>>>>>>>>>>>>    - Trino: Merge-on-read plans [medium]
>>>>>>>>>>>>>    - Trino: Multi-catalog support [small]
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>> Tabular
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Ryan Blue
>>>>>>>>>> Tabular
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Ryan Blue
>>>>>>> Tabular
>>>>>>>
>>>>>>

-- 
Ryan Blue
Tabular
