BTW, I think this proposal sounds similar to FRocksDB, Flink's custom
RocksDB build. Maybe the folks maintaining FRocksDB can share some experience.

CC @Yun Tang

Thank you~

Xintong Song



On Fri, Apr 22, 2022 at 4:35 PM Xintong Song <tonysong...@gmail.com> wrote:

> Hi Godfrey,
>
>
>> 1. Where to put the code? https://github.com/flink-extended is a good
>> place.
>
>
> Please note that `flink-extended` is not endorsed by the Apache Flink
> PMC. That means if the proposed new Calcite repository is hosted there, its
> maintenance and releases will not be guaranteed by the Apache Flink project.
> I guess the question is whether we consider another 3rd-party Calcite
> repository more reliable and convenient to depend on than the official
> Apache Calcite.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Fri, Apr 22, 2022 at 4:07 PM Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> I'm overall against the idea of creating a fork.
>> It implies quite some maintenance overhead, like dealing with unstable
>> tests, CI, licensing etc. and the overall release overhead.
>>
>> Is there no alternative where we can collaborate more with the Calcite
>> folks, e.g. by verifying new features so that bugs are caught sooner?
>>
>> On 22/04/2022 09:31, godfrey he wrote:
>> > Dear devs,
>> >
>> > I would like to open a discussion on the fact that much of Flink SQL's
>> > feature development currently relies on Calcite releases, which seriously
>> > blocks the release of some Flink SQL features.
>> > Therefore, I would like to discuss whether this problem can be solved
>> > by creating Flink's own Calcite repository.
>> >
>> > Currently, Flink depends on Calcite-1.26, FLIP-204[1] relies on
>> > Calcite-1.30, and we now want to fully support join-hint functionality in
>> > Flink-1.16, which relies on Calcite-1.31 (which may be released two or
>> > three months from now).
>> >
>> > In order to support new features or fix bugs, we need to upgrade the
>> > Calcite version, but every Calcite upgrade (especially one that spans
>> > multiple versions) is a very tough process: I remember clearly that
>> > the Calcite upgrade from 1.22 to 1.26 took two weeks of full-time work.
>> >
>> > Currently, in order to fix some bugs without upgrading the Calcite
>> > version, we copy the corresponding Calcite classes directly into the Flink
>> > project and then modify them accordingly.[2] This approach is rather hacky
>> > and makes code maintenance and upgrades hard.
>> >
>> > So I had the idea that we could solve this problem by maintaining a
>> > Calcite repository in the Flink community. This approach has been practiced
>> > within my company for many years. There are similar practices in the
>> > industry. For example, Apache Drill also maintains a separate Calcite
>> > repository[3].
>> >
>> > The following is a brief analysis of the approach and the pros and
>> > cons of maintaining a separate repository.
>> >
>> > Approach:
>> > 1. Where to put the code? https://github.com/flink-extended is a good
>> > place.
>> > 2. What extra code can be added to this repository? Only bug fixes and
>> > features that are already merged into Calcite can be cherry-picked to this
>> > repository. We should also try to push bug fixes to the Calcite community.
>> > Btw, the copied Calcite classes in the Flink project can then be removed.
>> > 3. How to upgrade the Calcite version? Check out the target Calcite
>> > release branch and rebase our bug-fix code onto it. (As we upgrade, we will
>> > maintain fewer and fewer old bug fixes.) Then verify all of Calcite's tests
>> > and Flink's tests in the developer's local environment. If all tests pass,
>> > release the Calcite branch; otherwise fix the problems in the branch and
>> > re-test. After the branch is released, the version of Calcite in Flink
>> > can be upgraded. For example: check out a calcite-1.26.0-flink-v1-SNAPSHOT
>> > branch from calcite-1.26.0, move all the copied Calcite code in Flink to
>> > the branch, and pick all the hint-related changes from Calcite-1.31 to the
>> > branch. Then we can change the Calcite version in Flink to
>> > calcite-1.26.0-flink-v1-SNAPSHOT and verify all tests locally. Release
>> > calcite-1.26.0-flink-v1 after all tests are successful. Finally, upgrade
>> > the Calcite version to calcite-1.26.0-flink-v1 and open a PR.
>> > 4. Who will maintain it? The maintenance workload is minimal, but the
>> > upgrade work is laborious (actually, it's similar to before). I can
>> > maintain it in the early stage and standardise the process.
>> >
>> > Pros.
>> > 1. The release of Flink is decoupled from the release of Calcite,
>> >   making feature development and bug fixes quicker
>> > 2. Reduces the hassle of unnecessary Calcite upgrades
>> > 3. No hacking in Flink to maintain the copied Calcite code
>> >
>> > Cons.
>> > 1. Need to maintain an additional Calcite repository
>> > 2. Upgrades are a little more complicated than before
>> >
>> > Any feedback is very welcome!
>> >
>> >
>> > [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
>> > [2]
>> https://github.com/apache/flink/tree/master/flink-table/flink-table-planner/src/main/java/org/apache/calcite
>> > [3] https://github.com/apache/drill/blob/master/pom.xml#L64
>> >
>> > Best,
>> > Godfrey
>>
>>
>>
