BTW, I think this proposal sounds similar to FRocksDB, Flink's custom RocksDB build. Maybe the folks maintaining FRocksDB can share their experience.
CC @Yun Tang

Thank you~

Xintong Song

On Fri, Apr 22, 2022 at 4:35 PM Xintong Song <tonysong...@gmail.com> wrote:

> Hi Godfrey,
>
> >> 1. Where to put the code? https://github.com/flink-extended is a good
> >> place.
>
> Please notice that `flink-extended` is not endorsed by the Apache Flink
> PMC. That means if the proposed new Calcite repository is hosted there, the
> maintenance and release will not be guaranteed by the Apache Flink project.
> I guess the question is whether we consider another 3rd party Calcite
> repository more reliable and convenient to depend on than the official
> Apache Calcite.
>
> Thank you~
>
> Xintong Song
>
> On Fri, Apr 22, 2022 at 4:07 PM Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> I'm overall against the idea of creating a fork.
>> It implies quite some maintenance overhead, like dealing with unstable
>> tests, CI, licensing etc. and the overall release overhead.
>>
>> Is there no alternative where we can collaborate more with the calcite
>> guys, like verifying new features so bugs are caught sooner?
>>
>> On 22/04/2022 09:31, godfrey he wrote:
>> > Dear devs,
>> >
>> > I would like to open a discussion on the fact that currently much of
>> > Flink SQL's feature development relies on Calcite releases, which
>> > seriously blocks the release of some Flink SQL features.
>> > Therefore, I would like to discuss whether it is possible to solve
>> > this problem by creating Flink's own Calcite repository.
>> >
>> > Currently, Flink depends on Calcite-1.26, FLIP-204[1] relies on
>> > Calcite-1.30, and we recently want to fully support join-hints
>> > functionality in Flink-1.16, which relies on Calcite-1.31 (which may
>> > be released two or three months from now).
>> >
>> > In order to support some new features or fix some bugs, we need to
>> > upgrade the Calcite version, but every time we upgrade the Calcite
>> > version (especially upgrades across multiple versions), the process
>> > is very tough: I remember clearly that the Calcite upgrade from 1.22
>> > to 1.26 took two weeks of full-time work to complete.
>> >
>> > Currently, in order to fix some bugs while not upgrading the Calcite
>> > version, we copy the corresponding Calcite class directly into the
>> > Flink project and then modify it accordingly.[2] This approach is
>> > rather hacky and makes code maintenance and upgrades hard.
>> >
>> > So, I had an idea whether we could solve this problem by maintaining
>> > a Calcite repository in the Flink community. This approach has been
>> > practiced within my company for many years.
>> > There are similar practices in the industry. For example, Apache Drill
>> > also maintains a separate Calcite repository[3].
>> >
>> > The following is a brief analysis of the approach and the pros and
>> > cons of maintaining a separate repository.
>> >
>> > Approach:
>> > 1. Where to put the code? https://github.com/flink-extended is a good
>> >    place.
>> > 2. What extra code can be added to this repository? Only bug fixes and
>> >    features that are already merged into Calcite can be cherry-picked
>> >    to this repository. We also should try to push bug fixes to the
>> >    Calcite community.
>> >    Btw, the copied Calcite classes in the Flink project can be removed.
>> > 3. How to upgrade the Calcite version? Check out the target Calcite
>> >    release branch and rebase our bug fix code. (As we upgrade, we will
>> >    maintain less and less old bug fix code.) And then, verify all of
>> >    Calcite's tests and Flink's tests in the developer's local
>> >    environment. If all tests are OK, release the Calcite branch;
>> >    otherwise fix it in the branch and re-test.
>> >    After the branch is released, the version of Calcite in Flink
>> >    can be upgraded. For example:
>> >    check out the calcite-1.26.0-flink-v1-SNAPSHOT branch from
>> >    calcite-1.26.0, move all the copied Calcite code in Flink to the
>> >    branch, and pick all the hint related changes from Calcite-1.31 to
>> >    the branch. Then we can change the Calcite version in Flink to
>> >    calcite-1.26.0-flink-v1-SNAPSHOT, and verify all tests locally.
>> >    Release calcite-1.26.0-flink-v1 after all tests are successful.
>> >    At last, upgrade the Calcite version to calcite-1.26.0-flink-v1,
>> >    and open a PR.
>> > 4. Who will maintain it? The maintenance workload is minimal, but the
>> >    upgrade work is laborious (actually, it's similar to before). I can
>> >    maintain it in the early stage and standardise the process.
>> >
>> > Pros.
>> > 1. The release of Flink is decoupled from the release of Calcite,
>> >    making feature development and bug fixes quicker
>> > 2. Reduce the hassle of unnecessary Calcite upgrades
>> > 3. No hacking in Flink to maintain the copied Calcite code
>> >
>> > Cons.
>> > 1. Need to maintain an additional Calcite repository
>> > 2. The upgrades are a little more complicated than before
>> >
>> > Any feedback is very welcome!
>> >
>> >
>> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
>> > [2] https://github.com/apache/flink/tree/master/flink-table/flink-table-planner/src/main/java/org/apache/calcite
>> > [3] https://github.com/apache/drill/blob/master/pom.xml#L64
>> >
>> > Best,
>> > Godfrey