Hi Jing,

My pleasure. I also saw some reviews coming in from Julian on your Calcite PR, so that's great :)

> About keeping up with the Calcite updates, I would like to take this issue. Is it too late to schedule the 1.16 version? How about scheduling this work on version 1.17?

I'm not sure if there's a reviewer available, but I still think it would be very valuable to get a PR opened with the actual work. If we can still fit it into the 1.16 release cycle (there are 4 weeks left until feature freeze), we can push it in; otherwise we'll merge it for 1.17.

Best regards,

Martijn

On Thu, 23 Jun 2022 at 06:38, Jing Zhang <beyond1...@gmail.com> wrote:

> Hi Martijn,
>
> This is really exciting news. Thanks a lot for the effort to improve collaboration and communication with the Calcite community.
>
> > My takeaway from the discussion in the Flink community and the discussion in the Calcite community is that I believe we should do 3 things.
>
> Agreed on these 3 points.
> About keeping up with the Calcite updates, I would like to take this issue. Is it too late to schedule the 1.16 version? How about scheduling this work on version 1.17?
>
> Best,
> Jing Zhang
>
> On Thu, 23 Jun 2022 at 02:01, Martijn Visser <martijnvis...@apache.org> wrote:
>
> > Hi everyone,
> >
> > I've recently reached out to the Calcite community to see if we could get something done with regards to the PR that Jing Zhang opened a long time ago. In that thread, I also mentioned that we had a discussion in the Flink community on potentially forking Calcite. I would recommend reading up on the thread [1]. Specifically, the replies from other projects/PMCs (Apache Drill, Dremio) are super interesting. These projects have forked Calcite in the past, regret that move, have reverted back to Calcite or are in the process of reverting, and elaborate on that. This thread also gained some traction on Twitter in case you're interested in more opinions.
> > [2]
> >
> > My takeaway from the discussion in the Flink community and the discussion in the Calcite community is that I believe we should do 3 things:
> >
> > 1. We should not fork Calcite. There might be short-term benefits but long-term pain. I think we are already suffering from enough long-term pain in the Flink codebase that we shouldn't take a step that will increase that pain even more, scattered over multiple places.
> > 2. I think we should try to help out the Calcite community more. Not only by opening new PRs for new features, but we can also help by reviewing those PRs, reviewing other PRs that could be relevant for Flink, or proposing improvements given our experience at Flink. As you can see in the Calcite thread, Timo has already expressed a desire to do so. Part of the OSS community is also about helping each other; if we improve Calcite, we will also improve Flink.
> > 3. I think we need to prioritise keeping up with the Calcite updates. They are currently working on releasing version 1.31, while Flink is still at 1.26.0. We don't necessarily need to stay in sync with the latest available version, but I definitely think we should be at most 2 versions (and preferably 1 version) behind (so currently that would be 1.28, and 1.29 soonish). Not only are we increasing our own tech debt by not updating, we are also limiting ourselves in adding new features in the Table/SQL space. As you can also see in the 1.26 release notes, there's a warning to only use 1.26 for development since it can corrupt your data [3]. There are already multiple upgrade tickets for Calcite [4] [5] [6].
> >
> > [1] https://lists.apache.org/thread/3lkfhwjpqwy9pfhnvwmfkwmwlfyqs45z
> > [2] https://twitter.com/gunnarmorling/status/1539499415337111553?s=21&t=8fGk3PxScOx4FJPJWE5UeA
> > [3] https://calcite.apache.org/news/2020/10/06/release-1.26.0/
> > [4] https://issues.apache.org/jira/browse/FLINK-20873
> > [5] https://issues.apache.org/jira/browse/FLINK-21239
> > [6] https://issues.apache.org/jira/browse/FLINK-27998
> >
> > Best regards,
> >
> > Martijn Visser
> > https://twitter.com/MartijnVisser82
> > https://github.com/MartijnVisser
> >
> > On Thu, 5 May 2022 at 10:34, godfrey he <godfre...@gmail.com> wrote:
> >
> > > Hi, Timo & Martijn,
> > >
> > > Sorry for the late reply, thanks for the feedback.
> > >
> > > I strongly agree that the best solution would be to cooperate more with the Calcite community and maintain all new features and bug fixes in the Calcite community, without any forking. It is a long-term process. I think it's difficult to change community rules, because the Calcite project is a neutral library that serves multiple projects simultaneously. I don't think forking Calcite is the perfect solution, but rather a better balance within limited resources: it's possible to introduce some necessary minor features and bug fixes without having to upgrade to the latest version.
> > >
> > > I investigated other projects that use Calcite [1] and found that most of them do not use the latest version of Calcite. Even for the Kylin community, the version, based on Calcite-1.16.0, has been updated to 70 [2]. (Similar projects are Quark and Drill.) My guess is that these projects chose a stable version (or even chose to maintain a fork) to maintain stability. When Flink does not need to introduce new syntax anymore, I guess it's less expensive and more manageable to maintain a forked Calcite.
> > >
> > > Even if we don't end up going the Calcite fork route, I hope that we can discuss the options for subsequent Calcite upgrades here. As Timo mentioned, how do we balance feature development and code maintenance? There are a few realistic questions about the Calcite upgrade situation now, such as:
> > > 1. If we keep up with the latest version of Calcite, who is responsible for each upgrade? The current status is that no one has the motivation to upgrade the version unless he/she wants to drive new features.
> > > 2. Do we have the resources/energy to upgrade to each version?
> > > 3. How do we ensure that the result of each upgrade is as expected? It took a lot of effort to verify the correctness of the upgrade results. The test set for uncommon SQL usage is not sufficient now.
> > >
> > > > I still don't quite understand why we want to avoid Calcite upgrades.
> > > Not every feature in Calcite is a feature we really need, while some refactorings can be very burdensome (lots of bugs, plan changes, and a lot of effort to fix). As mentioned above, the "SEARCH operator" refactoring in CALCITE-4173 did cause a lot of bugs.
> > >
> > > [1] https://calcite.apache.org/docs/powered_by.html
> > > [2] https://github.com/Kyligence/calcite/commits/kycalcite-1.16.0.x-4.x
> > >
> > > Best,
> > > Godfrey
> > >
> > > On Mon, 25 Apr 2022 at 22:11, Martijn Visser <martijnvis...@apache.org> wrote:
> > >
> > > > Hi all,
> > > >
> > > > Just a couple of remarks on some of the things from this thread:
> > > >
> > > > > I think we will upgrade Calcite to 1.31 only when Flink depends on some significant features of Calcite.
> > > > > Such as: new syntax PTF (CALCITE-4865).
> > > >
> > > > Like Timo also mentions, I think this is a bad practice. Calcite is a key dependency for Flink. We should upgrade as often as possible, not as little as possible.
> > > > Any fork is easy in the beginning, but it becomes a bigger pain as time progresses.
> > > >
> > > > > > ## Are the calcite repository costly to maintain?
> > > > > From the experience of @Danny Chen (a PMC member of Calcite), publishing is much easier.
> > > >
> > > > Since Calcite is such a key dependency, I would really oppose forking it. There will only be very few maintainers of such a fork. The number of people that know and can maintain both Calcite and Flink will be even smaller.
> > > >
> > > > > I'm just trying to find an approach which can avoid frequent Calcite upgrades, but easily support bug fix and minor new feature development.
> > > >
> > > > I still don't quite understand why we want to avoid Calcite upgrades. Upgrading Calcite introduces new features, but it also resolves bugs that currently exist in Flink. Part of housekeeping is that we keep our codebase up-to-date and tidy, to prevent it from becoming a mess and unmaintainable. I understand that this is less preferred, because you can't spend this time working on new features. To make a comparison with doing construction work on your house: you can't put in a new floor if you don't clean out the room first.
> > > >
> > > > > About Calcite version upgrading, we should try not use the latest Calcite version to avoid the bugs introduced by the new version if possible.
> > > >
> > > > I can fully agree on that. But right now we're running multiple versions behind.
> > > >
> > > > Have we reached out to the Calcite community first with our problems, or have we gone straight into "let's fork it"?
> > > >
> > > > I still haven't seen an argument that would make me in favor of setting up a fork.
> > > >
> > > > Best regards,
> > > >
> > > > Martijn
> > > >
> > > > On Mon, 25 Apr 2022 at 15:55, Timo Walther <twal...@apache.org> wrote:
> > > >
> > > > > Hi Godfrey,
> > > > >
> > > > > I'm also strictly against maintaining a Calcite fork. We had similar discussions during the merge of the Blink code base in the past and I'm happy that we could prevent a fork until today. Let me elaborate a bit on my strict opinion here:
> > > > >
> > > > > 1) Calcite does not offer bugfix releases
> > > > >
> > > > > In the end, Calcite is also an Apache community. I'm sure we could improve our collaboration and help with releasing bugfix releases. So far we were mostly leveraging all the stuff that the Calcite community has built. It would be good to strengthen the relationship and also give something back.
> > > > >
> > > > > So far, having no bugfix releases was not really a problem for the Flink community. We simply copy over files from Calcite into Flink once a bug fix has been merged in Calcite. Maven implicitly overwrites the original Calcite classes during artifact building. Most `org.apache.calcite` classes in the Flink code base are fixing bugs and wait for removal during the next Calcite upgrade.
> > > > >
> > > > > 2) Slow feature reviewing
> > > > >
> > > > > Slow feature reviewing has a good and a bad side. One of the reasons why it is so slow is that the maintainers pay a lot of attention to standard compliance, long-term code quality, and cross-downstream-project usability. All of that is the reason why the Calcite code base has lasted multiple decades already and is useful for many parties.
> > > > >
> > > > > Relying on Calcite has protected the Flink code base from merging non-standard SQL features and extending the SQL dialect too much.
> > > > > The 1. windows in Calcite and aux functions such as TUMBLE_START have shown that only standard-compliant features should be merged. Now the Flink community has the problem of maintaining this custom syntax.
> > > > >
> > > > > 3) No compatibility guaranteed from the Calcite community
> > > > >
> > > > > I disagree here. Many changes are protected by keeping deprecated methods/constructors/classes around for years. And many refactorings are nice for the Flink community as well, e.g. easier optimizer rule definition.
> > > > >
> > > > > IMHO the core problem is rather that we don't update Calcite frequently enough. Currently, we are lagging behind quite a bit because we don't invest enough resources in code maintenance but only in new feature development. We should spend some time on a better balance of the two.
> > > > >
> > > > > Regards,
> > > > > Timo
> > > > >
> > > > > On 25.04.22 at 15:13, godfrey he wrote:
> > > > > > Hi Jark,
> > > > > >
> > > > > > Agree with you, thanks for the feedback.
> > > > > >
> > > > > > Best,
> > > > > > Godfrey
> > > > > >
> > > > > > On Mon, 25 Apr 2022 at 13:02, Jark Wu <imj...@gmail.com> wrote:
> > > > > >> Thanks, Godfrey, for starting this discussion,
> > > > > >>
> > > > > >> I understand the motivation behind it. No bugfix releases, slow feature reviewing, and no compatibility guaranteed are genuinely blocking the development of Flink SQL.
> > > > > >>
> > > > > >> I think a fork should be the last resort, after we have tried our best to cooperate with the Calcite community. But we shouldn't stop here if there is no progress. Therefore, I'm okay with maintaining a fork.
> > > > > >>
> > > > > >> However:
> > > > > >> 1) It should be a temporary solution.
> > > > > >> We should have a plan to move back to the latest Calcite version at some point (e.g., pushing them to resolve the problems mentioned above).
> > > > > >>
> > > > > >> 2) If we maintain the fork in flink-extended, we should determine a groupId for deploying to Maven Central. The community should have permission to deploy under the groupId.
> > > > > >>
> > > > > >> Best,
> > > > > >> Jark
> > > > > >>
> > > > > >> On Sun, 24 Apr 2022 at 16:14, godfrey he <godfre...@gmail.com> wrote:
> > > > > >>
> > > > > >>> Hi, Jing
> > > > > >>> Thanks for sharing the Calcite experiences. About Calcite version upgrading, we should try not to use the latest Calcite version, to avoid the bugs introduced by the new version if possible. This may be a best practice.
> > > > > >>>
> > > > > >>> Hi, Yun
> > > > > >>> Thanks for the detailed explanation of the experiences regarding FRocksDB. I agree with you that the situations with Calcite and RocksDB are a little different. The main pain point for Calcite is that we have to upgrade Calcite to the latest version to get bug fixes and new features, but the latest version may be unstable, which is a pain for us. If we all agree we should maintain a forked Calcite repo, there are many experiences we can learn from FRocksDB.
> > > > > >>>
> > > > > >>> Best,
> > > > > >>> Godfrey
> > > > > >>>
> > > > > >>> On Sun, 24 Apr 2022 at 11:58, Yun Tang <myas...@live.com> wrote:
> > > > > >>>> Hi all,
> > > > > >>>>
> > > > > >>>> I could share two cents here for how we maintain FRocksDB.
> > > > > >>>>
> > > > > >>>> First of all, we also do not prefer to maintain a customized RocksDB version in Flink, which brings additional overhead for the Flink community:
> > > > > >>>>
> > > > > >>>> 1. The RocksDB community switched to CircleCI for the CI tests after RocksDB-6.x, which requires additional money to run all tests for reviewing each PR.
> > > > > >>>> 2. We need to compile and include all kinds of FRocksDB binaries on linux32/64, Windows, ppc64, ARM and macOS platforms, which is a really tough and boring experience.
> > > > > >>>>
> > > > > >>>> The root reason why we have to maintain a forked RocksDB repo is that the RocksDB community refuses to accept a plugin-like feature based on compaction filter, which Flink's state TTL feature heavily depends on [1]. From RocksDB-7.0, the community also moved several components to the plugin repo [2]. Although this does not spare us from releasing all kinds of binaries, it can at least decrease the energy we need to maintain the whole test suite if we follow this trend.
> > > > > >>>>
> > > > > >>>> Last but not least, I don't think the current discussion on Apache Calcite is in the same situation as FRocksDB. Current Flink SQL guys complain that Calcite is released too slowly, which blocks some feature development in Flink. However, the RocksDB community itself actually releases new versions more frequently, and we don't currently rely on its new versions for new features. Moreover, we're often more careful about upgrading the underlying storage component as it could impact performance and data correctness.
> > > > > >>>>
> > > > > >>>> [1] https://github.com/ververica/frocksdb/commit/3da8249d50c8a3a6ea229f43890d37e098372786
> > > > > >>>> [2] https://github.com/facebook/rocksdb/issues/9390
> > > > > >>>>
> > > > > >>>> Best
> > > > > >>>> Yun Tang
> > > > > >>>>
> > > > > >>>> ________________________________
> > > > > >>>> From: Jing Zhang <beyond1...@gmail.com>
> > > > > >>>> Sent: Saturday, April 23, 2022 15:21
> > > > > >>>> To: dev <dev@flink.apache.org>
> > > > > >>>> Cc: Yun Tang <myas...@live.com>
> > > > > >>>> Subject: Re: [DISCUSS] Maintain a Calcite repository for Flink to accelerate the development for Flink SQL features
> > > > > >>>>
> > > > > >>>> Hi Godfrey,
> > > > > >>>> I would like to share some problems based on my past experience.
> > > > > >>>> 1. It's not easy to push new features in the Calcite community. As @Martijn referred, https://issues.apache.org/jira/browse/CALCITE-4865 / https://github.com/apache/calcite/pull/2606 is such an example. I tried out many ways, for example, sent review requests to the dev mailing list, left comments in JIRA and in pull requests, and had to give up finally. Sorry for that.
> > > > > >>>> 2. However, some new features of Calcite are radical. Such as https://issues.apache.org/jira/browse/CALCITE-4173, which had some strong opposition in the Calcite community, but it was merged finally and caused unexpected problems, such as wrong results (https://issues.apache.org/jira/browse/FLINK-24708) and other related bugs.
> > > > > >>>> 3. Every time we upgrade the Calcite version, we cross multiple versions, resulting in a slow upgrade process and uncontrolled results, often causing some unexpected problems.
> > > > > >>>>
> > > > > >>>> Thank @Godfrey for driving this discussion in a big scope. I think it's a good chance to review these problems and find a solution.
> > > > > >>>>
> > > > > >>>> Best,
> > > > > >>>> Jing Zhang
> > > > > >>>>
> > > > > >>>> On Fri, 22 Apr 2022 at 21:40, godfrey he <godfre...@gmail.com> wrote:
> > > > > >>>> Hi Chesnay,
> > > > > >>>>
> > > > > >>>> There is no bug fix version until now. You can find the releases in https://github.com/apache/calcite/tags
> > > > > >>>>
> > > > > >>>> Best,
> > > > > >>>> Godfrey
> > > > > >>>>
> > > > > >>>> On Fri, 22 Apr 2022 at 18:48, Chesnay Schepler <ches...@apache.org> wrote:
> > > > > >>>>> I find it a bit weird that the supposed only way to get a bug fix is to do a big version upgrade. Is Calcite not creating bugfix releases?
> > > > > >>>>>
> > > > > >>>>> On 22/04/2022 12:26, godfrey he wrote:
> > > > > >>>>>> Thanks for the feedback, guys!
> > > > > >>>>>>
> > > > > >>>>>> For Jingsong's feedback:
> > > > > >>>>>>> ## Do we have the plan to upgrade calcite to 1.31?
> > > > > >>>>>> I think we will upgrade Calcite to 1.31 only when Flink depends on some significant features of Calcite. Such as: new syntax PTF (CALCITE-4865).
> > > > > >>>>>>
> > > > > >>>>>>> ## Is Cherry-pick costly?
> > > > > >>>>>> From the experience of maintaining Calcite at our company, the cost is small. We only cherry-pick the bug fixes and needed minor features. For a major feature, we can choose to upgrade the version.
> > > > > >>>>>>
> > > > > >>>>>>> ## Are the calcite repository costly to maintain?
> > > > > >>>>>> From the experience of @Danny Chen (a PMC member of Calcite), publishing is much easier.
> > > > > >>>>>>
> > > > > >>>>>> For Chesnay's feedback:
> > > > > >>>>>> I also totally agree that a fork repository will increase the cost of maintenance.
> > > > > >>>>>>
> > > > > >>>>>> Usually, the Calcite community releases a version every three months or more. I think it's hard to make Calcite change the release cycle because Calcite supports many compute engines.
> > > > > >>>>>>
> > > > > >>>>>> For Konstantin's feedback:
> > > > > >>>>>> Some changes in Calcite may cause hundreds of plan changes in Flink, such as CALCITE-4173. We must check whether each change is expected and whether there is a performance regression. Some of the changes are very subtle, especially in the CBO planner. A similar situation occurred when upgrading from 1.1x to 1.22. If you are not familiar with the Flink planner and Calcite, it will be more difficult to upgrade.
> > > > > >>>>>>
> > > > > >>>>>> For Xintong's feedback:
> > > > > >>>>>> You are right, I will contact Yun for some help. Thanks for the suggestions.
> > > > > >>>>>>
> > > > > >>>>>> For Martijn's feedback:
> > > > > >>>>>> I'm also against cherry-picking a lot of feature code into the fork repository, and I also totally agree we should collaborate closely with the Calcite community. I'm just trying to find an approach which can avoid frequent Calcite upgrades, but easily support bug fix and minor new feature development.
> > > > > >>>>>>
> > > > > >>>>>> As for the CALCITE-4865 case, I think we should upgrade the Calcite version to support PTF.
> > > > > >>>>>>
> > > > > >>>>>> @Jing Zhang, can you share some 'feeling' for CALCITE-4865?
> > > > > >>>>>>
> > > > > >>>>>> Best,
> > > > > >>>>>> Godfrey
> > > > > >>>>>>
> > > > > >>>>>> On Fri, 22 Apr 2022 at 17:31, Martijn Visser <martijnvis...@apache.org> wrote:
> > > > > >>>>>>> Hi everyone,
> > > > > >>>>>>>
> > > > > >>>>>>> Overall I'm against the idea of setting up a Calcite fork for the same reasons that Chesnay has mentioned. We've talked extensively about doing an upgrade of Calcite during the Flink 1.15 release period, but there was a lot of pushback by the maintainers against that because of the required efforts. Having our own fork will mean that there will be even more effort required, because not only do we need to perform the upgrade on Flink's end, we also need to maintain this Calcite fork.
> > > > > >>>>>>>
> > > > > >>>>>>> I think what we should do is have a closer collaboration with the Calcite community and see if we can also help out with reviewing/merging PRs and more frequent releases. What we're already seeing is that features that are proposed to Calcite because we need them for Flink are not getting picked up by the Calcite community. See https://issues.apache.org/jira/browse/CALCITE-4865 / https://github.com/apache/calcite/pull/2606, which is such an example.
> > > > > >>>>>>> I would rather invest more in collaborating with the Calcite community instead of maintaining our own fork.
> > > > > >>>>>>> I believe that would help us get new features and bug fixes sooner.
> > > > > >>>>>>>
> > > > > >>>>>>> Best regards,
> > > > > >>>>>>>
> > > > > >>>>>>> Martijn Visser
> > > > > >>>>>>> https://twitter.com/MartijnVisser82
> > > > > >>>>>>> https://github.com/MartijnVisser
> > > > > >>>>>>>
> > > > > >>>>>>> On Fri, 22 Apr 2022 at 10:46, Xintong Song <tonysong...@gmail.com> wrote:
> > > > > >>>>>>>> BTW, I think this proposal sounds similar to FRocksDB, Flink's custom RocksDB build. Maybe folks maintaining FRocksDB can share some experiences. CC @Yun Tang
> > > > > >>>>>>>>
> > > > > >>>>>>>> Thank you~
> > > > > >>>>>>>>
> > > > > >>>>>>>> Xintong Song
> > > > > >>>>>>>>
> > > > > >>>>>>>> On Fri, Apr 22, 2022 at 4:35 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>>> Hi Godfrey,
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>> 1. Where to put the code? https://github.com/flink-extended is a good place.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Please notice that `flink-extended` is not endorsed by the Apache Flink PMC. That means if the proposed new Calcite repository is hosted there, the maintenance and release will not be guaranteed by the Apache Flink project. I guess the question is whether we consider another 3rd-party Calcite repository more reliable and convenient to depend on than the official Apache Calcite.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Thank you~
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Xintong Song
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> On Fri, Apr 22, 2022 at 4:07 PM Chesnay Schepler <ches...@apache.org> wrote:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>> I'm overall against the idea of creating a fork. It implies quite some maintenance overhead, like dealing with unstable tests, CI, licensing etc. and the overall release overhead.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Is there no alternative where we can collaborate more with the Calcite guys, like verifying new features so bugs are caught sooner?
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> On 22/04/2022 09:31, godfrey he wrote:
> > > > > >>>>>>>>>>> Dear devs,
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> I would like to open a discussion on the fact that the development of many Flink SQL functions currently relies on Calcite releases, which seriously blocks the release of some Flink SQL features. Therefore, I would like to discuss whether it is possible to solve this problem by creating Flink's own Calcite repository.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Currently, Flink depends on Calcite-1.26, FLIP-204 [1] relies on Calcite-1.30, and we want to fully support join-hints functionality in Flink-1.16, which relies on Calcite-1.31 (which may be released two or three months from now).
> > > > > >>>>>>>>>>> In order to support some new features or fix some bugs, we need to upgrade the Calcite version, but every time we upgrade the Calcite version (especially upgrades across multiple versions), the process is very tough: I remember clearly that the Calcite upgrade from 1.22 to 1.26 took two weeks of full-time work to complete. Currently, in order to fix some bugs while not upgrading the Calcite version, we copy the corresponding Calcite class directly into the Flink project and then modify it accordingly. [2] This approach is rather hacky and makes code maintenance and upgrades hard.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> So, I had an idea whether we could solve this problem by maintaining a Calcite repository in the Flink community. This approach has been practiced within my company for many years. There are similar practices in the industry. For example, Apache Drill also maintains a separate Calcite repository [3].
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> The following is a brief analysis of the approach and the pros and cons of maintaining a separate repository.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Approach:
> > > > > >>>>>>>>>>> 1. Where to put the code? https://github.com/flink-extended is a good place.
> > > > > >>>>>>>>>>> 2. What extra code can be added to this repository? Only bug fixes and features that are already merged into Calcite can be cherry-picked to this repository. We also should try to push bug fixes to the Calcite community. Btw, the copied Calcite classes in the Flink project can be removed.
> > > > > >>>>>>>>>>> 3. How to upgrade the Calcite version? Check out the target Calcite release branch and rebase our bug fix code. (As we upgrade, we will maintain fewer and fewer older bug fixes.) Then, verify all Calcite's tests and Flink's tests in the developer's local environment. If all tests are OK, release the Calcite branch; otherwise, fix it in the branch and re-test. After the branch is released, the version of Calcite in Flink can be upgraded. For example: check out a calcite-1.26.0-flink-v1-SNAPSHOT branch from calcite-1.26.0, move all the copied Calcite code in Flink to the branch, and pick all the hint-related changes from Calcite-1.31 to the branch. Then we can change the Calcite version in Flink to calcite-1.26.0-flink-v1-SNAPSHOT, and verify all tests locally. Release calcite-1.26.0-flink-v1 after all tests are successful.
> > > > > >>>>>>>>>>> Finally, upgrade the Calcite version to calcite-1.26.0-flink-v1, and open a PR.
> > > > > >>>>>>>>>>> 4. Who will maintain it? The maintenance workload is minimal, but the upgrade work is laborious (actually, it's similar to before). I can maintain it in the early stage and standardise the process.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Pros:
> > > > > >>>>>>>>>>> 1. The release of Flink is decoupled from the release of Calcite, making feature development and bug fixes quicker
> > > > > >>>>>>>>>>> 2. Reduce the hassle of unnecessary Calcite upgrades
> > > > > >>>>>>>>>>> 3. No hacking in Flink to maintain the copied Calcite code
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Cons:
> > > > > >>>>>>>>>>> 1. Need to maintain an additional Calcite repository
> > > > > >>>>>>>>>>> 2. Upgrades are a little more complicated than before
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Any feedback is very welcome!
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
> > > > > >>>>>>>>>>> [2] https://github.com/apache/flink/tree/master/flink-table/flink-table-planner/src/main/java/org/apache/calcite
> > > > > >>>>>>>>>>> [3] https://github.com/apache/drill/blob/master/pom.xml#L64
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Best,
> > > > > >>>>>>>>>>> Godfrey
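
Illustrative note on the version-lag rule discussed in this thread (be at most 2 Calcite versions behind, preferably 1): the following is only a rough sketch of how such a check could look. The function names are hypothetical and not part of any Flink or Calcite tooling, and it assumes Calcite versions have the form `1.<minor>.<patch>` and that 1.30 is the latest release, which is what "1.28 and 1.29" as the targets implies while 1.31 is still in preparation.

```python
def minor_versions_behind(current: str, latest: str) -> int:
    """Return how many minor releases `current` lags behind `latest`.

    Assumes versions look like "1.26.0" or "1.26" (hypothetical helper).
    """
    cur_major, cur_minor = (int(part) for part in current.split(".")[:2])
    lat_major, lat_minor = (int(part) for part in latest.split(".")[:2])
    if cur_major != lat_major:
        raise ValueError("major versions differ; minor lag is not comparable")
    return lat_minor - cur_minor


def within_policy(current: str, latest: str, max_lag: int = 2) -> bool:
    """True if `current` is at most `max_lag` minor releases behind `latest`."""
    return minor_versions_behind(current, latest) <= max_lag


# Flink's Calcite version from the thread vs. the assumed latest release.
print(minor_versions_behind("1.26.0", "1.30.0"))  # 4
print(within_policy("1.26.0", "1.30.0"))          # False
print(within_policy("1.28.0", "1.30.0"))          # True
```

Under these assumptions, 1.26.0 is four minor releases behind, so the first upgrade target that satisfies the "at most 2 behind" rule is 1.28.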