Hi Jing,

My pleasure. I also saw some reviews coming in from Julian on your Calcite
PR, so that's great :)

> About keeping up with the Calcite updates, I would like to take this
> issue. Is it too late to schedule the 1.16 version? How about scheduling
> this work on version 1.17?

I'm not sure whether a reviewer is available, but I still think it would
be super valuable to get a PR opened with the actual work. If we
can still fit it into the 1.16 release cycle (there are 4 weeks left until
feature freeze), we can push it in; otherwise we'll merge it for 1.17.

Best regards,

Martijn

On Thu, 23 Jun 2022 at 06:38, Jing Zhang <beyond1...@gmail.com> wrote:

> Hi Martijn,
> This is really exciting news.
> Thanks a lot for the effort to improve collaboration and communication with
> the Calcite community.
>
> > My take away from the discussion in the Flink community and the discussion
> > in the Calcite community is that I believe we should do 3 things.
>
> Agreed on these 3 points.
> About keeping up with the Calcite updates, I would like to take this issue.
> Is it too late to schedule the 1.16 version? How about scheduling this work
> on version 1.17?
>
> Best,
> Jing Zhang
>
>
> Martijn Visser <martijnvis...@apache.org> wrote on Thu, 23 Jun 2022 at 02:01:
>
> > Hi everyone,
> >
> > I've recently reached out to the Calcite community to see if we could
> > somehow get something done with regards to the PR that Jing Zhang had
> > opened a long time ago. In that thread, I also mentioned that we had a
> > discussion in the Flink community on potentially forking Calcite. I would
> > recommend reading up on the thread [1]. Specifically, the replies from other
> > projects/PMCs (Apache Drill, Dremio) are super interesting. These
> > projects have forked Calcite in the past, regret that move, have reverted
> > back to Calcite or are in the process of reverting, and elaborate on
> > that. This thread also gained some traction on Twitter in case you're
> > interested in more opinions. [2]
> >
> > My take away from the discussion in the Flink community and the discussion
> > in the Calcite community is that I believe we should do 3 things:
> >
> > 1. We should not fork Calcite. There might be short term benefits but long
> > term pain. I think we already are suffering from enough long term pain in
> > the Flink codebase that we shouldn't take a step that will increase that
> > pain even more, scattered over multiple places.
> > 2. I think we should try to help out the Calcite community more. Not only
> > by opening new PRs for new features, but also by reviewing
> > those PRs, reviewing other PRs that could be relevant for Flink, or proposing
> > improvements given our experience at Flink. As you can see in the Calcite
> > thread, Timo has already expressed a desire to do so. Part of the OSS
> > community is also about helping each other; if we improve Calcite, we will
> > also improve Flink.
> > 3. I think we need to prioritise keeping up with the Calcite updates. They
> > are currently working on releasing version 1.31, while Flink is still at
> > 1.26.0. We don't necessarily need to stay in sync with the latest available
> > version, but I definitely think we should be at most 2 versions (and
> > preferably 1 version) behind (so currently that would be 1.28 and 1.29
> > soonish). Not only are we increasing our own tech debt by not updating, we
> > are also limiting ourselves in adding new features in the Table/SQL space.
> > As you can also see in the 1.26 release notes, there's a warning to only
> > use 1.26 for development since it can corrupt your data [3]. There are
> > already multiple upgrade tickets for Calcite [4] [5] [6].
> >
> > [1] https://lists.apache.org/thread/3lkfhwjpqwy9pfhnvwmfkwmwlfyqs45z
> > [2] https://twitter.com/gunnarmorling/status/1539499415337111553?s=21&t=8fGk3PxScOx4FJPJWE5UeA
> > [3] https://calcite.apache.org/news/2020/10/06/release-1.26.0/
> > [4] https://issues.apache.org/jira/browse/FLINK-20873
> > [5] https://issues.apache.org/jira/browse/FLINK-21239
> > [6] https://issues.apache.org/jira/browse/FLINK-27998
> >
> > Best regards,
> >
> > Martijn Visser
> > https://twitter.com/MartijnVisser82
> > https://github.com/MartijnVisser
> >
> > On Thu, 5 May 2022 at 10:34, godfrey he <godfre...@gmail.com> wrote:
> >
> > > Hi, Timo & Martijn,
> > >
> > > Sorry for the late reply, thanks for the feedback.
> > >
> > > I strongly agree that the best solution would be to cooperate more
> > > with the Calcite community
> > > and maintain all new features and bug fixes in the Calcite community,
> > > without any forking.
> > > It is a long-term process. I think it's difficult to change community
> > > rules, because the Calcite
> > > project is a neutral lib that serves multiple projects simultaneously.
> > > I don't think forking Calcite is the perfect solution, but rather a
> > > better balance within limited resources:
> > > it makes it possible to introduce some necessary minor features and bug
> > > fixes without having to upgrade to the latest version.
> > >
> > >
> > > I investigated other projects that use Calcite [1] and found that most of
> > > them do not use the latest version of Calcite. Even for the Kylin
> > > community, the version based on Calcite-1.16.0 has been updated to 70 [2].
> > > (Quark and Drill are similar projects.)
> > > My guess is that these projects chose a stable version
> > > (or even chose to maintain a fork) to preserve stability.
> > > When Flink does not need to introduce new syntax anymore,
> > > I guess it's less expensive and more manageable to maintain a forked
> > > Calcite.
> > >
> > >
> > > Even if we don't end up going the Calcite fork route,
> > > I hope that we could discuss the options for subsequent Calcite upgrades
> > > here.
> > > As Timo mentioned, the question is how to balance feature development
> > > and code maintenance.
> > > There are a few realistic questions about the Calcite upgrade
> > > situation now, such as:
> > > 1. If we keep up with the latest version of Calcite, who is
> > > responsible for each upgrade?
> > > The current status is that no one has motivation to upgrade the version
> > > unless he/she wants to drive new features.
> > > 2. Do we have the resources/energy to upgrade each version?
> > > 3. How do we ensure that each upgrade behaves as expected? It took a lot
> > > of effort to verify the correctness of the upgrade results. The test set
> > > for uncommon SQL usage is not sufficient now.
> > >
> > >
> > > > I still don't quite understand why we want to avoid Calcite upgrades.
> > > Not every feature in Calcite is a feature we really need, while some
> > > refactorings can be very burdensome
> > > (lots of bugs, plan changes, and a lot of effort to fix).
> > > As mentioned above, the "SEARCH operator" refactoring in
> > > CALCITE-4173 did cause a lot of bugs.
> > >
> > >
> > > [1] https://calcite.apache.org/docs/powered_by.html
> > > [2] https://github.com/Kyligence/calcite/commits/kycalcite-1.16.0.x-4.x
> > >
> > > Best,
> > > Godfrey
> > >
> > > Martijn Visser <martijnvis...@apache.org> wrote on Mon, 25 Apr 2022 at 22:11:
> > >
> > > >
> > > > Hi all,
> > > >
> > > > Just a couple of remarks on some of things from this thread:
> > > >
> > > > > I think we will upgrade Calcite to 1.31 only when Flink depends on some
> > > > > significant features of Calcite.
> > > > > Such as: new syntax PTF (CALCITE-4865).
> > > >
> > > > Like Timo also mentions, I think this is a bad practice. Calcite is a key
> > > > dependency for Flink. We should upgrade as often as possible, not as little
> > > > as possible. Any fork in the beginning is easy, but it becomes a bigger
> > > > pain as time progresses.
> > > >
> > > > > >## Are the calcite repository costly to maintain?
> > > > > From the experience of @Danny Chen (a Calcite PMC member), publishing
> > > > > is much easier.
> > > >
> > > > Since Calcite is such a key dependency, I would really oppose forking it.
> > > > There will only be very few maintainers of such a fork. The number of
> > > > people who know and can maintain both Calcite and Flink will be even
> > > > smaller.
> > > >
> > > > > I'm just trying to find an approach which can avoid frequent Calcite
> > > > > upgrades,
> > > > > but easily support bug fix and minor new feature development.
> > > >
> > > > I still don't quite understand why we want to avoid Calcite upgrades.
> > > > Upgrading Calcite introduces new features, but it also resolves bugs that
> > > > currently exist in Flink. Part of housekeeping is that we keep our codebase
> > > > up-to-date and tidy, to avoid that it becomes a mess and unmaintainable. I
> > > > understand that this is less preferred, because you can't spend this time
> > > > working on new features. If I make a comparison with doing construction
> > > > work on your house, you can't put in a new floor if you don't clean out the
> > > > room first.
> > > >
> > > > > About Calcite version upgrading, we should try not to use the latest
> > > > > Calcite version to avoid the bugs introduced by the new version if
> > > > > possible.
> > > >
> > > > I can fully agree on that. But right now we're running multiple versions
> > > > behind.
> > > >
> > > > Have we reached out to the Calcite community first with our problems, or
> > > > have we gone straight into "let's fork it"?
> > > >
> > > > I still haven't seen an argument that would make me in favor of setting up
> > > > a fork.
> > > >
> > > > Best regards,
> > > >
> > > > Martijn
> > > >
> > > > On Mon, 25 Apr 2022 at 15:55, Timo Walther <twal...@apache.org> wrote:
> > > >
> > > > > Hi Godfrey,
> > > > >
> > > > > I'm also strictly against maintaining a Calcite fork. We had similar
> > > > > discussions during the merge of the Blink code base in the past and I'm
> > > > > happy that we could prevent a fork until today. Let me elaborate a bit
> > > > > on my strict opinion here:
> > > > >
> > > > > 1) Calcite does not offer bugfix releases
> > > > >
> > > > > In the end, Calcite is also an Apache community. I'm sure we could
> > > > > improve our collaboration and help with releasing bugfix releases. So far
> > > > > we were mostly leveraging all the stuff that the Calcite community has
> > > > > built. It would be good to strengthen the relation and also give
> > > > > something back.
> > > > >
> > > > > So far having no bugfix releases was not really a problem for the Flink
> > > > > community. We simply copy over files from Calcite into Flink once a bug
> > > > > fix has been merged in Calcite. Maven implicitly overwrites the original
> > > > > Calcite classes during artifact building. Most `org.apache.calcite`
> > > > > classes in the Flink code base are fixing bugs and wait for removal
> > > > > during the next Calcite upgrade.
> > > > >
> > > > > 2) Slow feature reviewing
> > > > >
> > > > > Slow feature reviewing has a good and a bad side. One of the reasons why
> > > > > it is so slow is that the maintainers pay a lot of attention to
> > > > > standard compliance, long-term code quality, and
> > > > > cross-downstream-projects usability. All of that is the reason why the
> > > > > Calcite code base has already lasted multiple decades and is useful for
> > > > > many parties.
> > > > >
> > > > > Relying on Calcite has protected the Flink code base from merging
> > > > > non-standard SQL features and extending the SQL dialect too much. The
> > > > > first windows in Calcite and aux functions such as TUMBLE_START have shown
> > > > > that only standard compliant features should be merged. Now the Flink
> > > > > community has the problem of maintaining this custom syntax.
> > > > >
> > > > > 3) No compatibility guaranteed from the Calcite community
> > > > >
> > > > > I disagree here. Many changes are protected by keeping deprecated
> > > > > methods/constructors/classes around for years. And many refactorings are
> > > > > also nice for the Flink community, e.g. easier optimizer rule definition.
> > > > >
> > > > > IMHO the core problem is rather that we don't update Calcite frequently
> > > > > enough. Currently, we are lagging behind quite a bit because we don't
> > > > > invest enough resources in code maintenance but only in new feature
> > > > > development. We should spend some time on a better balance of the two.
> > > > >
> > > > > Regards,
> > > > > Timo
> > > > >
> > > > > On 25.04.22 at 15:13, godfrey he wrote:
> > > > > > Hi Jark,
> > > > > >
> > > > > > Agree with you, thanks for the feedback.
> > > > > >
> > > > > > Best,
> > > > > > Godfrey
> > > > > >
> > > > > > > Jark Wu <imj...@gmail.com> wrote on Mon, 25 Apr 2022 at 13:02:
> > > > > >> Thanks, Godfrey, for starting this discussion,
> > > > > >>
> > > > > >> I understand the motivation behind it.
> > > > > > >> No bugfix releases, slow feature reviewing, and no compatibility
> > > > > > >> guarantees are genuinely blocking the development of Flink SQL.
> > > > > > >>
> > > > > > >> I think a fork should be the last resort, after we have tried our best
> > > > > > >> to cooperate with the Calcite community.
> > > > > > >> But we shouldn't stop here if there is no progress. Therefore, I'm okay
> > > > > > >> with maintaining a fork.
> > > > > >>
> > > > > >> However:
> > > > > > >> 1) It should be a temporary solution. We should have a plan to move
> > > > > > >> back to the latest Calcite version at some point (e.g., pushing them to
> > > > > > >> resolve the problems mentioned above).
> > > > > > >>
> > > > > > >> 2) If we maintain the fork in flink-extended, we should determine a
> > > > > > >> groupId for deploying to Maven Central. The community should have
> > > > > > >> permission to deploy under the groupId.
> > > > > >>
> > > > > >> Best,
> > > > > >> Jark
> > > > > >>
> > > > > >>
> > > > > > >> On Sun, 24 Apr 2022 at 16:14, godfrey he <godfre...@gmail.com> wrote:
> > > > > >>
> > > > > >>> Hi, Jing
> > > > > >>> Thanks for sharing the Calcite experiences.
> > > > > > >>> About Calcite version upgrading, we should try not to use the latest
> > > > > > >>> Calcite version to avoid the bugs introduced by the new version if
> > > > > > >>> possible.
> > > > > > >>> This may be a best practice.
> > > > > >>>
> > > > > >>>
> > > > > >>> Hi, Yun
> > > > > > >>> Thanks for the detailed explanation of the experiences regarding
> > > > > > >>> FRocksDB.
> > > > > > >>> I agree with you that the situations with Calcite and RocksDB are a
> > > > > > >>> little different.
> > > > > > >>> The main pain point for Calcite is that we have to upgrade Calcite to
> > > > > > >>> the latest version
> > > > > > >>> to get bug fixes and new features, but the latest version may be
> > > > > > >>> unstable, which is a pain for us.
> > > > > > >>> If we all agree we should maintain a forked Calcite repo,
> > > > > > >>> there are many experiences we can learn from FRocksDB.
> > > > > >>>
> > > > > >>> Best,
> > > > > >>> Godfrey
> > > > > >>>
> > > > > > >>> Yun Tang <myas...@live.com> wrote on Sun, 24 Apr 2022 at 11:58:
> > > > > >>>> Hi all,
> > > > > >>>>
> > > > > >>>> I could share two cents here for how we maintain FRocksDB.
> > > > > >>>>
> > > > > > >>>> First of all, we also do not prefer to maintain a customized RocksDB
> > > > > > >>>> version in Flink, which brings additional overhead for the Flink
> > > > > > >>>> community:
> > > > > > >>>>
> > > > > > >>>>    1.  The RocksDB community switched to CircleCI for the CI tests
> > > > > > >>>>        after RocksDB-6.x, which requires additional money to run all
> > > > > > >>>>        tests for reviewing each PR.
> > > > > > >>>>    2.  We need to compile and include all kinds of FRocksDB binaries
> > > > > > >>>>        on linux32/64, Windows, ppc64, ARM and macOS platforms, which
> > > > > > >>>>        is a really tough and boring experience.
> > > > > > >>>> The root reason why we have to maintain a forked RocksDB repo is that
> > > > > > >>>> the RocksDB community refuses to accept a plugin-like feature based on
> > > > > > >>>> compaction filter, which Flink's state TTL feature heavily depends on
> > > > > > >>>> [1]. From RocksDB-7.0, the community also moves several components to
> > > > > > >>>> the plugin repo [2]. Although this cannot spare us from releasing all
> > > > > > >>>> kinds of binaries, it can at least decrease the energy we need to
> > > > > > >>>> maintain the whole tests if we follow this trend.
> > > > > > >>>> Last but not least, I don't think the current discussion on Apache
> > > > > > >>>> Calcite is in the same situation as FRocksDB. Current Flink SQL guys
> > > > > > >>>> complain that Calcite is released too slowly, which blocks some feature
> > > > > > >>>> development in Flink. However, the RocksDB community itself actually
> > > > > > >>>> releases new versions more frequently, and we currently don't rely on
> > > > > > >>>> its new versions for new features. Moreover, we're often more careful
> > > > > > >>>> about upgrading the underlying storage component as it could impact
> > > > > > >>>> performance and data correctness.
> > > > > >>>>
> > > > > >>>> [1]
> > > > > >>>
> > > > >
> > >
> >
> https://github.com/ververica/frocksdb/commit/3da8249d50c8a3a6ea229f43890d37e098372786
> > > > > >>>> [2] https://github.com/facebook/rocksdb/issues/9390
> > > > > >>>>
> > > > > >>>> Best
> > > > > >>>> Yun Tang
> > > > > >>>>
> > > > > >>>> ________________________________
> > > > > >>>> From: Jing Zhang <beyond1...@gmail.com>
> > > > > >>>> Sent: Saturday, April 23, 2022 15:21
> > > > > >>>> To: dev <dev@flink.apache.org>
> > > > > >>>> Cc: Yun Tang <myas...@live.com>
> > > > > >>>> Subject: Re: [DISCUSS] Maintain a Calcite repository for Flink
> > to
> > > > > >>> accelerate the development for Flink SQL features
> > > > > >>>> Hi Godfrey,
> > > > > > >>>> I would like to share some problems based on my past experience.
> > > > > > >>>> 1. It's not easy to push new features in the Calcite community.
> > > > > > >>>> As @Martijn mentioned,
> > > > > > >>>> https://issues.apache.org/jira/browse/CALCITE-4865 /
> > > > > > >>>> https://github.com/apache/calcite/pull/2606 is such an example.
> > > > > > >>>> I tried many ways, for example, sending review requests to the dev
> > > > > > >>>> mailing list and leaving comments in JIRA and in pull requests,
> > > > > > >>>> and had to give up in the end. Sorry for that.
> > > > > > >>>> 2. However, some new features of Calcite are radical,
> > > > > > >>>> such as https://issues.apache.org/jira/browse/CALCITE-4173, which met
> > > > > > >>>> some strong opposition in the Calcite community.
> > > > > > >>>> But it was merged in the end and caused unexpected problems, such as
> > > > > > >>>> wrong results (https://issues.apache.org/jira/browse/FLINK-24708)
> > > > > > >>>> and other related bugs.
> > > > > > >>>> 3. Every time we upgrade the Calcite version, we cross multiple
> > > > > > >>>> versions, resulting in a slow upgrade process and
> > > > > > >>>> uncontrolled results, often causing some unexpected problems.
> > > > > >>>>
> > > > > > >>>> Thanks @Godfrey for driving this discussion with a broad scope.
> > > > > > >>>> I think it's a good chance to review these problems and find a
> > > > > > >>>> solution.
> > > > > >>>>
> > > > > >>>> Best,
> > > > > >>>> Jing Zhang
> > > > > >>>>
> > > > > > >>>> godfrey he <godfre...@gmail.com> wrote on Fri, 22 Apr 2022 at 21:40:
> > > > > >>>> Hi Chesnay,
> > > > > >>>>
> > > > > > >>>> There has been no bugfix version so far.
> > > > > > >>>> You can find the releases at https://github.com/apache/calcite/tags
> > > > > >>>>
> > > > > >>>> Best,
> > > > > >>>> Godfrey
> > > > > >>>>
> > > > > > >>>> Chesnay Schepler <ches...@apache.org> wrote on Fri, 22 Apr 2022 at 18:48:
> > > > > > >>>>> I find it a bit weird that the supposed only way to get a bug fix is
> > > > > > >>>>> to do a big version upgrade.
> > > > > >>>>> Is Calcite not creating bugfix releases?
> > > > > >>>>>
> > > > > >>>>> On 22/04/2022 12:26, godfrey he wrote:
> > > > > >>>>>> Thanks for the feedback, guys!
> > > > > >>>>>>
> > > > > >>>>>> For Jingsong's feedback:
> > > > > >>>>>>> ## Do we have the plan to upgrade calcite to 1.31?
> > > > > > >>>>>> I think we will upgrade Calcite to 1.31 only when Flink depends on
> > > > > > >>>>>> some significant features of Calcite,
> > > > > > >>>>>> such as the new PTF syntax (CALCITE-4865).
> > > > > > >>>>>>
> > > > > > >>>>>>> ## Is Cherry-pick costly?
> > > > > > >>>>>> From the experience of maintaining Calcite within our company, the
> > > > > > >>>>>> cost is small.
> > > > > > >>>>>> We only cherry-pick the bug fixes and needed minor features.
> > > > > > >>>>>> For a major feature, we can choose to upgrade the version.
> > > > > > >>>>>>
> > > > > > >>>>>>> ## Are the calcite repository costly to maintain?
> > > > > > >>>>>> From the experience of @Danny Chen (a Calcite PMC member), publishing
> > > > > > >>>>>> is much easier.
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> For Chesnay's feedback:
> > > > > >>>>>> I also totally agree that a fork repository will increase
> the
> > > cost
> > > > > of
> > > > > >>>>>> maintenance.
> > > > > >>>>>>
> > > > > > >>>>>> Usually, the Calcite community releases a version every three months
> > > > > > >>>>>> or more.
> > > > > > >>>>>> I think it's hard to let Calcite change the release cycle
> > > > > > >>>>>> because Calcite supports many compute engines.
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> For Konstantin's feedback:
> > > > > > >>>>>> Some changes in Calcite may cause hundreds of plan changes in Flink,
> > > > > > >>>>>> such as CALCITE-4173.
> > > > > > >>>>>> We must check whether each change is expected and whether there is a
> > > > > > >>>>>> performance regression.
> > > > > > >>>>>> Some of the changes are very subtle, especially in the CBO planner.
> > > > > > >>>>>> A similar situation occurred when upgrading from 1.1x to 1.22.
> > > > > > >>>>>> If you are not familiar with the Flink planner and Calcite, it will
> > > > > > >>>>>> be more difficult to upgrade.
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> For Xintong's feedback:
> > > > > > >>>>>> You are right, I will contact Yun for some help. Thanks for the
> > > > > > >>>>>> suggestions.
> > > > > >>>>>>
> > > > > >>>>>> For Martijn's feedback:
> > > > > > >>>>>> I'm also against cherry-picking a lot of feature code into the fork
> > > > > > >>>>>> repository,
> > > > > > >>>>>> and I also totally agree we should collaborate closely with the
> > > > > > >>>>>> Calcite community.
> > > > > > >>>>>> I'm just trying to find an approach which can avoid frequent Calcite
> > > > > > >>>>>> upgrades,
> > > > > > >>>>>> but easily support bug fixes and minor new feature development.
> > > > > >>>>>>
> > > > > >>>>>> As for the CALCITE-4865 case, I think we should upgrade the
> > > Calcite
> > > > > >>>>>> version to support PTF.
> > > > > >>>>>>
> > > > > >>>>>> @Jing zhang, can you share some 'feeling' for CALCITE-4865 ?
> > > > > >>>>>>
> > > > > >>>>>> Best,
> > > > > >>>>>> Godfrey
> > > > > >>>>>>
> > > > > > >>>>>> Martijn Visser <martijnvis...@apache.org> wrote on Fri, 22 Apr 2022 at 17:31:
> > > > > >>>>>>> Hi everyone,
> > > > > >>>>>>>
> > > > > > >>>>>>> Overall I'm against the idea of setting up a Calcite fork for the
> > > > > > >>>>>>> same reasons that Chesnay has mentioned. We've talked extensively
> > > > > > >>>>>>> about doing an upgrade of Calcite during the Flink 1.15 release
> > > > > > >>>>>>> period, but there was a lot of pushback by the maintainers against
> > > > > > >>>>>>> that because of the required efforts. Having our own fork will mean
> > > > > > >>>>>>> that there will be even more effort required, because not only do we
> > > > > > >>>>>>> need to perform the upgrade on Flink's end, we also need to maintain
> > > > > > >>>>>>> this Calcite fork.
> > > > > > >>>>>>>
> > > > > > >>>>>>> I think what we should do is have a closer collaboration with the
> > > > > > >>>>>>> Calcite community and see if we can also help out with
> > > > > > >>>>>>> reviewing/merging PRs and more frequent releases. What we're already
> > > > > > >>>>>>> seeing is that features proposed to Calcite because we need them for
> > > > > > >>>>>>> Flink are not getting picked up by the Calcite community. See
> > > > > > >>>>>>> https://issues.apache.org/jira/browse/CALCITE-4865 /
> > > > > > >>>>>>> https://github.com/apache/calcite/pull/2606 which is such an example.
> > > > > > >>>>>>> I would rather invest more in collaborating with the Calcite
> > > > > > >>>>>>> community instead of maintaining our own fork. I believe that would
> > > > > > >>>>>>> help us get new features and bug fixes sooner.
> > > > > >>>>>>>
> > > > > >>>>>>> Best regards,
> > > > > >>>>>>>
> > > > > >>>>>>> Martijn Visser
> > > > > >>>>>>> https://twitter.com/MartijnVisser82
> > > > > >>>>>>> https://github.com/MartijnVisser
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > > >>>>>>> On Fri, 22 Apr 2022 at 10:46, Xintong Song <tonysong...@gmail.com> wrote:
> > > > > > >>>>>>>> BTW, I think this proposal sounds similar to FRocksDB, Flink's
> > > > > > >>>>>>>> custom RocksDB build. Maybe folks maintaining FRocksDB can share
> > > > > > >>>>>>>> some experiences.
> > > > > >>>>>>>> CC @Yun Tang
> > > > > >>>>>>>>
> > > > > >>>>>>>> Thank you~
> > > > > >>>>>>>>
> > > > > >>>>>>>> Xintong Song
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > > >>>>>>>> On Fri, Apr 22, 2022 at 4:35 PM Xintong Song <tonysong...@gmail.com>
> > > > > > >>>>>>>> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>>> Hi Godfrey,
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>
> > > > > > >>>>>>>>>> 1. Where to put the code? https://github.com/flink-extended is a
> > > > > > >>>>>>>>>> good place.
> > > > > > >>>>>>>>> Please notice that `flink-extended` is not endorsed by the Apache
> > > > > > >>>>>>>>> Flink PMC. That means if the proposed new Calcite repository is
> > > > > > >>>>>>>>> hosted there, the maintenance and release will not be guaranteed by
> > > > > > >>>>>>>>> the Apache Flink project.
> > > > > > >>>>>>>>> I guess the question is whether we consider another 3rd party
> > > > > > >>>>>>>>> Calcite repository more reliable and convenient to depend on than
> > > > > > >>>>>>>>> the official Apache Calcite.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Thank you~
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Xintong Song
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>
> > > > > > >>>>>>>>> On Fri, Apr 22, 2022 at 4:07 PM Chesnay Schepler <ches...@apache.org>
> > > > > > >>>>>>>>> wrote:
> > > > > >>>>>>>>>
> > > > > > >>>>>>>>>> I'm overall against the idea of creating a fork.
> > > > > > >>>>>>>>>> It implies quite some maintenance overhead, like dealing with
> > > > > > >>>>>>>>>> unstable tests, CI, licensing etc. and the overall release
> > > > > > >>>>>>>>>> overhead.
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> Is there no alternative where we can collaborate more with the
> > > > > > >>>>>>>>>> Calcite guys, like verifying new features so bugs are caught
> > > > > > >>>>>>>>>> sooner?
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> On 22/04/2022 09:31, godfrey he wrote:
> > > > > >>>>>>>>>>> Dear devs,
> > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> I would like to open a discussion on the fact that much of the
> > > > > > >>>>>>>>>>> current Flink SQL feature development relies on Calcite
> > > > > > >>>>>>>>>>> releases, which seriously blocks the release of some Flink SQL
> > > > > > >>>>>>>>>>> features.
> > > > > > >>>>>>>>>>> Therefore, I would like to discuss whether it is possible to
> > > > > > >>>>>>>>>>> solve this problem
> > > > > > >>>>>>>>>>> by creating Flink's own Calcite repository.
> > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> Currently, Flink depends on Calcite-1.26, FLIP-204 [1] relies on
> > > > > > >>>>>>>>>>> Calcite-1.30,
> > > > > > >>>>>>>>>>> and we want to fully support join-hints functionality in
> > > > > > >>>>>>>>>>> Flink-1.16,
> > > > > > >>>>>>>>>>> which relies on Calcite-1.31 (which may be released in two or
> > > > > > >>>>>>>>>>> three months).
> > > > > > >>>>>>>>>>> In order to support some new features or fix some bugs, we need
> > > > > > >>>>>>>>>>> to upgrade
> > > > > > >>>>>>>>>>> the Calcite version, but every time we upgrade the Calcite
> > > > > > >>>>>>>>>>> version (especially upgrades
> > > > > > >>>>>>>>>>> across multiple versions), the process is very tough: I remember
> > > > > > >>>>>>>>>>> clearly that
> > > > > > >>>>>>>>>>> the Calcite upgrade from 1.22 to 1.26 took two weeks of
> > > > > > >>>>>>>>>>> full-time work to complete.
> > > > > > >>>>>>>>>>> Currently, in order to fix some bugs while not upgrading the
> > > > > > >>>>>>>>>>> Calcite version,
> > > > > > >>>>>>>>>>> we copy the corresponding Calcite class directly into the Flink
> > > > > > >>>>>>>>>>> project
> > > > > > >>>>>>>>>>> and then modify it accordingly [2]. This approach is rather
> > > > > > >>>>>>>>>>> hacky and
> > > > > > >>>>>>>>>>> makes code maintenance and upgrades hard.
> > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> So, I had an idea whether we could solve this problem by
> > > > > > >>>>>>>>>>> maintaining a Calcite repository
> > > > > > >>>>>>>>>>> in the Flink community. This approach has been practiced within
> > > > > > >>>>>>>>>>> my company for many years.
> > > > > > >>>>>>>>>>> There are similar practices in the industry. For example, Apache
> > > > > > >>>>>>>>>>> Drill also maintains
> > > > > > >>>>>>>>>>> a separate Calcite repository [3].
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> The following is a brief analysis of the approach and
> the
> > > pros
> > > > > >>> and
> > > > > >>>>>>>>>>> cons of maintaining a separate repository.
> > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> Approach:
> > > > > > >>>>>>>>>>> 1. Where to put the code? https://github.com/flink-extended is a
> > > > > > >>>>>>>>>>> good place.
> > > > > > >>>>>>>>>>> 2. What extra code can be added to this repository? Only bug
> > > > > > >>>>>>>>>>> fixes and features
> > > > > > >>>>>>>>>>> that are already merged into Calcite can be cherry-picked to
> > > > > > >>>>>>>>>>> this repository.
> > > > > > >>>>>>>>>>> We also should try to push bug fixes to the Calcite community.
> > > > > > >>>>>>>>>>> Btw, the copied Calcite classes in the Flink project can be
> > > > > > >>>>>>>>>>> removed.
> > > > > > >>>>>>>>>>> 3. How to upgrade the Calcite version? Check out the target
> > > > > > >>>>>>>>>>> Calcite release branch
> > > > > > >>>>>>>>>>> and rebase our bug fix code. (As we upgrade, we will maintain
> > > > > > >>>>>>>>>>> fewer and fewer older bug
> > > > > > >>>>>>>>>>> fixes.) Then verify all of Calcite's tests and Flink's tests in
> > > > > > >>>>>>>>>>> the developer's local
> > > > > > >>>>>>>>>>> environment. If all tests pass, release the Calcite branch;
> > > > > > >>>>>>>>>>> otherwise fix
> > > > > > >>>>>>>>>>> it in the branch and re-test.
> > > > > > >>>>>>>>>>> After the branch is released, the version of Calcite in Flink
> > > > > > >>>>>>>>>>> can be upgraded. For example:
> > > > > > >>>>>>>>>>> check out a calcite-1.26.0-flink-v1-SNAPSHOT branch from
> > > > > > >>>>>>>>>>> calcite-1.26.0,
> > > > > > >>>>>>>>>>> move all the copied
> > > > > > >>>>>>>>>>> Calcite code in Flink to the branch, and pick all the hint
> > > > > > >>>>>>>>>>> related
> > > > > > >>>>>>>>>>> changes from Calcite-1.31 to
> > > > > > >>>>>>>>>>> the branch. Then we can change the Calcite version in Flink to
> > > > > > >>>>>>>>>>> calcite-1.26.0-flink-v1-SNAPSHOT,
> > > > > > >>>>>>>>>>> and verify all tests locally. Release calcite-1.26.0-flink-v1
> > > > > > >>>>>>>>>>> after all tests are successful.
> > > > > > >>>>>>>>>>> Finally, upgrade the Calcite version to
> > > > > > >>>>>>>>>>> calcite-1.26.0-flink-v10-flink-v1, and open a PR.
> > > > > > >>>>>>>>>>> 4. Who will maintain it? The maintenance workload is minimal,
> > > > > > >>>>>>>>>>> but the upgrade work is
> > > > > > >>>>>>>>>>> laborious (actually, it's similar to before). I can maintain it
> > > > > > >>>>>>>>>>> in
> > > > > > >>>>>>>>>>> the early stage and standardise the process.
> > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> Pros:
> > > > > > >>>>>>>>>>> 1. The release of Flink is decoupled from the release of
> > > > > > >>>>>>>>>>>     Calcite, making feature development and bug fixes quicker
> > > > > > >>>>>>>>>>> 2. Reduce the hassle of unnecessary Calcite upgrades
> > > > > > >>>>>>>>>>> 3. No hacking in Flink to maintain the copied Calcite code
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> Cons:
> > > > > > >>>>>>>>>>> 1. Need to maintain an additional Calcite repository
> > > > > > >>>>>>>>>>> 2. The upgrades are a little more complicated than before
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Any feedback is very welcome!
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> [1]
> > > > > >>>
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
> > > > > >>>>>>>>>>> [2]
> > > > > >>>
> > > > >
> > >
> >
> https://github.com/apache/flink/tree/master/flink-table/flink-table-planner/src/main/java/org/apache/calcite
> > > > > >>>>>>>>>>> [3]
> > > https://github.com/apache/drill/blob/master/pom.xml#L64
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Best,
> > > > > >>>>>>>>>>> Godfrey
> > > > > >>>>>>>>>>
> > > > >
> > > > >
> > >
> >
>
