Hi Godfrey,

I'm also strictly against maintaining a Calcite fork. We had similar discussions during the merge of the Blink code base in the past, and I'm happy that we have been able to prevent a fork until today. Let me elaborate a bit on my strong opinion here:

1) Calcite does not offer bugfix releases

In the end, Calcite is also an Apache community. I'm sure we could improve our collaboration and help with getting bugfix releases out. So far we have mostly been leveraging what the Calcite community has built. It would be good to strengthen the relationship and also give something back.

So far, having no bugfix releases has not really been a problem for the Flink community. We simply copy files from Calcite into Flink once a fix has been merged in Calcite; Maven then implicitly overwrites the original Calcite classes during artifact building. Most `org.apache.calcite` classes in the Flink code base exist only to fix bugs and are waiting for removal during the next Calcite upgrade.
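To make that mechanism a bit more concrete, here is a minimal sketch; the package and class name are hypothetical stand-ins, the real copies live under flink-table-planner's src/main/java/org/apache/calcite directory:

// Stand-in for whichever Calcite class contains the bug; in practice the file is a
// verbatim copy of that class, keeping its original package and name, so that the
// version compiled in Flink's own source tree replaces the one from the Calcite jar
// when the planner artifact is assembled. It is deleted again with the next upgrade.
package org.apache.calcite.rex;

public class SomePatchedCalciteClass {
    // ... copied upstream code with the already-merged Calcite bugfix applied ...
}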

2) Slow feature reviewing

Slow feature reviewing has a good and a bad side. One of the reasons why it is so slow is that the maintainers pay a lot of attention to standard compliance, long-term code quality, and usability across downstream projects. This is why the Calcite code base has lasted multiple decades already and is useful to many parties.

Relying on Calcite has protected the Flink code base from merging non-standard SQL features and from extending the SQL dialect too much. The group windows in Calcite and auxiliary functions such as TUMBLE_START have shown that only standard-compliant features should be merged. Now the Flink community has the problem of maintaining this custom syntax.
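For anyone not familiar with that part, here is a rough Java sketch contrasting the two syntaxes, assuming a recent Flink version (1.13+ for the windowing TVFs); the Orders table and its schema are invented for the example, and explainSql is used only so that nothing actually has to run:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class WindowSyntaxExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Dummy source so the queries can be planned (schema made up for the example).
        tEnv.executeSql(
                "CREATE TABLE Orders ("
                        + "  amount INT,"
                        + "  rowtime AS LOCALTIMESTAMP,"
                        + "  WATERMARK FOR rowtime AS rowtime"
                        + ") WITH ('connector' = 'datagen')");

        // Legacy group-window syntax with the TUMBLE_START auxiliary function,
        // i.e. the custom syntax that now has to be maintained.
        System.out.println(tEnv.explainSql(
                "SELECT TUMBLE_START(rowtime, INTERVAL '10' MINUTE) AS wstart, SUM(amount) "
                        + "FROM Orders "
                        + "GROUP BY TUMBLE(rowtime, INTERVAL '10' MINUTE)"));

        // Newer windowing TVF syntax that replaces the group windows in current Flink versions.
        System.out.println(tEnv.explainSql(
                "SELECT window_start, SUM(amount) "
                        + "FROM TABLE(TUMBLE(TABLE Orders, DESCRIPTOR(rowtime), INTERVAL '10' MINUTES)) "
                        + "GROUP BY window_start, window_end"));
    }
}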

3) No compatibility guaranteed from the Calcite community

I disagree here. Many changes are protected by keeping deprecated methods/constructors/classes around for years. And many refactorings are also nice for the Flink community, e.g. easier optimizer rule definition.
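To illustrate what I mean by easier rule definition: below is a minimal sketch of the declarative RelRule/Config style that newer Calcite versions offer, compared to subclassing RelOptRule with hand-written operand constructors. The rule itself (dropping an always-true filter) and all names are invented for the example:

import org.apache.calcite.plan.RelOptRuleCall;
import org.apache.calcite.plan.RelRule;
import org.apache.calcite.rel.logical.LogicalFilter;

/** Toy rule that removes a filter whose condition is always true. */
public class RemoveTrivialFilterRule
        extends RelRule<RemoveTrivialFilterRule.Config> {

    protected RemoveTrivialFilterRule(Config config) {
        super(config);
    }

    @Override
    public void onMatch(RelOptRuleCall call) {
        LogicalFilter filter = call.rel(0);
        if (filter.getCondition().isAlwaysTrue()) {
            // The filter keeps nothing out, so replace it with its input.
            call.transformTo(filter.getInput());
        }
    }

    /** The matched operands live in a config instead of constructor plumbing. */
    public interface Config extends RelRule.Config {
        Config DEFAULT = EMPTY
                .withOperandSupplier(b -> b.operand(LogicalFilter.class).anyInputs())
                .as(Config.class);

        @Override
        default RemoveTrivialFilterRule toRule() {
            return new RemoveTrivialFilterRule(this);
        }
    }
}

Instantiating it for a planner program is then just RemoveTrivialFilterRule.Config.DEFAULT.toRule().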

IMHO the core problem is rather that we don't update Calcite frequently enough. Currently, we are lagging behind quite a bit because we invest resources only in new feature development and not enough in code maintenance. We should spend some time finding a better balance between the two.

Regards,
Timo

On 25.04.22 15:13, godfrey he wrote:
Hi Jark,

Agree with you, thanks for the feedback.

Best,
Godfrey

On Mon, 25 Apr 2022 at 13:02, Jark Wu <imj...@gmail.com> wrote:
Thanks, Godfrey, for starting this discussion,

I understand the motivation behind it.
No bugfix releases, slow feature reviewing, and no compatibility guaranteed
are genuinely blocking the development of Flink SQL.

I think a fork should be the last choice, after trying our best to cooperate with the Calcite community.
But we shouldn't stop there if no progress is made. Therefore, I'm okay with maintaining a fork.

However:
1) It should be a temporary solution. We should have a plan to move back to
the latest Calcite version at some point (e.g., pushing them to resolve the
problems mentioned above).

2) If we maintain the fork in flink-extended, we should determine a groupId
for deploying to maven central. The community should have permission to
deploy under the groupId.

Best,
Jark


On Sun, 24 Apr 2022 at 16:14, godfrey he <godfre...@gmail.com> wrote:

Hi, Jing
Thanks for sharing the Calcite experiences.
About Calcite version upgrading, we should try not to use the latest Calcite version, to avoid bugs introduced by a new version if possible.
This may be a best practice.


Hi, Yun
Thanks for the detailed explanation of the experiences regarding FRocksDB.
I agree with you that the situation with Calcite and RocksDB is a little different.
The main pain point with Calcite is that we have to upgrade Calcite to the latest version to get bug fixes and new features, but the latest version may be unstable, which is a pain for us.
If we all agree that we should maintain a forked Calcite repo, there are many experiences we can learn from FRocksDB.

Best,
Godfrey

On Sun, 24 Apr 2022 at 11:58, Yun Tang <myas...@live.com> wrote:
Hi all,

Let me share my two cents here on how we maintain FRocksDB.

First of all, we would also prefer not to maintain a customized RocksDB version in Flink, as it brings additional overhead for the Flink community:

  1. The RocksDB community switched to CircleCI for its CI tests after RocksDB-6.x, which requires additional money to run all tests when reviewing each PR.
  2. We need to compile and include all kinds of FRocksDB binaries for linux32/64, Windows, ppc64, ARM and macOS platforms, which is a really tough and tedious experience.

The root reason why we have to maintain a forked RocksDB repo is that the RocksDB community refuses to accept a plugin-like feature based on the compaction filter, which Flink's state TTL feature heavily depends on [1]. Since RocksDB-7.0, the community has also moved several components to a plugin repo [2]; although this does not spare us from releasing all kinds of binaries, it can at least reduce the effort of maintaining the whole test suite if we follow this trend.

Last but not least, I don't think the current discussion on Apache Calcite is in the same situation as FRocksDB. The Flink SQL guys complain that Calcite releases too slowly, which blocks some feature development in Flink. The RocksDB community, however, actually releases new versions more frequently, and we don't currently rely on its new versions for new features. Moreover, we are often more careful about upgrading the underlying storage component, as it could impact performance and data correctness.

[1]
https://github.com/ververica/frocksdb/commit/3da8249d50c8a3a6ea229f43890d37e098372786
[2] https://github.com/facebook/rocksdb/issues/9390

Best
Yun Tang

________________________________
From: Jing Zhang <beyond1...@gmail.com>
Sent: Saturday, April 23, 2022 15:21
To: dev <dev@flink.apache.org>
Cc: Yun Tang <myas...@live.com>
Subject: Re: [DISCUSS] Maintain a Calcite repository for Flink to
accelerate the development for Flink SQL features
Hi Godfrey,
I would like to share some problems based on my past experience.

1. It's not easy to push new features in the Calcite community.
As @Martijn mentioned, https://issues.apache.org/jira/browse/CALCITE-4865 / https://github.com/apache/calcite/pull/2606 is such an example.
I tried many ways, for example sending review requests on the dev mailing list and leaving comments in JIRA and in the pull request, but I finally had to give up. Sorry for that.

2. However, some new features in Calcite are quite radical.
For example, https://issues.apache.org/jira/browse/CALCITE-4173 faced strong opposition in the Calcite community, but it was merged in the end and caused unexpected problems, such as wrong results (https://issues.apache.org/jira/browse/FLINK-24708) and other related bugs.

3. Every time we upgrade the Calcite version, we cross multiple versions, which results in a slow upgrade process with hard-to-control results and often causes unexpected problems.

Thanks @Godfrey for driving this discussion in a broader scope.
I think it's a good chance to review these problems and find a solution.

Best,
Jing Zhang

On Fri, 22 Apr 2022 at 21:40, godfrey he <godfre...@gmail.com> wrote:
Hi Chesnay,

There have been no bugfix releases so far.
You can find the releases at https://github.com/apache/calcite/tags

Best,
Godfrey

On Fri, 22 Apr 2022 at 18:48, Chesnay Schepler <ches...@apache.org> wrote:
I find it a bit weird that supposedly the only way to get a bug fix is to do a big version upgrade.
Is Calcite not creating bugfix releases?

On 22/04/2022 12:26, godfrey he wrote:
Thanks for the feedback, guys!

For Jingsong's feedback:
## Do we have a plan to upgrade Calcite to 1.31?
I think we will upgrade Calcite to 1.31 only when Flink depends on some significant feature of Calcite, such as the new PTF syntax (CALCITE-4865).

## Is cherry-picking costly?
From the experience of maintaining Calcite within our company, the cost is small. We only cherry-pick bug fixes and the minor features we need. For a major feature, we can choose to upgrade the version.

## Is the Calcite repository costly to maintain?
From the experience of @Danny Chen (a Calcite PMC member), publishing is fairly easy.


For Chesnay's feedback:
I also totally agree that a fork repository would increase the maintenance cost.

Usually, the Calcite community releases a new version every three months or more. I think it's hard to get Calcite to change its release cycle, because Calcite supports many compute engines.


For Konstantin's feedback:
Some changes in Calcite may cause hundreds of plan changes in Flink, such as CALCITE-4173. We must check whether each change is expected and whether there is a performance regression. Some of the changes are very subtle, especially in the CBO planner. A similar situation occurred when upgrading from 1.1x to 1.22. If you are not familiar with the Flink planner and Calcite, the upgrade is even more difficult.


For Xintong's feedback:
You are right, I will contact Yun for some help. Thanks for the suggestion.

For Martijn's feedback:
I'm also against cherry-picking many feature commits into the fork repository, and I totally agree that we should collaborate closely with the Calcite community. I'm just trying to find an approach that avoids frequent Calcite upgrades but easily supports bug fixes and minor new feature development.

As for the CALCITE-4865 case, I think we should upgrade the Calcite version to support PTF.

@Jing Zhang, can you share your thoughts on CALCITE-4865?

Best,
Godfrey

On Fri, 22 Apr 2022 at 17:31, Martijn Visser <martijnvis...@apache.org> wrote:
Hi everyone,

Overall I'm against the idea of setting up a Calcite fork, for the same reasons that Chesnay has mentioned. We've talked extensively about doing an upgrade of Calcite during the Flink 1.15 release period, but there was a lot of pushback by the maintainers against that because of the required effort. Having our own fork will mean even more effort is required, because not only do we need to perform the upgrade on Flink's end, we also need to maintain this Calcite fork.

I think what we should do is have a closer collaboration with the Calcite community and see if we can also help out with reviewing/merging PRs and more frequent releases. What we're seeing is that features that are proposed to Calcite because we need them for Flink are already not getting picked up by the Calcite community. See https://issues.apache.org/jira/browse/CALCITE-4865 / https://github.com/apache/calcite/pull/2606, which is such an example.
I would rather invest more in collaborating with the Calcite community instead of maintaining our own fork. I believe that would help us get new features and bug fixes sooner.

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser


On Fri, 22 Apr 2022 at 10:46, Xintong Song <tonysong...@gmail.com> wrote:
BTW, I think this proposal sounds similar to FRocksDB, Flink's custom RocksDB build. Maybe the folks maintaining FRocksDB can share some experiences.
CC @Yun Tang

Thank you~

Xintong Song



On Fri, Apr 22, 2022 at 4:35 PM Xintong Song <tonysong...@gmail.com> wrote:

Hi Godfrey,


> 1. Where to put the code? https://github.com/flink-extended is a good place.

Please notice that `flink-extended` is not endorsed by the Apache Flink PMC. That means if the proposed new Calcite repository is hosted there, its maintenance and releases will not be guaranteed by the Apache Flink project. I guess the question is whether we consider another 3rd-party Calcite repository more reliable and convenient than the official Apache Calcite that we want to depend on.

Thank you~

Xintong Song



On Fri, Apr 22, 2022 at 4:07 PM Chesnay Schepler <ches...@apache.org> wrote:

I'm overall against the idea of creating a fork.
It implies quite some maintenance overhead, like dealing with unstable tests, CI, licensing etc., and the overall release overhead.

Is there no alternative where we can collaborate more with the Calcite guys, like verifying new features so bugs are caught sooner?

On 22/04/2022 09:31, godfrey he wrote:
Dear devs,

I would like to open a discussion on the fact that currently much Flink SQL feature development relies on Calcite releases, which seriously blocks the release of some Flink SQL features. Therefore, I would like to discuss whether it is possible to solve this problem by creating Flink's own Calcite repository.

Currently, Flink depends on Calcite-1.26, FLIP-204 [1] relies on Calcite-1.30, and we recently want to support full join-hint functionality in Flink-1.16, which relies on Calcite-1.31 (which will maybe be released in two or three months). In order to support some new features or fix some bugs, we need to upgrade the Calcite version, but every time we upgrade it (especially across multiple versions), the process is very tough: I remember clearly that the Calcite upgrade from 1.22 to 1.26 took two weeks of full-time work to complete.
Currently, in order to fix some bugs without upgrading the Calcite version, we copy the corresponding Calcite classes directly into the Flink project and then modify them accordingly [2]. This approach is rather hacky and hard to maintain and upgrade.

So, I had the idea that we could solve this problem by maintaining a Calcite repository in the Flink community. This approach has been practiced within my company for many years. There are similar practices in the industry; for example, Apache Drill also maintains a separate Calcite repository [3].

The following is a brief analysis of the approach and the pros and cons of maintaining a separate repository.

Approach:
1. Where to put the code? https://github.com/flink-extended is a good place.
2. What extra code can be added to this repository? Only bug fixes and features that are already merged into Calcite can be cherry-picked into this repository. We should also try to push bug fixes to the Calcite community. Btw, the copied Calcite classes in the Flink project can then be removed.
3. How to upgrade the Calcite version? Check out the target Calcite release branch and rebase our bugfix commits onto it. (As we upgrade, we will maintain fewer and fewer old bugfix commits.) Then verify all of Calcite's tests and Flink's tests in the developer's local environment. If all tests are OK, release the Calcite branch; otherwise fix it in the branch and re-test. After the branch is released, the Calcite version in Flink can be upgraded. For example: check out a calcite-1.26.0-flink-v1-SNAPSHOT branch from calcite-1.26.0, move all the copied Calcite code in Flink to the branch, and pick all the hint-related changes from Calcite-1.31 into the branch. Then we can change the Calcite version in Flink to calcite-1.26.0-flink-v1-SNAPSHOT and verify all tests locally. Release calcite-1.26.0-flink-v1 after all tests are successful. At last, upgrade the Calcite version in Flink to calcite-1.26.0-flink-v1 and open a PR.
4. Who will maintain it? The maintenance workload is minimal, but the upgrade work is laborious (actually, it's similar to before). I can maintain it in the early stage and standardise the process.

Pros:
1. The release of Flink is decoupled from the release of Calcite, making feature development and bug fixing quicker.
2. It reduces the hassle of unnecessary Calcite upgrades.
3. No hacking in Flink to maintain copied Calcite code.

Cons:
1. We need to maintain an additional Calcite repository.
2. Upgrades are a little more complicated than before.

Any feedback is very welcome!


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
[2]
https://github.com/apache/flink/tree/master/flink-table/flink-table-planner/src/main/java/org/apache/calcite
[3] https://github.com/apache/drill/blob/master/pom.xml#L64

Best,
Godfrey

