Hello all,
FWIW, if a committer/reviewer shortage is the issue, I'd second Stamatis's recommendation.
Best,
-- C

> On Jun 23, 2022, at 7:02 AM, Stamatis Zampetakis wrote:
>
> Hi all,
>
> How about granting Calcite committership to people who are already ASF
> committers (in other projects) and have a proven record of working
> with Calcite?
>
> Usually the PMC invites people to become committers to the project after
> a few successful code contributions to the Calcite/Avatica repos.
> This is to ensure that people are familiar with the codebase and understand
> how the ASF works.
>
> People who are already committers in an ASF project already know how the
> foundation works and how they should behave.
> Also, people working in projects like Drill, Flink, Hive, Ignite, Phoenix,
> etc., may already be quite familiar with Calcite if they have worked on the
> query processing layer of the system.
>
> It might be difficult for the Calcite PMC to identify people familiar with
> Calcite if they don't contribute to the main Calcite/Avatica repos
> regularly, so I would be open to considering people for committership on a
> per-request basis.
>
> Example:
> Bob is an ASF committer in Flink and he has pushed various contributions
> around Calcite in the Flink repo.
> Bob feels confident about fixing trivial things in Calcite and he wants to
> help with reviewing and merging open PRs.
> Bob sends an email to the private@calcite list requesting to become a
> Calcite committer.
> Bob explains in the email who he is and what he has done to demonstrate he
> is familiar with the Calcite code.
> The Calcite PMC acknowledges the request and starts a vote for granting
> Calcite committership to Bob.
> The Calcite PMC informs Bob about their decision and takes further actions
> if necessary.
>
> If we agree on the overall idea we can figure out the details and formalize
> the request process in our docs.
>
> Best,
> Stamatis
>
> On Thu, Jun 23, 2022 at 6:06 AM Jing Zhang wrote:
>
>> Hi everyone,
>>
>> This is an awesome discussion to improve collaboration between different
>> projects.
>> Thanks to Julian, Jacques, Austin, Martijn, and Timo for their efforts to
>> make it happen.
>>
>> Best,
>> Jing Zhang
>>
>> On Thu, Jun 23, 2022 at 01:43, Martijn Visser wrote:
>>
>>> Hi Jacques, Julian, Austin and everyone else,
>>>
>>> Thank you very much for sharing all your experiences and providing really
>>> valuable input. I'll definitely relay this back to the original discussion
>>> thread in the Flink community. Part of bringing this information back to
>>> the Flink community is also because I feel like the only way that different
>>> OSS solutions can help each other forward is by communicating and
>>> collaborating. As Timo already mentioned, he'll try to help out. Let's try
>>> to get some more involved.
>>>
>>> Side note: I also saw that this thread got some traction on Twitter [1] on
>>> the cost of forking.
>>>
>>> Best regards,
>>>
>>> Martijn
>>>
>>> [1]
>>> https://twitter.com/gunnarmorling/status/1539499415337111553?s=21&t=8fGk3PxScOx4FJPJWE5UeA
>>>
>>> On Wed, Jun 22, 2022 at 09:29, Timo Walther wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> This is a really great discussion. Thanks for starting it Martijn and
>>>> your input Jacques! I have been fighting against forking Calcite in
>>>> Flink for years already. Even when merging forks of Flink that
>>>> transitively forked Calcite, in the end we were able to resolve
>>>> conflicts / contribute blockers back into Calcite. And I strongly
>>>> believe that this is the better approach for long-term success for both
>>>> projects.
>>>>
>>>> I would like to get more involved in the Calcite community. I have been
>>>> implementing and managing Flink SQL based on Calcite since 2016.
>>>> Thus, I feel confident to say that I know the code base and some quirks
>>>> in the stack very well.
>>>>
>>>> Capacity-wise I will try to reserve some time for helping the Calcite
>>>> community. Happy to get some pointers where and how I can help.
>>>>
>>>> I will take a look at https://github.com/apache/calcite/pull/2606 this
>>>> week to get the ball rolling, as this is an important addition and
>>>> prepares for "customer SQL operators" in Flink SQL.
>>>>
>>>> Regards,
>>>> Timo
>>>>
>>>> On 21.06.22 22:18, Charles Givre wrote:
>>>>> As the PMC for Apache Drill, I'd echo everyone's comments here....
>>>>> Don't fork. Don't do it.
>>>>>
>>>>> Apache Drill forked Calcite several years ago, when Calcite was on
>>>>> version 1.20 or 1.21. While this meant that some bugs were easily fixed,
>>>>> it also meant that as our fork diverged from "regular" Calcite, it
>>>>> became harder and harder to maintain. It also meant that we were chasing
>>>>> bugs that had since been fixed.
>>>>>
>>>>> Drill is in the process of "de-forking" Calcite, meaning that we're
>>>>> ditching our fork and re-integrating with standard Calcite. It has been
>>>>> A TON of work and we have contributed (and will continue to contribute)
>>>>> bug fixes and PRs to Calcite. In the long run, I think this will be
>>>>> beneficial for both communities.
>>>>>
>>>>> Best,
>>>>> -- C
>>>>>
>>>>>
>>>>>> On Jun 21, 2022, at 1:57 PM, Julian Hyde wrote:
>>>>>>
>>>>>> Please don’t fork Calcite.
>>>>>>
>>>>>> Calcite suffers from the tragedy of the commons. Unlike many open
>>>>>> source data projects, there is no commercial project that directly maps
>>>>>> to Calcite (even though Calcite is an essential part of many projects).
>>>>>> As a result no engineers work full-time on Calcite.
>>>>>>
>>>>>> It takes more than pull requests to keep a project going.
>>>>>> We need reviewers, people to work on releases, people to fix bugs (such
>>>>>> as security bugs) that are important to everyone but urgent to no one.
>>>>>>
>>>>>> We have plenty of committers in Calcite, and add several more per year.
>>>>>> We rely on those committers taking on their share of the housework, but
>>>>>> the burden falls on too few people.
>>>>>>
>>>>>> Engineering managers need to start paying a little more for the “free
>>>>>> lunch” that they enjoy when Calcite “just works” in their project.
>>>>>> Sadly, most engineering managers are not subscribed to this list.
>>>>>>
>>>>>> Julian
>>>>>>
>>>>>>
>>>>>>> On Jun 21, 2022, at 9:49 AM, Jacques Nadeau wrote:
>>>>>>>
>>>>>>> Martijn, thanks for sharing that thread in the Flink community.
>>>>>>>
>>>>>>> I'm someone who has forked Calcite twice: once in Apache Drill and
>>>>>>> again in Dremio. In both cases, it was all about trading short term
>>>>>>> benefits against long term costs. In both cases, I think the net amount
>>>>>>> of work was probably 5x as much as what it would have been if we had
>>>>>>> just done a better job engaging the community. If I were to state the
>>>>>>> curve of behavior over six years, I'd guess that in both cases the
>>>>>>> numbers of effort looked like this:
>>>>>>>
>>>>>>> estimated effort doing high intensity integration with calcite (years 1-6)
>>>>>>> fork: 1, 5, 10, 50, 100, 200, total = 366
>>>>>>> non-fork: 10, 10, 10, 10, 10, total = 50
>>>>>>>
>>>>>>> So yes, the first couple years you're ahead. But you pay a massive
>>>>>>> technical debt premium long term. Early in a project (Drill) or
>>>>>>> company's life (Dremio), it can make sense to sacrifice long term for
>>>>>>> short term but it's important people do it with their eyes open.
>>>>>>>
>>>>>>> The reason that this pain is so high is that as your codebases
>>>>>>> diverge, you start having to do everything the Calcite community does
>>>>>>> by yourself. Backports become harder and things that you need (e.g. new
>>>>>>> sql syntax, etc) have to be reimplemented (even if someone else already
>>>>>>> implemented them in some post-fork Calcite version). Ultimately, at
>>>>>>> some point you realize that your path is untenable and you unfork. This
>>>>>>> becomes the biggest expense of them all and I believe both of those
>>>>>>> teams are still trying to un-fork. The additional thing that becomes an
>>>>>>> even bigger problem is that your absence from the Calcite community
>>>>>>> means that people may take the project or APIs in ways that are in
>>>>>>> direct conflict with how you use the library. Since you're not active
>>>>>>> in the project, you fail to provide a counterpoint and then you're
>>>>>>> basically just in a miserable place. The Hive project did this best by
>>>>>>> ensuring that releases of Calcite were also run pre-release against
>>>>>>> Hive to make sure no major regressions occurred. Being in the community
>>>>>>> and active is the best state from my pov. (It makes your project better
>>>>>>> and Calcite better.)
>>>>>>>
>>>>>>> Two last notes:
>>>>>>> - I'm not sure the rocks fork is comparable to forking Calcite. The api
>>>>>>> surface area and community models are very different.
>>>>>>> - This is all based on a high intensity integration (using rules +
>>>>>>> planner or sql + rules + planner). Calcite is frustratingly monolithic
>>>>>>> and if someone was only going to use a small component, my opinion
>>>>>>> would likely be very different.
>>>>>>>
>>>>>>> I'd send this to the Flink list but I'm not subscribed.
>>>>>>> It'd be great if you shared it with the people over there if you think
>>>>>>> they'd find it useful.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jun 21, 2022 at 12:31 AM Martijn Visser wrote:
>>>>>>>
>>>>>>>> Thanks Julian and Austin!
>>>>>>>>
>>>>>>>> Any reply to kick-off some sort of discussion is worthwhile :D
>>>>>>>> I definitely know the feeling of having more PRs open than you would
>>>>>>>> like, looking at https://github.com/apache/flink/pulls :)
>>>>>>>>
>>>>>>>> There have been discussions in the Flink community about forking
>>>>>>>> Calcite [1]. My personal preference at the moment is to see if we can
>>>>>>>> create a better collaboration and community. I believe that we can
>>>>>>>> find people from the Flink community who can open / help review
>>>>>>>> Calcite PRs that are interesting for the Flink community. The question
>>>>>>>> is if that will also help short term since in the end it still
>>>>>>>> requires a Calcite maintainer to review/merge.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> Martijn
>>>>>>>>
>>>>>>>> [1] https://lists.apache.org/thread/1oqydpsm4mc55bkk440gx9lr9gf2rvf4
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jun 20, 2022 at 23:51, Austin Bennett wrote:
>>>>>>>>
>>>>>>>>> From the peanut gallery :-) -->
>>>>>>>>>
>>>>>>>>> Wow; yes, lots of open PRs. https://github.com/apache/calcite/pulls
>>>>>>>>>
>>>>>>>>> How can individuals from the Flink [sub-]community, and/or more
>>>>>>>>> general calcite community help lighten this load? Is there much
>>>>>>>>> weight given to reviews from non-committers; how to increase the # of
>>>>>>>>> people capable of providing worthwhile reviews [ that are recognized
>>>>>>>>> as such ]?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Jun 20, 2022 at 11:47 AM Julian Hyde wrote:
>>>>>>>>>
>>>>>>>>>> Martijn,
>>>>>>>>>>
>>>>>>>>>> Since you requested a reply, I am replying. To answer your question,
>>>>>>>>>> I don’t know of a way to move this topic forward. We have more PRs
>>>>>>>>>> than people to review them.
>>>>>>>>>>
>>>>>>>>>> Julian
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On Jun 19, 2022, at 11:58 PM, Martijn Visser wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>
>>>>>>>>>>> I just wanted to reach out to the Calcite community once more on
>>>>>>>>>>> this topic since no reply was received. Would be great if someone
>>>>>>>>>>> could get back to us.
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>>
>>>>>>>>>>> Martijn
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jun 8, 2022 at 11:24, Martijn Visser wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>
>>>>>>>>>>>> I would like to follow up on this email that was sent by Jing. So
>>>>>>>>>>>> far, no progress has been made, despite reaching out to the
>>>>>>>>>>>> mailing list, the original Jira ticket and reaching out to people
>>>>>>>>>>>> directly. Is there a way that we can move this PR/topic forward?
>>>>>>>>>>>>
>>>>>>>>>>>> For context, in Apache Flink we're currently heavily using
>>>>>>>>>>>> Calcite. However, we are now at the stage where Calcite is
>>>>>>>>>>>> actually holding us back. It would be great if we can find a way
>>>>>>>>>>>> to strengthen our bond and move both Calcite and Flink forward.
>>>>>>>>>>>>
>>>>>>>>>>>> Looking forward to your thoughts,
>>>>>>>>>>>>
>>>>>>>>>>>> Martijn
>>>>>>>>>>>>
>>>>>>>>>>>> On 2022/01/26 07:05:37 Jing Zhang wrote:
>>>>>>>>>>>>> Hi community,
>>>>>>>>>>>>> My apologies for interrupting.
>>>>>>>>>>>>> Could anyone help review the PR
>>>>>>>>>>>>> https://github.com/apache/calcite/pull/2606?
>>>>>>>>>>>>> Thanks a lot.
>>>>>>>>>>>>>
>>>>>>>>>>>>> CALCITE-4865 is the first sub-task of CALCITE-4864. This Jira
>>>>>>>>>>>>> aims to extend the existing table function in order to support
>>>>>>>>>>>>> Polymorphic Table Functions (PTF), which were introduced as part
>>>>>>>>>>>>> of ANSI SQL 2016.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The brief change logs of the PR are:
>>>>>>>>>>>>> - Update `Parser.jj` to support partition by clause and order by
>>>>>>>>>>>>> clause for input table with set semantics of PTF
>>>>>>>>>>>>> - Introduce `TableCharacteristics` which contains three
>>>>>>>>>>>>> characteristics of input table of table function
>>>>>>>>>>>>> - Update `SqlTableFunction` to add a method
>>>>>>>>>>>>> `tableCharacteristics`, the method returns the table
>>>>>>>>>>>>> characteristics for the ordinal-th argument to this table
>>>>>>>>>>>>> function. Default return value is Optional.empty which means the
>>>>>>>>>>>>> ordinal-th argument is not a table.
>>>>>>>>>>>>> - Introduce `SqlSetSemanticsTable` which represents input table
>>>>>>>>>>>>> with set semantics of Table Function, its `SqlKind` is
>>>>>>>>>>>>> `SET_SEMANTICS_TABLE`
>>>>>>>>>>>>> - Update `SqlValidatorImpl` to validate that only a set semantics
>>>>>>>>>>>>> table of a Table Function can have partition by and order by
>>>>>>>>>>>>> clauses
>>>>>>>>>>>>> - Update `SqlToRelConverter#substituteSubQuery` to parse subQuery
>>>>>>>>>>>>> which represents set semantics table.
>>>>>>>>>>>>>
>>>>>>>>>>>>> PR: https://github.com/apache/calcite/pull/2606
>>>>>>>>>>>>> JIRA: https://issues.apache.org/jira/browse/CALCITE-4865
>>>>>>>>>>>>> Parent JIRA: https://issues.apache.org/jira/browse/CALCITE-4864
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Jing Zhang
