Re: PR Review Request

Charles Givre Thu, 23 Jun 2022 04:56:02 -0700

Hello all,
FWIW, if a committer/reviewer shortage is the issue, I'd second Stamatis's
recommendation.
Best,
-- C


> On Jun 23, 2022, at 7:02 AM, Stamatis Zampetakis <[email protected]> wrote:
> 
> Hi all,
> 
> How about granting Calcite committership to people who are already ASF
> committers (in other projects) and who have a proven record of working
> with Calcite?
> 
> Usually the PMC invites people to become committers to the project after
> they have made a few successful code contributions to the Calcite/Avatica
> repos.
> This is to ensure that people are familiar with the codebase and understand
> how the ASF works.
> 
> People who are already committers in an ASF project already know how the
> foundation works and how they should behave.
> Also people working in projects like Drill, Flink, Hive, Ignite, Phoenix,
> etc., may already be quite familiar with Calcite if they have worked on the
> query processing layer of the system.
> 
> It might be difficult for the Calcite PMC to identify people familiar with
> Calcite if they don't contribute to the main Calcite/Avatica repos
> regularly, so I would be open to considering people for committership on a
> per-request basis.
> 
> Example:
> Bob is an ASF committer in Flink and he has pushed various contributions
> around Calcite in the Flink repo.
> Bob feels confident about fixing trivial things in Calcite and he wants to
> help with reviewing and merging open PRs.
> Bob sends an email to the private@calcite list asking to become a Calcite
> committer.
> Bob explains in the email who he is and what he has done to demonstrate he
> is familiar with the Calcite code.
> The Calcite PMC acknowledges the request and starts a vote on granting
> Calcite committership to Bob.
> The Calcite PMC informs Bob about their decision and takes further actions
> if necessary.
> 
> If we agree on the overall idea we can figure out the details and formalize
> the request process in our docs.
> 
> Best,
> Stamatis
> 
> On Thu, Jun 23, 2022 at 6:06 AM Jing Zhang <[email protected]> wrote:
> 
>> Hi everyone,
>> 
>> This is an awesome discussion to improve collaboration between different
>> projects.
>> Thanks to Julian, Jacques, Austin, Martijn and Timo for their efforts to
>> make it happen.
>> 
>> Best,
>> Jing Zhang
>> 
>> On Thu, Jun 23, 2022 at 01:43, Martijn Visser <[email protected]> wrote:
>> 
>>> Hi Jacques, Julian, Austin and everyone else,
>>> 
>>> Thank you very much for sharing all your experiences and providing really
>>> valuable input. I'll definitely relay this back to the original discussion
>>> thread in the Flink community. I'm also bringing this information back to
>>> the Flink community because I feel the only way different OSS projects can
>>> help each other move forward is by communicating and collaborating. As Timo
>>> already mentioned, he'll try to help out. Let's try to get some more people
>>> involved.
>>> 
>>> Side note: I also saw that this thread got some traction on Twitter [1]
>>> on the cost of forking.
>>> 
>>> Best regards,
>>> 
>>> Martijn
>>> 
>>> [1]
>>> https://twitter.com/gunnarmorling/status/1539499415337111553?s=21&t=8fGk3PxScOx4FJPJWE5UeA
>>> 
>>> On Wed, 22 Jun 2022 at 09:29, Timo Walther <[email protected]> wrote:
>>> 
>>>> Hi everyone,
>>>> 
>>>> This is a really great discussion. Thanks, Martijn, for starting it, and
>>>> Jacques, for your input! I have been fighting against forking Calcite in
>>>> Flink for years already. Even when merging forks of Flink that
>>>> transitively forked Calcite, in the end we were able to resolve
>>>> conflicts / contribute blockers back into Calcite. And I strongly
>>>> believe that this is the better approach for long-term success for both
>>>> projects.
>>>> 
>>>> I would like to get more involved in the Calcite community. I have been
>>>> implementing and managing Flink SQL based on Calcite since 2016, so I
>>>> feel confident saying that I know the code base and some quirks in the
>>>> stack very well.
>>>> 
>>>> Capacity-wise, I will try to reserve some time to help the Calcite
>>>> community. I'm happy to get some pointers on where and how I can help.
>>>> 
>>>> I will take a look at https://github.com/apache/calcite/pull/2606 this
>>>> week to get the ball rolling, as it is an important addition and
>>>> prepares for "custom SQL operators" in Flink SQL.
>>>> 
>>>> Regards,
>>>> Timo
>>>> 
>>>> On 21.06.22 22:18, Charles Givre wrote:
>>>>> As a member of the Apache Drill PMC, I'd echo everyone's comments here...
>>>>> Don't fork. Don't do it.
>>>>> 
>>>>> Apache Drill forked Calcite several years ago, when Calcite was on
>>>>> version 1.20 or 1.21. While this meant that some bugs were easily fixed,
>>>>> it also meant that as our fork diverged from "regular" Calcite, it became
>>>>> harder and harder to maintain. It also meant that we were chasing bugs
>>>>> that had since been fixed upstream.
>>>>> 
>>>>> Drill is in the process of "de-forking" Calcite, meaning that we're
>>>>> ditching our fork and re-integrating with standard Calcite. It has been
>>>>> A TON of work, and we have contributed (and will continue to contribute)
>>>>> bug fixes and PRs to Calcite. In the long run, I think this will be
>>>>> beneficial for both communities.
>>>>> 
>>>>> Best,
>>>>> -- C
>>>>> 
>>>>> 
>>>>>> On Jun 21, 2022, at 1:57 PM, Julian Hyde <[email protected]> wrote:
>>>>>> 
>>>>>> Please don’t fork Calcite.
>>>>>> 
>>>>>> Calcite suffers from the tragedy of the commons. Unlike many open source
>>>>>> data projects, there is no commercial project that directly maps to
>>>>>> Calcite (even though Calcite is an essential part of many projects). As
>>>>>> a result, no engineers work full-time on Calcite.
>>>>>> 
>>>>>> It takes more than pull requests to keep a project going. We need
>>>>>> reviewers, people to work on releases, and people to fix bugs (such as
>>>>>> security bugs) that are important to everyone but urgent to no one.
>>>>>> 
>>>>>> We have plenty of committers in Calcite, and add several more per year.
>>>>>> We rely on those committers taking on their share of the housework, but
>>>>>> the burden falls on too few people.
>>>>>> 
>>>>>> Engineering managers need to start paying a little more for the “free
>>>>>> lunch” that they enjoy when Calcite “just works” in their project.
>>>>>> Sadly, most engineering managers are not subscribed to this list.
>>>>>> 
>>>>>> Julian
>>>>>> 
>>>>>> 
>>>>>>> On Jun 21, 2022, at 9:49 AM, Jacques Nadeau <[email protected]> wrote:
>>>>>>> 
>>>>>>> Martijn, thanks for sharing that thread in the Flink community.
>>>>>>> 
>>>>>>> I'm someone who has forked Calcite twice: once in Apache Drill and again
>>>>>>> in Dremio. In both cases, it was all about trading short-term benefits
>>>>>>> against long-term costs. In both cases, I think the net amount of work
>>>>>>> was probably 5x what it would have been if we had just done a better job
>>>>>>> engaging the community. If I were to sketch the effort curve over six
>>>>>>> years, I'd guess that in both cases the numbers looked like this:
>>>>>>> 
>>>>>>> Estimated effort doing high-intensity integration with Calcite (years 1-6):
>>>>>>> fork: 1, 5, 10, 50, 100, 200, total = 366
>>>>>>> non-fork: 10, 10, 10, 10, 10, total = 50
>>>>>>> 
>>>>>>> So yes, for the first couple of years you're ahead. But you pay a massive
>>>>>>> technical debt premium long term. Early in a project's (Drill) or
>>>>>>> company's (Dremio) life, it can make sense to sacrifice the long term for
>>>>>>> the short term, but it's important that people do it with their eyes
>>>>>>> open.
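The totals and the crossover point in the estimate above can be checked with a minimal Python sketch. The yearly figures are copied as quoted (the non-fork series lists five values, so the comparison covers five years); the helper name `first_year_fork_costs_more` is hypothetical.

```python
from itertools import accumulate
from typing import List, Optional

# Yearly effort figures copied from the estimate (arbitrary units).
FORK = [1, 5, 10, 50, 100, 200]
NON_FORK = [10, 10, 10, 10, 10]


def first_year_fork_costs_more(fork: List[int], non_fork: List[int]) -> Optional[int]:
    """Return the first year in which the cumulative fork effort exceeds the
    cumulative non-fork effort, or None if that never happens.

    zip() stops at the shorter series, so only years present in both count.
    """
    running_totals = zip(accumulate(fork), accumulate(non_fork))
    for year, (fork_total, non_fork_total) in enumerate(running_totals, start=1):
        if fork_total > non_fork_total:
            return year
    return None


if __name__ == "__main__":
    print("fork total:", sum(FORK))          # fork total: 366
    print("non-fork total:", sum(NON_FORK))  # non-fork total: 50
    print("fork falls behind in year", first_year_fork_costs_more(FORK, NON_FORK))  # year 4
```

With these figures, the cumulative fork effort (1, 6, 16, ...) stays below the non-fork effort (10, 20, 30, ...) through year 3 and exceeds it from year 4 onward.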
>>>>>>> 
>>>>>>> The reason this pain is so high is that as your codebases diverge, you
>>>>>>> start having to do everything the Calcite community does by yourself.
>>>>>>> Backports become harder, and things that you need (e.g. new SQL syntax)
>>>>>>> have to be reimplemented (even if someone else already implemented them
>>>>>>> in some post-fork Calcite version). Ultimately, at some point you realize
>>>>>>> that your path is untenable and you unfork. This becomes the biggest
>>>>>>> expense of them all, and I believe both of those teams are still trying
>>>>>>> to un-fork. An even bigger problem is that your absence from the Calcite
>>>>>>> community means that people may take the project or APIs in directions
>>>>>>> that directly conflict with how you use the library. Since you're not
>>>>>>> active in the project, you fail to provide a counterpoint, and then
>>>>>>> you're basically just in a miserable place. The Hive project did this
>>>>>>> best by ensuring that releases of Calcite were also run pre-release
>>>>>>> against Hive to make sure no major regressions occurred. Being in the
>>>>>>> community and active is the best state from my pov. (It makes your
>>>>>>> project better and Calcite better.)
>>>>>>> 
>>>>>>> Two last notes:
>>>>>>> - I'm not sure the rocks fork is comparable to forking Calcite. The API
>>>>>>> surface area and community models are very different.
>>>>>>> - This is all based on a high-intensity integration (using rules +
>>>>>>> planner or SQL + rules + planner). Calcite is frustratingly monolithic,
>>>>>>> and if someone were only going to use a small component, my opinion
>>>>>>> would likely be very different.
>>>>>>> 
>>>>>>> I'd send this to the Flink list, but I'm not subscribed. It'd be great if
>>>>>>> you shared it with the people over there if you think they'd find it
>>>>>>> useful.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Jun 21, 2022 at 12:31 AM Martijn Visser <[email protected]> wrote:
>>>>>>> 
>>>>>>>> Thanks Julian and Austin!
>>>>>>>> 
>>>>>>>> Any reply to kick off some sort of discussion is worthwhile :D
>>>>>>>> I definitely know the feeling of having more PRs open than you would
>>>>>>>> like, looking at https://github.com/apache/flink/pulls :)
>>>>>>>> 
>>>>>>>> There have been discussions in the Flink community about forking
>>>>>>>> Calcite [1]. My personal preference at the moment is to see if we can
>>>>>>>> create better collaboration and community. I believe that we can find
>>>>>>>> people from the Flink community who can open / help review Calcite PRs
>>>>>>>> that are interesting for the Flink community. The question is whether
>>>>>>>> that will also help in the short term, since in the end it still
>>>>>>>> requires a Calcite maintainer to review/merge.
>>>>>>>> 
>>>>>>>> Best regards,
>>>>>>>> 
>>>>>>>> Martijn
>>>>>>>> 
>>>>>>>> [1] https://lists.apache.org/thread/1oqydpsm4mc55bkk440gx9lr9gf2rvf4
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Mon, 20 Jun 2022 at 23:51, Austin Bennett <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> From the peanut gallery :-)  -->
>>>>>>>>> 
>>>>>>>>> Wow; yes, lots of open PRs. https://github.com/apache/calcite/pulls
>>>>>>>>> 
>>>>>>>>> How can individuals from the Flink [sub-]community and/or the more
>>>>>>>>> general Calcite community help lighten this load? Is much weight given
>>>>>>>>> to reviews from non-committers? How can we increase the number of
>>>>>>>>> people capable of providing worthwhile reviews [that are recognized as
>>>>>>>>> such]?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Mon, Jun 20, 2022 at 11:47 AM Julian Hyde <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>>> Martijn,
>>>>>>>>>> 
>>>>>>>>>> Since you requested a reply, I am replying. To answer your question, I
>>>>>>>>>> don’t know of a way to move this topic forward. We have more PRs than
>>>>>>>>>> people to review them.
>>>>>>>>>> 
>>>>>>>>>> Julian
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Jun 19, 2022, at 11:58 PM, Martijn Visser <[email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi everyone,
>>>>>>>>>>> 
>>>>>>>>>>> I just wanted to reach out to the Calcite community once more on this
>>>>>>>>>>> topic, since no reply was received. It would be great if someone could
>>>>>>>>>>> get back to us.
>>>>>>>>>>> 
>>>>>>>>>>> Best regards,
>>>>>>>>>>> 
>>>>>>>>>>> Martijn
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, 8 Jun 2022 at 11:24, Martijn Visser <[email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>> 
>>>>>>>>>>>> I would like to follow up on this email that was sent by Jing. So far,
>>>>>>>>>>>> no progress has been made, despite reaching out on the mailing list
>>>>>>>>>>>> and the original Jira ticket, and reaching out to people directly.
>>>>>>>>>>>> Is there a way that we can move this PR/topic forward?
>>>>>>>>>>>> 
>>>>>>>>>>>> For context, in Apache Flink we're currently heavily using Calcite.
>>>>>>>>>>>> However, we are now at the stage where Calcite is actually holding us
>>>>>>>>>>>> back. It would be great if we can find a way to strengthen our bond
>>>>>>>>>>>> and move both Calcite and Flink forward.
>>>>>>>>>>>> 
>>>>>>>>>>>> Looking forward to your thoughts,
>>>>>>>>>>>> 
>>>>>>>>>>>> Martijn
>>>>>>>>>>>> 
>>>>>>>>>>>> On 2022/01/26 07:05:37 Jing Zhang wrote:
>>>>>>>>>>>>> Hi community,
>>>>>>>>>>>>> My apologies for interrupting.
>>>>>>>>>>>>> Could anyone help review the PR
>>>>>>>>>>>>> https://github.com/apache/calcite/pull/2606?
>>>>>>>>>>>>> Thanks a lot.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> CALCITE-4865 is the first sub-task of CALCITE-4864. This Jira aims to
>>>>>>>>>>>>> extend the existing table function support in order to support
>>>>>>>>>>>>> Polymorphic Table Functions, which were introduced as part of ANSI
>>>>>>>>>>>>> SQL 2016.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The brief change log of the PR is:
>>>>>>>>>>>>> - Update `Parser.jj` to support the PARTITION BY and ORDER BY clauses
>>>>>>>>>>>>> for an input table with set semantics of a PTF
>>>>>>>>>>>>> - Introduce `TableCharacteristics`, which contains three
>>>>>>>>>>>>> characteristics of an input table of a table function
>>>>>>>>>>>>> - Update `SqlTableFunction` to add a method `tableCharacteristics`,
>>>>>>>>>>>>> which returns the table characteristics for the ordinal-th argument
>>>>>>>>>>>>> to this table function. The default return value is Optional.empty,
>>>>>>>>>>>>> which means the ordinal-th argument is not a table.
>>>>>>>>>>>>> - Introduce `SqlSetSemanticsTable`, which represents an input table
>>>>>>>>>>>>> with set semantics of a table function; its `SqlKind` is
>>>>>>>>>>>>> `SET_SEMANTICS_TABLE`
>>>>>>>>>>>>> - Update `SqlValidatorImpl` to validate that only a set semantics
>>>>>>>>>>>>> table of a table function can have PARTITION BY and ORDER BY clauses
>>>>>>>>>>>>> - Update `SqlToRelConverter#substituteSubQuery` to parse the subquery
>>>>>>>>>>>>> that represents a set semantics table.
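The table characteristics and the validation rule described in the change log can be modeled in a few lines. This is a minimal Python sketch under stated assumptions, not Calcite's Java API: `Semantics`, `TableCharacteristics`, `TableFunction`, `table_characteristics` and `validate_table_argument` are hypothetical names, and the three characteristics are assumed to be the row/set semantics, prune-when-empty and pass-through options that SQL:2016 defines for polymorphic table function arguments.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Semantics(Enum):
    ROW = "row"  # each input row is processed independently
    SET = "set"  # input is processed per partition, so PARTITION BY / ORDER BY apply


@dataclass(frozen=True)
class TableCharacteristics:
    """Hypothetical model of the three characteristics of a table argument."""
    semantics: Semantics
    prune_if_empty: bool = False        # assumed: PRUNE WHEN EMPTY vs KEEP WHEN EMPTY
    pass_columns_through: bool = False  # assumed: PASS THROUGH vs NO PASS THROUGH


class TableFunction:
    """Hypothetical table function that maps argument ordinals to characteristics."""

    def __init__(self, name: str, table_args: Dict[int, TableCharacteristics]):
        self.name = name
        self._table_args = table_args

    def table_characteristics(self, ordinal: int) -> Optional[TableCharacteristics]:
        # None stands in for Optional.empty: the ordinal-th argument is not a table.
        return self._table_args.get(ordinal)


def validate_table_argument(fn: TableFunction, ordinal: int,
                            has_partition_by: bool, has_order_by: bool) -> None:
    """Reject PARTITION BY / ORDER BY unless the argument is a set-semantics table."""
    if not (has_partition_by or has_order_by):
        return
    chars = fn.table_characteristics(ordinal)
    if chars is None:
        raise ValueError(f"argument {ordinal} of {fn.name} is not a table")
    if chars.semantics is not Semantics.SET:
        raise ValueError(
            f"only set-semantics table arguments of {fn.name} may have "
            "PARTITION BY or ORDER BY")


if __name__ == "__main__":
    # Hypothetical PTF: argument 0 is a set-semantics table, argument 1 a row-semantics one.
    ptf = TableFunction("my_ptf", {
        0: TableCharacteristics(Semantics.SET, prune_if_empty=True),
        1: TableCharacteristics(Semantics.ROW),
    })
    validate_table_argument(ptf, 0, has_partition_by=True, has_order_by=True)  # accepted
    print(ptf.table_characteristics(2))  # None: argument 2 is not a table
    try:
        validate_table_argument(ptf, 1, has_partition_by=True, has_order_by=False)
    except ValueError as err:
        print("rejected:", err)
```

Keeping the PARTITION BY / ORDER BY check in a separate function loosely follows the change log's split between what the table function declares and what the validator enforces.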
>>>>>>>>>>>>> 
>>>>>>>>>>>>> PR: https://github.com/apache/calcite/pull/2606
>>>>>>>>>>>>> JIRA: https://issues.apache.org/jira/browse/CALCITE-4865
>>>>>>>>>>>>> Parent JIRA: https://issues.apache.org/jira/browse/CALCITE-4864
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Jing Zhang
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>
