Hi Flink community members! As many of you will have noticed, the Flink project activity has gone up again quite a bit. There are many more contributions, which is an absolutely great thing to have :-)
However, we see a continuously growing backlog of pull requests and JIRA issues. To make sure the community will be able to handle the increased volume, I think we need to revisit some approaches and processes. I believe there are a few opportunities to structure things a bit better, which should help to scale the development. The first thing I would like to bring up are *Pull Request Reviews*. Even though more community members being active in reviews (which is a really great thing!) the Pull Request backlog is increasing quite a bit. Why are pull requests still not merged faster? Looking at the reviews, one thing I noticed is that most reviews deal immediately with detailed code issues, and leave out most of the core questions that need to be answered before a Pull Request can be merged, like "is this a desired feature?" or "does this align well with other developments?". I think that we even make things slightly worse that way: From my personal experience, I have often thought "oh, this PR has a review already" and rather looked at another PR, only to find later that the first review did never decide whether this PR is actually a good fit for Flink. There has never been a proper documentation of how to answer these questions, what to evaluate in reviews, guidelines for how to evaluate pull requests, other than code quality. I suspect that this is why so many reviewers do not address the "is this a good contribution" questions, making pull requests linger until another committers joins the review. Below is an idea for a guide *"How to Review Contributions"*. It outlines five core aspects to be checked in every pull request, and suggests a priority for clarifying those. The idea is that this helps us to better structure reviews, and to make each reviewer aware what we look for in a review and where to best bring in their help. Looking forward to comments! Best, Stephan ==================================== The draft is in this Google Doc. Please add small textual comments to the doc, and bigger principle discussions as replies to this mail. https://docs.google.com/document/d/1yaX2b9LNh-6LxrAmE23U3D2cRbocGlGKCYnvJd9lVhk/edit?usp=sharing *How to Review Contributions------------------------------This guide is for all committers and contributors that want to help with reviewing contributions. Thank you for your effort - good reviews are one the most important and crucial parts of an open source project. This guide should help the community to make reviews such that: - Contributors have a good contribution experience- Reviews are structured and check all important aspects of a contribution- Make sure we keep a high code quality in Flink- We avoid situations where contributors and reviewers spend a lot of time to refine a contribution that gets rejected laterReview ChecklistEvery review needs to check the following five aspects. We encourage to check these aspects in order, to avoid spending time on detailed code quality reviews when there is not yet consensus that a feature or change should be actually be added.(1) Is there consensus whether the change of feature should go into to Flink?For bug fixes, this needs to be checked only in case it requires bigger changes or might break existing programs and setups.Ideally, this question is already answered from a JIRA issue or the dev-list discussion, except in cases of bug fixes and small lightweight additions/extensions. In that case, this question can be immediately marked as resolved. For pull requests that are created without prior consensus, this question needs to be answered as part of the review.The decision whether the change should go into Flink needs to take the following aspects into consideration: - Does the contribution alter the behavior of features or components in a way that it may break previous users’ programs and setups? If yes, there needs to be a discussion and agreement that this change is desirable. - Does the contribution conceptually fit well into Flink? Is it too much of special case such that it makes things more complicated for the common case, or bloats the abstractions / APIs? - Does the feature fit well into Flink’s architecture? Will it scale and keep Flink flexible for the future, or will the feature restrict Flink in the future? - Is the feature a significant new addition (rather than an improvement to an existing part)? If yes, will the Flink community commit to maintaining this feature? - Does the feature produce added value for Flink users or developers? Or does it introduce risk of regression without adding relevant user or developer benefit?All of these questions should be answerable from the description/discussion in JIRA and Pull Request, without looking at the code.(2) Does the contribution need attention from some specific committers and is there time commitment from these committers?Some changes require attention and approval from specific committers. For example, changes in parts that are either very performance sensitive, or have a critical impact on distributed coordination and fault tolerance need input by a committer that is deeply familiar with the component.As a rule of thumb, this is the case when the Pull Request description answers one of the questions in the template section “Does this pull request potentially affect one of the following parts” with ‘yes’.This question can be answered with - Does not need specific attention- Needs specific attention for X (X can be for example checkpointing, jobmanager, etc.).- Has specific attention for X by @commiterA, @contributorBIf the pull request needs specific attention, one of the tagged committers/contributors should give the final approval.(3) Is the contribution described well?Check whether the contribution is sufficiently well described to support a good review. Trivial changes and fixes do not need a long description. Any pull request that changes functionality or behavior needs to describe the big picture of these changes, so that reviews know what to look for (and don’t have to dig through the code to hopefully understand what the change does).Changes that require longer descriptions are ideally based on a prior design discussion in the mailing list or in JIRA and can simply link to there or copy the description from there.(4) Does the implementation follow the right overall approach/architecture?Is this the best approach to implement the fix or feature, or are there other approaches that would be easier, more robust, or more maintainable?This question should be answerable from the Pull Request description (or the linked JIRA) as much as possible.We recommend to check this before diving into the details of commenting on individual parts of the change.(5) Is the overall code quality good, meeting standard we want to maintain in Flink?This is the detailed code review of the actual changes, covering: - Are the changes doing what is described in the design document or PR description?- Does the code follow the right software engineering practices? It the code correct, robust, maintainable, testable?- Are the change performance aware, when changing a performance sensitive part?- Are the changes sufficiently covered by tests?- Are the tests executing fast?- Does the code format follow Flink’s checkstyle pattern?- Does the code avoid to introduce additional compiler warnings?Some code style guidelines can be found in the [Flink Code Style Page](https://flink.apache.org/contribute-code.html#code-style <https://flink.apache.org/contribute-code.html#code-style>)Pull Request Review TemplateAdd the following checklist to the pull request review, checking the boxes as the questions are answered: - [ ] Consensus that the contribution should go into to Flink - [ ] Does not need specific attention | Needs specific attention for X | Has attention for X by Y - [ ] Contribution description - [ ] Architectural approach - [ ] Overall code quality*