Hi Piotr, I strongly agree with your idea. Since we work on a huge project, diff in the PR becomes hard to review if there is too much noise.
Just to clarify the process. Do you mean that a pull request should go into the way below? Separated commits during the implement, mainly distinguish feature/bug fix with clean up/rework. [FLINK-XXX] [module] future/bug fix... [FLINK-XXX] [module] more on future/bug fix... [hotfix] clean up/rework... and so on. And finally, when get merged, put all stuff into one commit and comments close #PRID so coming ones could see the detail by jump to #PRID. One thing to trade off is that if we merge by one commit, we cannot revert part of it automated; if we merge by PR commits(one by one), the commit log might mess. Best, tison. Piotr Nowojski <pi...@data-artisans.com> 于2018年9月19日周三 下午5:10写道: > Hi, > > I would like to rise one more issue. Often contributions are very heavy > and difficult to review. They have one big commit that changes multiple > things which is difficult to review. Instead I would be in favour of > implementing a rule, that a single commit should never do more then one > thing. For example never mixing refactoring/renames/clean ups/code > deduplications with functional changes. If implementing something requires > two independent changes in two separate components, those also should be > two independent commits. In other words, if there are two changed lines in > a single commit, they should interact somehow together and strictly depend > on one another. This has couple of important advantages: > > 1. Obviously faster reviews. Especially if a reviewer do not have to find > 2 lines bug fix among 200 lines of renames/refactoring. > 2. Provides good “cut out” points for reviewer. For example he can easily > interrupt reviewing in the middle and continue later or even merge PR in > stages. > 3. Better reference “why something was done this way not the other” for > the future. This is the same argument as first point, however with benefit > not during reviewing, but when after merging someone is trying to > understand the code. > 4. Commit message becomes much better place to write down reasons why > something was done and what are the effects (not that this should replace > comments/documentation, only to complement it). > 5. In case of need to revert/drop some part of the contribution, we are > not loosing all of it. If we have to revert some small feature, it would be > easier to keep refactoring and clean ups. > > > Some examples of PRs that were more or less following this rule: > https://github.com/apache/flink/pull/6692/commits > https://github.com/apache/flink/pull/5423/commits < > https://github.com/apache/flink/pull/5423/commits> (a bit extreme) > > If someone is not convinced I encourage to open those PRs and browse > through couple of first commits (which are refactoring/clean up commits) > one by one (GitHub has next/prev commit button). Then imagine if they were > squashed with some functional/performance improvement changes. > > Piotrek > > > On 18 Sep 2018, at 17:12, 陈梓立 <wander4...@gmail.com> wrote: > > > > Put some good cases here might be helpful. > > > > See how a contribution of runtime module be proposed, discussed, > > implemented and merged from https://github.com/apache/flink/pull/5931 > to > > https://github.com/apache/flink/pull/6132. > > > > 1. #5931 fix a bug, but remains points could be improved. Here sihuazhou > > and shuai-xu share their considerations and require review(of the > proposal) > > by Stephan, Till and Gary, our committers. > > 2. After discussion, all people involved reach a consensus. So the rest > > work is to implement it. > > 3. sihuazhou gives out an implementation #6132, Till reviews it and find > it > > is somewhat out of the "architectural" aspect, so suggests > > implementation-level changes. > > 4. Addressing those implementation-level comments, the PR gets merged. > > > > I think this is quite a good example how we think our review process > should > > go. > > > > Best, > > tison. > > > > > > 陈梓立 <wander4...@gmail.com> 于2018年9月18日周二 下午10:53写道: > > > >> Maybe a little rearrange to the process would help. > >> > >> (1). Does the contributor describe itself well? > >> (1.1) By whom this contribution should be given attention. This often > >> shows by its title, "[FLINK-XXX] [module]", the module part infer. > >> (1.2) What the purpose of this contribution is. Done by the PR > template. > >> Even on JIRA an issue should cover these points. > >> > >> (2). Is there consensus on the contribution? > >> This follows (1), because we need to clear what the purpose of the > >> contribution first. At this stage reviewers could cc to module > maintainer > >> as a supplement to (1.1). Also reviewers might ask the contributor to > >> clarify his purpose to sharp(1.2) > >> > >> (3). Is the implement architectural and fit code style? > >> This follows (2). And only after a consensus we talk about concrete > >> implement, which prevent spend time and put effort in vain. > >> > >> In addition, ideally a "+1" comment or approval means the purpose of > >> contribution is supported by the reviewer and implement(if there is) > >> quality is fine, so the reviewer vote for a consensus. > >> > >> Best, > >> tison. > >> > >> > >> Stephan Ewen <se...@apache.org> 于2018年9月18日周二 下午6:44写道: > >> > >>> On the template discussion, some thoughts > >>> > >>> *PR Template* > >>> > >>> I think the PR template went well. We can rethink the "checklist" at > the > >>> bottom, but all other parts turned out helpful in my opinion. > >>> > >>> With the amount of contributions, it helps to ask the contributor to > take > >>> a > >>> little more work in order for the reviewer to be more efficient. > >>> I would suggest to keep that mindset: Whenever we find a way that the > >>> contributor can prepare stuff in such a way that reviews become > >>> more efficient, we should do that. In my experience, most contributors > are > >>> willing to put in some extra minutes if it helps that their > >>> PR gets merged faster. > >>> > >>> *Review Template* > >>> > >>> I think it would be helpful to have this checklist. It does not matter > in > >>> which form, be that as a text template, be that as labels. > >>> > >>> The most important thing is to make explicit which questions have been > >>> answered in the review. > >>> Currently there is a lot of "+1" on pull requests which means "code > >>> quality > >>> is fine", but all other questions are unanswered. > >>> The contributors then rightfully wonder why this does not get merged. > >>> > >>> > >>> > >>> On Tue, Sep 18, 2018 at 7:26 AM, 陈梓立 <wander4...@gmail.com> wrote: > >>> > >>>> Hi all interested, > >>>> > >>>> Within the document there is a heated discussion about how the PR > >>>> template/review template should be. > >>>> > >>>> Here share my opinion: > >>>> > >>>> 1. For the review template, actually we don't need comment a review > >>>> template at all. GitHub has a tag system and only committer could add > >>> tags, > >>>> which we can make use of it. That is, tagging this PR is > >>>> waiting-for-proposal-approved, waiting-for-code-review, > >>>> waiting-for-benchmark or block-by-author and so on. Asfbot could pick > >>>> GitHub tag state to the corresponding JIRA and we always regard JIRA > as > >>> the > >>>> main discussion borad. > >>>> > >>>> 2. For the PR template, the greeting message is redundant. Just > >>> emphasize a > >>>> JIRA associated is important and how to format the title is enough. > >>>> Besides, the "Does this pull request potentially affect one of the > >>>> following parts" part and "Documentation" should be coved from "What > is > >>> the > >>>> purpose of the change" and "Brief change log". These two parts, users > >>>> always answer no and would be aware if they really make changes on it. > >>> As > >>>> example, even pull request requires document, its owner might no add > it > >>> at > >>>> first. The PR template is a guide but not which one have to learn. > >>>> > >>>> To sum up, (1) take advantage of GitHub's tag system to tag review > >>> progress > >>>> (2) make the template more concise to avoid burden mature contributors > >>> and > >>>> force new comer to learn too much. > >>>> > >>>> Best, > >>>> tison. > >>>> > >>>> > >>>> Rong Rong <walter...@gmail.com> 于2018年9月18日周二 上午7:05写道: > >>>> > >>>>> Thanks for putting the review contribution doc together, Stephan! > This > >>>> will > >>>>> definitely help the community to make the review process better. > >>>>> > >>>>> From my experience this will benefit on both contributors and > >>> reviewers > >>>>> side! Thus +1 for putting into practice as well. > >>>>> > >>>>> -- > >>>>> Rong > >>>>> > >>>>> On Mon, Sep 17, 2018 at 10:18 AM Stephan Ewen <se...@apache.org> > >>> wrote: > >>>>> > >>>>>> Hi! > >>>>>> > >>>>>> Thanks you for the encouraging feedback so far. > >>>>>> > >>>>>> The overall goal is definitely to make the contribution process > >>> better > >>>>> and > >>>>>> get fewer pull requests that are disregarded. > >>>>>> > >>>>>> There are various reasons for the disregarded pull requests, one > >>> being > >>>>> that > >>>>>> fewer committers really participate in reviews beyond > >>>>>> the component they are currently very involved with. This is a > >>> separate > >>>>>> issue and I am thinking on how to encourage more > >>>>>> activity there. > >>>>>> > >>>>>> The other reason I was lack of structure and lack of decision > >>> making, > >>>>> which > >>>>>> is what I am first trying to fix here. > >>>>>> A follow-up to this will definitely be to improve the contribution > >>>> guide > >>>>> as > >>>>>> well. > >>>>>> > >>>>>> Best, > >>>>>> Stephan > >>>>>> > >>>>>> > >>>>>> On Mon, Sep 17, 2018 at 12:05 PM, Zhijiang(wangzhijiang999) < > >>>>>> wangzhijiang...@aliyun.com.invalid> wrote: > >>>>>> > >>>>>>> From my personal experience as a contributor for three years, I > >>> feel > >>>>>>> better experience in contirbuting or reviewing than before, > >>> although > >>>> we > >>>>>>> still have some points for further progress. > >>>>>>> > >>>>>>> I reviewed the proposal doc, and it gives very constructive and > >>>>>> meaningful > >>>>>>> guides which could help both contributor and reviewer. I agree > >>> with > >>>> the > >>>>>>> bove suggestions and wish they can be praticed well! > >>>>>>> > >>>>>>> Best, > >>>>>>> Zhijiang > >>>>>>> ------------------------------------------------------------------ > >>>>>>> 发件人:Till Rohrmann <trohrm...@apache.org> > >>>>>>> 发送时间:2018年9月17日(星期一) 16:27 > >>>>>>> 收件人:dev <dev@flink.apache.org> > >>>>>>> 主 题:Re: [PROPOSAL] [community] A more structured approach to > >>> reviews > >>>>> and > >>>>>>> contributions > >>>>>>> > >>>>>>> Thanks for writing this up Stephan. I like the steps and hope > >>> that it > >>>>>> will > >>>>>>> help the community to make the review process better. Thus, +1 for > >>>>>> putting > >>>>>>> your proposal to practice. > >>>>>>> > >>>>>>> Cheers, > >>>>>>> Till > >>>>>>> > >>>>>>> On Mon, Sep 17, 2018 at 10:00 AM Stephan Ewen <se...@apache.org> > >>>>> wrote: > >>>>>>> > >>>>>>>> Hi Flink community members! > >>>>>>>> > >>>>>>>> As many of you will have noticed, the Flink project activity has > >>>> gone > >>>>>> up > >>>>>>>> again quite a bit. > >>>>>>>> There are many more contributions, which is an absolutely great > >>>> thing > >>>>>> to > >>>>>>>> have :-) > >>>>>>>> > >>>>>>>> However, we see a continuously growing backlog of pull requests > >>> and > >>>>>> JIRA > >>>>>>>> issues. > >>>>>>>> To make sure the community will be able to handle the increased > >>>>>> volume, I > >>>>>>>> think we need to revisit some > >>>>>>>> approaches and processes. I believe there are a few > >>> opportunities > >>>> to > >>>>>>>> structure things a bit better, which > >>>>>>>> should help to scale the development. > >>>>>>>> > >>>>>>>> The first thing I would like to bring up are *Pull Request > >>>> Reviews*. > >>>>>> Even > >>>>>>>> though more community members being > >>>>>>>> active in reviews (which is a really great thing!) the Pull > >>> Request > >>>>>>> backlog > >>>>>>>> is increasing quite a bit. > >>>>>>>> > >>>>>>>> Why are pull requests still not merged faster? Looking at the > >>>>> reviews, > >>>>>>> one > >>>>>>>> thing I noticed is that most reviews deal > >>>>>>>> immediately with detailed code issues, and leave out most of the > >>>> core > >>>>>>>> questions that need to be answered > >>>>>>>> before a Pull Request can be merged, like "is this a desired > >>>>> feature?" > >>>>>> or > >>>>>>>> "does this align well with other developments?". > >>>>>>>> I think that we even make things slightly worse that way: From > >>> my > >>>>>>> personal > >>>>>>>> experience, I have often thought "oh, this > >>>>>>>> PR has a review already" and rather looked at another PR, only > >>> to > >>>>> find > >>>>>>>> later that the first review did never decide whether > >>>>>>>> this PR is actually a good fit for Flink. > >>>>>>>> > >>>>>>>> There has never been a proper documentation of how to answer > >>> these > >>>>>>>> questions, what to evaluate in reviews, > >>>>>>>> guidelines for how to evaluate pull requests, other than code > >>>>> quality. > >>>>>> I > >>>>>>>> suspect that this is why so many reviewers > >>>>>>>> do not address the "is this a good contribution" questions, > >>> making > >>>>> pull > >>>>>>>> requests linger until another committers joins > >>>>>>>> the review. > >>>>>>>> > >>>>>>>> Below is an idea for a guide *"How to Review Contributions"*. It > >>>>>> outlines > >>>>>>>> five core aspects to be checked in every > >>>>>>>> pull request, and suggests a priority for clarifying those. The > >>>> idea > >>>>> is > >>>>>>>> that this helps us to better structure reviews, and > >>>>>>>> to make each reviewer aware what we look for in a review and > >>> where > >>>> to > >>>>>>> best > >>>>>>>> bring in their help. > >>>>>>>> > >>>>>>>> Looking forward to comments! > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> Stephan > >>>>>>>> > >>>>>>>> ==================================== > >>>>>>>> > >>>>>>>> The draft is in this Google Doc. Please add small textual > >>> comments > >>>> to > >>>>>> the > >>>>>>>> doc, and bigger principle discussions as replies to this mail. > >>>>>>>> > >>>>>>>> > >>>>>>>> https://docs.google.com/document/d/1yaX2b9LNh-6LxrAmE23U3D2c > >>>>>>> RbocGlGKCYnvJd9lVhk/edit?usp=sharing > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> *How to Review Contributions------------------------------This > >>>> guide > >>>>> is > >>>>>>> for > >>>>>>>> all committers and contributors that want to help with reviewing > >>>>>>>> contributions. Thank you for your effort - good reviews are one > >>> the > >>>>>> most > >>>>>>>> important and crucial parts of an open source project. This > >>> guide > >>>>>> should > >>>>>>>> help the community to make reviews such that: - Contributors > >>> have a > >>>>>> good > >>>>>>>> contribution experience- Reviews are structured and check all > >>>>> important > >>>>>>>> aspects of a contribution- Make sure we keep a high code > >>> quality in > >>>>>>> Flink- > >>>>>>>> We avoid situations where contributors and reviewers spend a > >>> lot of > >>>>>> time > >>>>>>> to > >>>>>>>> refine a contribution that gets rejected laterReview > >>> ChecklistEvery > >>>>>>> review > >>>>>>>> needs to check the following five aspects. We encourage to check > >>>>> these > >>>>>>>> aspects in order, to avoid spending time on detailed code > >>> quality > >>>>>> reviews > >>>>>>>> when there is not yet consensus that a feature or change should > >>> be > >>>>>>> actually > >>>>>>>> be added.(1) Is there consensus whether the change of feature > >>>> should > >>>>> go > >>>>>>>> into to Flink?For bug fixes, this needs to be checked only in > >>> case > >>>> it > >>>>>>>> requires bigger changes or might break existing programs and > >>>>>>>> setups.Ideally, this question is already answered from a JIRA > >>> issue > >>>>> or > >>>>>>> the > >>>>>>>> dev-list discussion, except in cases of bug fixes and small > >>>>> lightweight > >>>>>>>> additions/extensions. In that case, this question can be > >>>> immediately > >>>>>>> marked > >>>>>>>> as resolved. For pull requests that are created without prior > >>>>>> consensus, > >>>>>>>> this question needs to be answered as part of the review.The > >>>> decision > >>>>>>>> whether the change should go into Flink needs to take the > >>> following > >>>>>>> aspects > >>>>>>>> into consideration: - Does the contribution alter the behavior > >>> of > >>>>>>> features > >>>>>>>> or components in a way that it may break previous users’ > >>> programs > >>>> and > >>>>>>>> setups? If yes, there needs to be a discussion and agreement > >>> that > >>>>> this > >>>>>>>> change is desirable. - Does the contribution conceptually fit > >>> well > >>>>> into > >>>>>>>> Flink? Is it too much of special case such that it makes things > >>>> more > >>>>>>>> complicated for the common case, or bloats the abstractions / > >>>> APIs? - > >>>>>>> Does > >>>>>>>> the feature fit well into Flink’s architecture? Will it scale > >>> and > >>>>> keep > >>>>>>>> Flink flexible for the future, or will the feature restrict > >>> Flink > >>>> in > >>>>>> the > >>>>>>>> future? - Is the feature a significant new addition (rather > >>> than an > >>>>>>>> improvement to an existing part)? If yes, will the Flink > >>> community > >>>>>> commit > >>>>>>>> to maintaining this feature? - Does the feature produce added > >>> value > >>>>> for > >>>>>>>> Flink users or developers? Or does it introduce risk of > >>> regression > >>>>>>> without > >>>>>>>> adding relevant user or developer benefit?All of these questions > >>>>> should > >>>>>>> be > >>>>>>>> answerable from the description/discussion in JIRA and Pull > >>>> Request, > >>>>>>>> without looking at the code.(2) Does the contribution need > >>>> attention > >>>>>> from > >>>>>>>> some specific committers and is there time commitment from these > >>>>>>>> committers?Some changes require attention and approval from > >>>> specific > >>>>>>>> committers. For example, changes in parts that are either very > >>>>>>> performance > >>>>>>>> sensitive, or have a critical impact on distributed coordination > >>>> and > >>>>>>> fault > >>>>>>>> tolerance need input by a committer that is deeply familiar with > >>>> the > >>>>>>>> component.As a rule of thumb, this is the case when the Pull > >>>> Request > >>>>>>>> description answers one of the questions in the template section > >>>>> “Does > >>>>>>> this > >>>>>>>> pull request potentially affect one of the following parts” with > >>>>>>> ‘yes’.This > >>>>>>>> question can be answered with - Does not need specific > >>> attention- > >>>>> Needs > >>>>>>>> specific attention for X (X can be for example checkpointing, > >>>>>> jobmanager, > >>>>>>>> etc.).- Has specific attention for X by @commiterA, > >>> @contributorBIf > >>>>> the > >>>>>>>> pull request needs specific attention, one of the tagged > >>>>>>>> committers/contributors should give the final approval.(3) Is > >>> the > >>>>>>>> contribution described well?Check whether the contribution is > >>>>>>> sufficiently > >>>>>>>> well described to support a good review. Trivial changes and > >>> fixes > >>>> do > >>>>>> not > >>>>>>>> need a long description. Any pull request that changes > >>>> functionality > >>>>> or > >>>>>>>> behavior needs to describe the big picture of these changes, so > >>>> that > >>>>>>>> reviews know what to look for (and don’t have to dig through the > >>>> code > >>>>>> to > >>>>>>>> hopefully understand what the change does).Changes that require > >>>>> longer > >>>>>>>> descriptions are ideally based on a prior design discussion in > >>> the > >>>>>>> mailing > >>>>>>>> list or in JIRA and can simply link to there or copy the > >>>> description > >>>>>> from > >>>>>>>> there.(4) Does the implementation follow the right overall > >>>>>>>> approach/architecture?Is this the best approach to implement the > >>>> fix > >>>>> or > >>>>>>>> feature, or are there other approaches that would be easier, > >>> more > >>>>>> robust, > >>>>>>>> or more maintainable?This question should be answerable from the > >>>> Pull > >>>>>>>> Request description (or the linked JIRA) as much as possible.We > >>>>>> recommend > >>>>>>>> to check this before diving into the details of commenting on > >>>>>> individual > >>>>>>>> parts of the change.(5) Is the overall code quality good, > >>> meeting > >>>>>>> standard > >>>>>>>> we want to maintain in Flink?This is the detailed code review of > >>>> the > >>>>>>> actual > >>>>>>>> changes, covering: - Are the changes doing what is described in > >>> the > >>>>>>> design > >>>>>>>> document or PR description?- Does the code follow the right > >>>> software > >>>>>>>> engineering practices? It the code correct, robust, > >>> maintainable, > >>>>>>>> testable?- Are the change performance aware, when changing a > >>>>>> performance > >>>>>>>> sensitive part?- Are the changes sufficiently covered by tests?- > >>>> Are > >>>>>> the > >>>>>>>> tests executing fast?- Does the code format follow Flink’s > >>>> checkstyle > >>>>>>>> pattern?- Does the code avoid to introduce additional compiler > >>>>>>>> warnings?Some code style guidelines can be found in the [Flink > >>> Code > >>>>>> Style > >>>>>>>> Page](https://flink.apache.org/contribute-code.html#code-style > >>>>>>>> <https://flink.apache.org/contribute-code.html#code-style>)Pull > >>>>>> Request > >>>>>>>> Review TemplateAdd the following checklist to the pull request > >>>>> review, > >>>>>>>> checking the boxes as the questions are answered: - [ ] > >>> Consensus > >>>>> that > >>>>>>> the > >>>>>>>> contribution should go into to Flink - [ ] Does not need > >>> specific > >>>>>>>> attention | Needs specific attention for X | Has attention for X > >>>> by Y > >>>>>> - > >>>>>>> [ > >>>>>>>> ] Contribution description - [ ] Architectural approach - [ ] > >>>>> Overall > >>>>>>>> code quality* > >>>>>>>> > >>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > >