On Fri, Sep 28, 2018 at 10:29 AM Thomas Weise <t...@apache.org> wrote:
> +1 for stating the goal of clear provenance and granular rollback. > > > Also of course efficiency and quality of review (we don't to review tiny or out-of-context changes or huge mega-changes), efficiency of authoring (don't want to wait on a review for a tiny bit because GitHub makes it very hard to stack up reviews in sequence / don't want to have major changes blocked because of difficulty of review), ease of new contribution (OK for committers to do more IMO, while new/one-time contributors shouldn't need to know or obey any policy). I think this discussion helps to remind/identify best practices how to get > there. Where appropriate we should augment guidelines for both, contributor > and committer. > Kenn: would you elaborate on the "1 commit = 1 review (and sometimes even = > 1 ticket)" a bit more. Is that a problem of insufficient task / ticket > granularity or something else? > The problem isn't a failure to define tasks at the right granularity, but that they naturally and fundamentally exist with a different granularity (unit of change -> unit of review -> unit of tracking/delivery). I'd guess there's often a 5x jump in size from commit -> review -> ticket. There are many many easy-to-isolate changes that are wasteful to independent review or track. To get some concrete facts, I bet the one things we could probably find research on is the ideal review size. And we could also scrape logs for messages with bullet points (often each bullet is basically what would have been a commit). Generally getting a sense of what these ratios are in practice and idealized would be kind of interesting but maybe overkill. Kenn Charles: my plan is to translate the outcome of this discussion to > guideline updates and your log filter trick will be part of it. Though my > hope is that it won't be needed if we get closer to the goals above. > > Thanks, > Thomas > > > On Fri, Sep 28, 2018 at 9:44 AM Kenneth Knowles <k...@apache.org> wrote: > >> Anton makes a good point. We have been talking about policy for what we >> do, but the real issue is what we want to come out of it: a clear history >> for seeing where code came from and granular rollback. I think in both >> cases the key thing is that each commit is a single clear change. How they >> get there is not the point. >> >> I have worked on multiple projects with a 1 commit = 1 review (and >> sometimes even = 1 ticket). These pretty much never have a good history. >> The best case is that each commit has a message that is a bullet point of >> many separate changes, because it is simply too inefficient to review each >> logical change separately. But since the messages become less useful, it >> encourages a culture of not even bothering to write meaningful messages at >> all. >> >> Note that is trivial for a committer-reviewer to edit the history any way >> they like without the button. "Allow edits by maintainers" is on by >> default. The "Squash and merge" button just adds a button for something we >> can already do. >> >> Charles: super useful! Worth noting that for a PR with a good history it >> will skip meaningful commits (but still give the summary line, which is >> nice). >> >> Kenn >> >> >> >> On Fri, Sep 28, 2018 at 8:54 AM Anton Kedin <ke...@google.com> wrote: >> >>> Is there an actual problem caused by squashing or not squashing the >>> commits that we face in the project? I personally have never needed to >>> revert something complicated that would be problematic either way (and >>> don't have a strong opinion about which way we should do it). From what I >>> see so far in the thread it doesn't look like reverting is a frequent major >>> pain for anyone. Maybe it is exactly because we're mostly following some >>> best practice and it makes it easy. If someone has concrete examples from >>> their experience in the project, please share them, this way it would be >>> easier to justify the choice. >>> >>> The PR and commit cleanliness, size and isolation are probably more >>> important thing to have guidance and maybe enforcement for. There are well >>> known practices and guidelines that I think we should follow, and I think >>> they will make squashing or not squashing mostly irrelevant. For example, >>> if we accept that commits should have description that actually describes >>> what commit does, then "!fixup", "address comments" and similar should not >>> be part of the history and should be squashed before submitting the PR no >>> matter which way we decide to go in general. Also, I think that making >>> commits isolated is also a good practice, and PR author should be able to >>> relatively easily split the PR upon reviewer's request. And if we choose to >>> keep whole PRs small and incremental with descriptive isolated commits, >>> then there won't be too much difference how many commits there are. >>> >>> Regards, >>> Anton >>> >>> On Fri, Sep 28, 2018 at 8:21 AM Andrew Pilloud <apill...@google.com> >>> wrote: >>> >>>> I brought up this discussion a few months ago from the other side: I >>>> don't like my commits being squashed. I try to create logical commits that >>>> each passes tests and could be broken up into multiple PRs. Keeping those >>>> changes intact is useful from a history perspective and squashing may break >>>> other PRs I have in flight. If the intent is clear (one commit with a >>>> descriptive message and a bunch of "fixups"), then feel free to squash, >>>> otherwise ask first. When you do squash, it would be good to leave a >>>> comment as to how the author can avoid having their commits squashed in the >>>> future. >>>> >>>> >>>> https://lists.apache.org/thread.html/8d29e474e681ab9123280164d95075bb8b0b91486b66d3fa25ed20c2@%3Cdev.beam.apache.org%3E >>>> >>>> Andrew >>>> >>>> On Fri, Sep 28, 2018 at 7:29 AM Chamikara Jayalath < >>>> chamik...@google.com> wrote: >>>> >>>>> >>>>> >>>>> On Thu, Sep 27, 2018 at 9:51 AM Robert Bradshaw <rober...@google.com> >>>>> wrote: >>>>> >>>>>> I agree that we should create a good pointer for cleaning up PRs, and >>>>>> request (though not require) that authors do it. It's unfortunate though >>>>>> that squashing during a review makes things difficult to follow, so adds >>>>>> one more round trip. >>>>>> >>>>>> We could consider for those PRs that make sense as a single logical >>>>>> commit (most, but not all, of them) simply using the "squash and merge" >>>>>> button even though it technically doesn't create a merge commit. >>>>>> >>>>> >>>>> +1 for allowing "squash and merge" as an option. Most of the reviews >>>>> (at least for me) consist of a single valid commit and several additional >>>>> commits that get piled up during the review process which obviously should >>>>> not be included in the commit history. Going through another round here >>>>> just to ask the author to fixup everything is unnecessarily time >>>>> consuming. >>>>> >>>>> - Cham >>>>> >>>>> >>>>>> >>>>>> >>>>>> On Fri, Sep 21, 2018 at 9:24 PM Daniel Oliveira < >>>>>> danolive...@google.com> wrote: >>>>>> >>>>>>> As a non-committer I think some automated squashing of commits >>>>>>> sounds best since it lightens the load of regular contributors, by not >>>>>>> having to always remember to squash, and lightens the load of >>>>>>> committers so >>>>>>> it doesn't take as long to have your PR approved by one. >>>>>>> >>>>>>> But for now I think the second best route would be making it PR >>>>>>> author's responsibility to squash fixup commits. Having that expectation >>>>>>> described clearly in the Contributor's Guide, along with some simple >>>>>>> step-by-step instructions for how to do so should be enough. I mainly >>>>>>> support this because I've been doing the squashing myself since I saw a >>>>>>> thread about it here a few months ago. It's not nearly as huge a burden >>>>>>> on >>>>>>> me as it probably is for committers who have to merge in many more PRs, >>>>>>> it's very easy to learn how to do, and it's one less barrier to having >>>>>>> my >>>>>>> code merged in. >>>>>>> >>>>>>> Of course I wouldn't expect that committers wait for PR authors to >>>>>>> squash their fixup commits, but I think leaving a message like "For >>>>>>> future >>>>>>> pull requests you should squash any small fixup commits, as described >>>>>>> here: >>>>>>> <link>" should be fine. >>>>>>> >>>>>>> >>>>>>>> I was also thinking about the possibility of wanting to revert >>>>>>>> individual commits from a merge commit. The solution you propose >>>>>>>> works, >>>>>>>> but only if you want to revert everything. >>>>>>> >>>>>>> >>>>>>> Does this happen often? I might not have enough context since I'm >>>>>>> not a committer, but it seems to me that often the person performing a >>>>>>> revert is not the original author of a change and doesn't have the >>>>>>> context >>>>>>> or time to pick out an individual commit to revert. >>>>>>> >>>>>>> On Wed, Sep 19, 2018 at 1:32 PM Maximilian Michels <m...@apache.org> >>>>>>> wrote: >>>>>>> >>>>>>>> I tend to agree with you Lukasz. Of course we should try to follow >>>>>>>> the >>>>>>>> guide lines as much as possible but if it requires an extra back >>>>>>>> and >>>>>>>> forth with the PR author for a cosmetic change, it may not be worth >>>>>>>> the >>>>>>>> time. >>>>>>>> >>>>>>>> On 19.09.18 22:17, Lukasz Cwik wrote: >>>>>>>> > I have to say I'm guilty of not following the merge guidelines, >>>>>>>> > sometimes doing merges without rebasing/flatten commits. >>>>>>>> > >>>>>>>> > I find that it is a few extra mins of my time to fix someones PR >>>>>>>> history >>>>>>>> > if they have more then one logical commit they want to be >>>>>>>> separate and >>>>>>>> > it usually takes days for the PR author to do merging with the >>>>>>>> extra >>>>>>>> > burden as a committer to keep track of another PR and its state >>>>>>>> (waiting >>>>>>>> > for clean-up) is taxing. I really liked the idea of the mergebot >>>>>>>> (even >>>>>>>> > though it didn't work out in practice) because it could do all >>>>>>>> the >>>>>>>> > policy work on my behalf. >>>>>>>> > >>>>>>>> > Anything that reduces my overhead as a committer is useful as for >>>>>>>> the >>>>>>>> > 100s of PRs that I have merged, I've only had to rollback a >>>>>>>> couple so >>>>>>>> > I'm for Charle's suggestion which makes the rollback flow >>>>>>>> slightly more >>>>>>>> > complicated for a significantly easier PR merge workflow. >>>>>>>> > >>>>>>>> > On Wed, Sep 19, 2018 at 1:13 PM Charles Chen <c...@google.com >>>>>>>> > <mailto:c...@google.com>> wrote: >>>>>>>> > >>>>>>>> > What I mean is that if you get the first-parent commit using >>>>>>>> "git >>>>>>>> > log --first-parent", it will incorporate any and all fix up >>>>>>>> commits >>>>>>>> > so we don't need to worry about missing any. >>>>>>>> > >>>>>>>> > On Wed, Sep 19, 2018, 1:07 PM Maximilian Michels < >>>>>>>> m...@apache.org >>>>>>>> > <mailto:m...@apache.org>> wrote: >>>>>>>> > >>>>>>>> > Generally, +1 for isolated commits which are easy to >>>>>>>> revert. >>>>>>>> > >>>>>>>> > > I don't think it's actually harder to roll back a set >>>>>>>> of >>>>>>>> > commits that are merged together. >>>>>>>> > I think Thomas was mainly concerned about "fixup" commits >>>>>>>> to >>>>>>>> > land in >>>>>>>> > master (as part of a merge). These indeed make reverting >>>>>>>> commits >>>>>>>> > more >>>>>>>> > difficult because you have to check whether you missed a >>>>>>>> "fixup". >>>>>>>> > >>>>>>>> > > Ideally every commit should compile and pass tests >>>>>>>> though, right? >>>>>>>> > >>>>>>>> > That is definitely what we should strive for when doing a >>>>>>>> merge >>>>>>>> > against >>>>>>>> > master. >>>>>>>> > >>>>>>>> > > Perhaps the bigger issue is that we need better >>>>>>>> documentation >>>>>>>> > and a playbook on how to do this these common tasks in >>>>>>>> git. >>>>>>>> > >>>>>>>> > We do actually have basic documentation about this but >>>>>>>> most >>>>>>>> > people don't >>>>>>>> > read it. For example, the commit message of a Merge commit >>>>>>>> > should be: >>>>>>>> > >>>>>>>> > Merge pull request #xxxx: [BEAM-yyyy] Issue title >>>>>>>> > >>>>>>>> > But most merge commits don't comply with this rule :) See >>>>>>>> > >>>>>>>> https://beam.apache.org/contribute/committer-guide/#merging-it >>>>>>>> > >>>>>>>> > On 19.09.18 21:34, Reuven Lax wrote: >>>>>>>> > > Ideally every commit should compile and pass tests >>>>>>>> though, right? >>>>>>>> > > >>>>>>>> > > On Wed, Sep 19, 2018 at 12:15 PM Ankur Goenka >>>>>>>> > <goe...@google.com <mailto:goe...@google.com> >>>>>>>> > > <mailto:goe...@google.com <mailto:goe...@google.com>>> >>>>>>>> wrote: >>>>>>>> > > >>>>>>>> > > I agree with the cleanliness of the Commit history. >>>>>>>> > > "Fixup!", "Address comments", "Address even more >>>>>>>> > comments" type of >>>>>>>> > > comments does not convey meaningful information >>>>>>>> and are >>>>>>>> > not very >>>>>>>> > > useful. Its a good idea to squash them. >>>>>>>> > > However, I think its ok to keep separate commits >>>>>>>> for >>>>>>>> > different >>>>>>>> > > logical pieces of the code which make reviewing and >>>>>>>> > revisiting code >>>>>>>> > > easier. >>>>>>>> > > Example PR: Support X in the pipeline >>>>>>>> > > Commit 1: Restructuring a bunch of code without any >>>>>>>> > logical change. >>>>>>>> > > Commit 2: Changing validation logic for pipeline. >>>>>>>> > > Commit 3: Supporting new field "X" for pipeline. >>>>>>>> > > >>>>>>>> > > On Wed, Sep 19, 2018 at 11:27 AM Charles Chen >>>>>>>> > <c...@google.com <mailto:c...@google.com> >>>>>>>> > > <mailto:c...@google.com <mailto:c...@google.com>>> >>>>>>>> wrote: >>>>>>>> > > >>>>>>>> > > To be concrete, it is very easy to revert a >>>>>>>> commit in >>>>>>>> > any case: >>>>>>>> > > >>>>>>>> > > 1. First, use "git log --first-parent" to >>>>>>>> find the >>>>>>>> > first-parent >>>>>>>> > > commit corresponding to a PR merge (this >>>>>>>> is a >>>>>>>> > one-to-one >>>>>>>> > > correspondence). >>>>>>>> > > 2. Use "git revert -m 1 <commitid>" to revert >>>>>>>> the >>>>>>>> > commit; this >>>>>>>> > > selects the first parent as the base for a >>>>>>>> merge >>>>>>>> > commit (in >>>>>>>> > > the case where a single commit needs to be >>>>>>>> > reverted, just >>>>>>>> > > use "git revert <commitid>" without the >>>>>>>> "-m 1" flag). >>>>>>>> > > >>>>>>>> > > In any case, as a general good engineering >>>>>>>> practice, >>>>>>>> > I do agree >>>>>>>> > > that it is highly desirable to have small >>>>>>>> independent PRs >>>>>>>> > > instead of large jumbo PRs whenever possible. >>>>>>>> > > >>>>>>>> > > On Wed, Sep 19, 2018 at 11:20 AM Charles Chen >>>>>>>> > <c...@google.com <mailto:c...@google.com> >>>>>>>> > > <mailto:c...@google.com <mailto:c...@google.com>>> >>>>>>>> wrote: >>>>>>>> > > >>>>>>>> > > I don't think it's actually harder to roll >>>>>>>> back a >>>>>>>> > set of >>>>>>>> > > commits that are merged together. Git has >>>>>>>> the >>>>>>>> > notion of >>>>>>>> > > first-parent commits (you can see, for >>>>>>>> example, >>>>>>>> > "git log >>>>>>>> > > --first-parent", which filters out the >>>>>>>> intermediate >>>>>>>> > > commits). In this sense, PRs still get >>>>>>>> merged as >>>>>>>> > one unit >>>>>>>> > > and this is preserved even if intermediate >>>>>>>> > commits are >>>>>>>> > > kept. Perhaps the bigger issue is that we >>>>>>>> need >>>>>>>> > better >>>>>>>> > > documentation and a playbook on how to do >>>>>>>> this >>>>>>>> > these common >>>>>>>> > > tasks in git. >>>>>>>> > > >>>>>>>> > > On Wed, Sep 19, 2018 at 9:27 AM Thomas >>>>>>>> Weise >>>>>>>> > <t...@apache.org <mailto:t...@apache.org> >>>>>>>> > > <mailto:t...@apache.org <mailto: >>>>>>>> t...@apache.org>>> >>>>>>>> > wrote: >>>>>>>> > > >>>>>>>> > > Wanted to bring this up as reminder as >>>>>>>> well as >>>>>>>> > > opportunity to discuss potential >>>>>>>> changes to our >>>>>>>> > > committer guide. It has been a while >>>>>>>> since >>>>>>>> > last related >>>>>>>> > > discussion and we welcomed several new >>>>>>>> > committers since >>>>>>>> > > then. >>>>>>>> > > >>>>>>>> > > Finishing up pull requests pre-merge: >>>>>>>> > > >>>>>>>> > > >>>>>>>> > >>>>>>>> https://beam.apache.org/contribute/committer-guide/#finishing-touches >>>>>>>> > > >>>>>>>> > > PRs are worked on over time and may >>>>>>>> > accumulate many >>>>>>>> > > commits. Sometimes because scope >>>>>>>> expands, >>>>>>>> > sometimes just >>>>>>>> > > to separate independent changes but >>>>>>>> most of >>>>>>>> > the time the >>>>>>>> > > commits are just fixups that are added >>>>>>>> as >>>>>>>> > review progresses. >>>>>>>> > > It is important that the latter get >>>>>>>> squashed >>>>>>>> > prior to PR >>>>>>>> > > merge, as otherwise we lost the >>>>>>>> ability to >>>>>>>> > roll back >>>>>>>> > > changes by reverting a single commit >>>>>>>> and also >>>>>>>> > generally >>>>>>>> > > cause a lot of noise in the commit >>>>>>>> history >>>>>>>> > that does not >>>>>>>> > > help other contributors. To be clear, >>>>>>>> I refer >>>>>>>> > to the >>>>>>>> > > "Fixup!", "Address comments", "Address >>>>>>>> even more >>>>>>>> > > comments" type of entries :) >>>>>>>> > > >>>>>>>> > > I would also propose that every commit >>>>>>>> gets >>>>>>>> > tagged with >>>>>>>> > > a JIRA (except those fixups that will >>>>>>>> be >>>>>>>> > squashed). >>>>>>>> > > Having the JIRA and possibly other >>>>>>>> tags makes >>>>>>>> > it easier >>>>>>>> > > for others not involved in the PR to >>>>>>>> identify >>>>>>>> > changes >>>>>>>> > > after they were merged, for example >>>>>>>> when >>>>>>>> > looking at the >>>>>>>> > > revision history or annotated source. >>>>>>>> > > >>>>>>>> > > As for other scenarios of jumbo PRs >>>>>>>> with many >>>>>>>> > commits, >>>>>>>> > > there are probably situations where >>>>>>>> work >>>>>>>> > needs to be >>>>>>>> > > broken down into smaller units, making >>>>>>>> life >>>>>>>> > better for >>>>>>>> > > both, contributor and reviewer(s). >>>>>>>> Ideally, >>>>>>>> > every PR >>>>>>>> > > would have only one commit, but that >>>>>>>> may be a >>>>>>>> > bit much >>>>>>>> > > to mandate? Is the general expectation >>>>>>>> > something we need >>>>>>>> > > to document more clearly? >>>>>>>> > > >>>>>>>> > > Thanks, >>>>>>>> > > Thomas >>>>>>>> > > >>>>>>>> > >>>>>>>> >>>>>>>