Sounds like we have a lot of support for also allowing merge workflows. Let me draft a proper proposal and go through the [DISCUSS] and [VOTE] process. One thing I think we should amend from the previous [VOTE] is using "git merge --no-ff" rather than "git rebase --onto" for branch -> trunk integration, since it makes reverting the branch easier. We should also use "git merge" rather than a squashed commit for the branch-2 backport, as Vinay said.
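To make the revert argument concrete, here is a rough sketch in a throwaway repository. The branch name HDFS-0000, the file names, and the commit messages are all hypothetical; only the git commands themselves are the point.

```shell
#!/bin/sh
set -e

# Throwaway repository; every name below is illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk
git config user.email "dev@example.com"
git config user.name "Example Dev"

echo "base" > core.txt
git add core.txt
git commit -q -m "Initial trunk commit"

# Hypothetical feature branch with two commits.
git checkout -q -b HDFS-0000
echo "feature part 1" > feature.txt
git add feature.txt
git commit -q -m "HDFS-0000. Part 1"
echo "feature part 2" >> feature.txt
git commit -q -am "HDFS-0000. Part 2"

# Integrate into trunk with an explicit merge commit, even though a
# fast-forward would be possible here.
git checkout -q trunk
git merge -q --no-ff -m "Merge HDFS-0000 into trunk" HDFS-0000

# Backing out the whole branch is one revert of the merge commit.
# -m 1 names the first parent (trunk) as the mainline to keep.
git revert --no-edit -m 1 HEAD > /dev/null

# Show the resulting history: branch commits, merge commit, revert.
git log --oneline --graph
```

With "git rebase --onto" or cherry-pick, the branch lands as a series of individual commits on trunk, so backing it out means reverting each of them. One caveat with the merge approach: after reverting a merge commit, merging the same branch again later requires reverting the revert first.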
In the meantime, I think it's okay for ongoing feature branch development like HDFS-7285 to start using merge rather than rebase. I haven't seen any objections to merge yet.

On Tue, Aug 18, 2015 at 1:39 AM, Vinayakumar B <vinayakum...@apache.org> wrote:

> +1, I agree with the support for git-merge based workflows for large branch
> merge.
>
> I have experienced the pain of rebasing the entire HDFS-7285 branch, just
> for verification though, and I found that even a one-line change in trunk in
> core files (e.g. FSNameSystem.java, BlockManager.java) makes it hard to
> rebase many commits in the branch.
> One main problem with git-rebase, as I have experienced, is this: if we
> need to retain the same commits, all conflicts must be resolved by the same
> person who is doing the rebase, as 'git-rebase' has to be executed on the
> same machine, and there is a fair chance of mishandling conflicts and
> causing problems. The person doing the rebase may not be very familiar with
> the conflicted code.
> In these kinds of situations, I think it's very hard to find out what was
> the original code and what is conflicted code, once the rebase is done.
>
> IMO, it's fair to go with periodic merges from trunk->branch. Even though
> there will be a few conflicts, these may not be as problematic as
> rebase conflicts.
>
> Regarding merging to branch-2, though it needs a little more conflict
> resolution compared to trunk, it may not be too much, as trunk and
> branch-2 are going in parallel, at least in terms of features and fixes
> (~ >90%, I would say).
>
> Regards,
> Vinay
>
> On Tue, Aug 18, 2015 at 6:12 AM, Sangjin Lee <sj...@apache.org> wrote:
>
> > I also think allowing merges as a way to uprev with trunk would be a good
> > idea. AFAIK, git rebase works well when your branch is short-lived and
> > contains a fairly small number of commits, but doesn't work so well if
> > your branch is large. Also, the cost of rebase will only go up as time
> > goes on. On the other hand, git merge has a pretty decent chance to
> > succeed, especially if you merge the trunk often. My 2 cents.
> >
> > Sangjin
> >
> > On Mon, Aug 17, 2015 at 1:18 PM, Jing Zhao <jing.apa...@gmail.com> wrote:
> >
> > > I think we should allow merge-based workflows. I worked and am working
> > > in several big feature branches, including HDFS-2802 (>100 subtasks) and
> > > HDFS-7285 (currently already > 200 subtasks), and tried both the
> > > merge-based and rebase-based workflows. When the feature change becomes
> > > big, the rebase will become a big pain, considering a small change in
> > > trunk can cause conflicts for rebasing a large number of commits in the
> > > feature branch. Using "git merge" to merge trunk changes into the
> > > feature branch is much easier in this case.
> > >
> > > Thanks,
> > > -Jing
> > >
> > > On Mon, Aug 17, 2015 at 12:17 PM, Andrew Wang <andrew.w...@cloudera.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I've thought about this topic more over the last week, and felt I
> > > > should play devil's advocate for a merge workflow. A few comments:
> > > >
> > > > - The issue of merges "polluting history" is mainly an issue when
> > > >   using a github PR workflow, which results in one merge per PR.
> > > >   Clearly this is not okay, but a separate issue from feature
> > > >   branches. We only have a handful of merge commits per feature
> > > >   branch.
> > > > - The issue of changes hiding in merge commits can happen when
> > > >   resolving rebase conflicts too, except it's harder to track. Right
> > > >   now neither goes through code review, which is sketchy. We probably
> > > >   should review these too, and it's easier to review a single merge
> > > >   commit vs. an entire rebased branch. Merge is also a more natural
> > > >   way of integrating changes from trunk, since you just resolve all
> > > >   conflicts at once at the end.
> > > > - Merge gives us a linear history on the branch but worse history on
> > > >   trunk/branch-2. Rebase has worse history on the branch but a linear
> > > >   history on trunk/branch-2. This means for quick/small feature
> > > >   branches that don't have a lot of conflicts, rebase is preferred.
> > > >   For large features with lots of conflicts, merge is preferred. This
> > > >   is basically what we're running into on HDFS-7285.
> > > > - Rebase also comes with increased coordination costs, since public
> > > >   history is being rewritten. This is again okay for smaller efforts
> > > >   (where there are fewer contributors), but more painful with bigger
> > > >   ones. There have been a number of HDFS-7285 branches created
> > > >   basically as a result of rebase, with corresponding JIRA discussions
> > > >   about where to commit things.
> > > > - The issue of a single squashed commit for the branch-2 backport is
> > > >   arguably an issue with how we structure our branches. If release
> > > >   branches forked off of trunk rather than branch-2, we wouldn't have
> > > >   this problem. We could require branch-2 integration to also happen
> > > >   via git merge. Or we kick trunk out to a feature branch based off of
> > > >   branch-2. Or we shrug and keep the status quo.
> > > >
> > > > I'd definitely appreciate commentary from others who've worked on
> > > > feature branches in git, even in communities outside of Hadoop.
> > > >
> > > > If there is support for allowing merge-based workflows in addition to
> > > > rebase, we'd need to kick off a [VOTE] thread since the last [VOTE]
> > > > only allows rebase.
> > > >
> > > > Best,
> > > > Andrew
> > > >
> > > > On Mon, Aug 17, 2015 at 11:33 AM, Andrew Wang
> > > > <andrew.w...@cloudera.com> wrote:
> > > >
> > > > > @Sangjin,
> > > > >
> > > > > I believe this is covered by the [VOTE] I linked to above, key
> > > > > excerpt being:
> > > > >
> > > > > 3. Force-push on feature-branches is allowed. Before pulling in a
> > > > > feature, the feature-branch should be rebased on latest trunk and
> > > > > the changes applied to trunk through "git rebase --onto" or "git
> > > > > cherry-pick <commit-range>".
> > > > >
> > > > > This specifies that the last uprev, i.e. the final integration of
> > > > > the branch into trunk, happens with rebase. It doesn't say anything
> > > > > about the periodic uprevs, but it'd be very strange to merge
> > > > > periodically and then rebase once at the end. So I take it to mean
> > > > > doing periodic uprevs with rebase too.
> > > > >
> > > > > On Mon, Aug 17, 2015 at 11:23 AM, Sangjin Lee <sj...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Just to be clear, are we discussing the process of uprev'ing the
> > > > > > feature development branch with the latest from the trunk from
> > > > > > time to time, or making the final merge of the feature branch
> > > > > > onto the trunk?
> > > > > >
> > > > > > On Mon, Aug 17, 2015 at 10:21 AM, Steve Loughran
> > > > > > <ste...@hortonworks.com> wrote:
> > > > > >
> > > > > > > I haven't done a big piece of work in the ASF code repo since
> > > > > > > the migration to git, though I have done it in the svn era.
> > > > > > >
> > > > > > > Currently with private git repos
> > > > > > > - anyone gets SCM control of their source
> > > > > > > - you can commit for your own reasons (about to make a change,
> > > > > > >   want a private jenkins run, ...) and gain from having many
> > > > > > >   small checkins. More succinctly: if you aren't checking in
> > > > > > >   your work 2+ times a day, why not?
> > > > > > > - rebasing is a painful necessity on personal, private branches
> > > > > > >   to keep the final patch to hadoop git a single diff
> > > > > > >
> > > > > > > With the private git process that's the de facto standard, we
> > > > > > > lose history anyway. I know what I've done and somewhere
> > > > > > > there's a tag in my own github repo of my work to create a
> > > > > > > JIRA. But we don't always need that entire history of "trying
> > > > > > > to debug kerberos", "typo in exception", and other stuff that
> > > > > > > accrues during the work.
> > > > > > >
> > > > > > > I think therefore that I'm in favour of big squash commits.
> > > > > > > What we could do is extend that with a policy of
> > > > > > >
> > > > > > > 1. Tag the final commit used to make the patch, something like
> > > > > > >    tag_HADOOP-8192. The tag ensures that the history isn't gc'd
> > > > > > > 2. Delete the branch (keeps the # of branches down)
> > > > > > > 3. In the JIRA, include the name of the tag and the git commit
> > > > > > >    number in the comments. Someone curious can rebuild that
> > > > > > >    history