Thanks for your thoughts, Anu. Regarding your question:

> And then comes the question, once 3.0 becomes official, where do we check-in a change, if that would break something? so this will lead us back to trunk being the unstable – 3.0 being the new “branch-2”.

Andrew mentioned in the original email

> Regarding "trunk-incompat", since we're still in the alpha stage for 3.0.0, there's no need for this branch yet. This aspect of Vinod's proposal was still under a bit of discussion; Chris Douglas thought we should cut a branch-3 for the first 3.0.0 beta, which aligns with my original thinking. This point doesn't necessarily need to be resolved now though, since again we're still doing alphas.

and I agree with that sentiment.

I think even if we have a "trunk-incompat" branch to hold future incompatible changes, the situation will change little from today. Instead of dealing with "trunk" (where incompatible changes may appear) and "branch-3", we would be dealing with "trunk-incompat" and "trunk". Names are largely mnemonics then.

On Fri, Jun 10, 2016 at 12:37 PM, Anu Engineer <aengin...@hortonworks.com> wrote:

> I actively work on two branches (Diskbalancer and ozone) and I agree with most of what Sangjin said. There is an overhead in working with branches, there are both technical costs and administrative issues which discourage developers from using branches.
>
> I think the biggest issue with branch based development is the fact that other developers do not use a branch. If a small feature appears as a series of commits to “datanode.java”, the branch based developer ends up rebasing and paying this price of rebasing many times. If everyone followed a model of branch + Pull request, other branches would not have to deal with continuous rebasing to trunk commits. If we are moving to a branch based development, we should probably move to that model for most development to avoid this tax on people who actually end up working in the branches.
>
> I do have a question in my mind though: What is being proposed is that we move active development to branches if the feature is small or incomplete, however keep the trunk open for check-ins. One of the biggest reasons why we check-in into trunk and not to branch-2 is because it is a change that will break backward compatibility. So do we have an expectation of backward compatibility thru the 3.0-alpha series (I personally vote No, since 3.0 is experimental at this stage), but if we decide to support some sort of backward-compat then willy-nilly committing to trunk and still maintaining the expectation we can release Alphas from 3.0 does not look possible.
>
> And then comes the question, once 3.0 becomes official, where do we check-in a change, if that would break something? so this will lead us back to trunk being the unstable – 3.0 being the new “branch-2”.
>
> One more point: If we are moving to use a branch always – then we are looking at a model similar to using a git + pull request model. If that is so would it make sense to modify the rules to make these branches easier to merge? Say for example, if all commits in a branch have followed review and checking policy – just like trunk and commits have been made only after a sign off from a committer, would it be possible to merge with a 3-day voting period instead of 7, or treat it just like today’s commit to trunk – but with 2 people signing-off?
>
> What I am suggesting is reducing the administrative overheads of using a branch to encourage use of branching. Right now it feels like Apache’s process encourages committing directly to trunk rather than a branch.
>
> Thanks
> Anu
>
>
> On 6/10/16, 10:50 AM, "sjl...@gmail.com on behalf of Sangjin Lee" <sjl...@gmail.com on behalf of sj...@apache.org> wrote:
>
> >Having worked on a major feature in a feature branch, I have some thoughts and observations on feature branch development.
> >
> >IMO feature branch development v. direct commits to trunk in piecemeal is really a choice of *granularity*. Do we want a series of fine-grained state changes on trunk or fewer coarse-grained chunks of commits on trunk?
> >
> >This makes me favor a branch-based development model for any "decent-sized" features (we'll need to define "decent-sized" of course). Once you have coarse-grained changes, it's easier to reason about what made what release and in what state. As importantly, it makes it easier to back out a complete feature fairly easily if that becomes necessary. My totally unscientific suggestion may be if a feature takes more than a dozen commits and longer than a month, we should probably have a bias towards a feature branch.
> >
> >Branch-based development also makes you go faster if your feature is larger. I wouldn't do it the other way for timeline service v.2 for example.
> >
> >That said, feature branches don't come for free. Now the onus is on the feature developer to constantly rebase with the trunk to keep it reasonably integrated with the trunk. More logistics is involved for the feature developer. Another big question is, when a feature branch gets big and it's time to merge, would it get as scrutinized as a series of individual commits? Since the size of merge can be big, you kind of have to rely on those feature committers and those who help them.
> >
> >In terms of integrating/stabilizing, I don't think branch development necessarily makes it harder. It is again granularity. In case of direct commits on trunk, you do a lot more fine-grained integrations. In case of branch development, you do far fewer coarse-grained integrations via rebasing. If more people are doing branch-based development, it makes rebasing easier to manage too.
> >
> >Going back to the related topic of where to release (trunk v.
> >branch-X), I think that is more of a proxy of the real question of "how do we maintain quality and stability of the trunk?". Even if we release from the trunk, if our bar for merging to trunk is low, the quality will not improve automatically. So I think we ought to tackle the quality question first.
> >
> >My 2 cents.
> >
> >
> >On Fri, Jun 10, 2016 at 8:57 AM, Zhe Zhang <z...@apache.org> wrote:
> >
> >> Thanks for the notes Andrew, Junping, Karthik.
> >>
> >> Here are some of my understandings:
> >>
> >> - Trunk is the "latest and greatest" of Hadoop. If a user starts using Hadoop today, without legacy workloads, trunk is what he/she should use.
> >> - Therefore, each commit to trunk should be transactional -- atomic, consistent, isolated (from other uncommitted patches); I'm not so sure about durability, Hadoop might be gone in 50 years :). As a committer, I should be able to look at a patch and determine whether it's a self-contained improvement of trunk, without looking at other uncommitted patches.
> >> - Some comments inline:
> >>
> >> On Fri, Jun 10, 2016 at 6:56 AM Junping Du <j...@hortonworks.com> wrote:
> >>
> >> > Comparing with advantages, I believe the disadvantages of shipping any releases directly from trunk are more obvious and significant:
> >> > - A lot of commits (incompatible, risky, uncompleted feature, etc.) have to wait to commit to trunk or put into a separate branch that could delay feature development progress as additional vote process gets involved even if the feature is simple and harmless.
> >> >
> >> Thanks Junping, those are valid concerns. I think we should clearly separate incompatible from uncompleted / half-done work in this discussion. Whether people should commit incompatible changes to trunk is a much more tricky question (related to trunk-incompat etc.).
> >> But per my comment above, IMHO, *not committing uncompleted work to trunk* should be a much easier principle to agree upon.
> >>
> >>
> >> > - For small feature with only 1 or 2 commits, that need three +1 from PMCs will increase the bar largely for contributors who just start to contribute on Hadoop features but no such sufficient support.
> >> >
> >> Development overhead is another valid concern. I think our rule-of-thumb should be that, small-medium new features should be proposed as a single JIRA/patch (as we recently did for HADOOP-12666). If the complexity goes beyond a single JIRA/patch, use a feature branch.
> >>
> >>
> >> >
> >> > Given these concerns, I am open to other options, like: proposed by Vinod or Chris, but rather than to release anything directly from trunk.
> >> >
> >> > - This point doesn't necessarily need to be resolved now though, since again we're still doing alphas.
> >> > No. I think we have to settle down this first. Without a common agreed and transparent release process and branches in community, any release (alpha, beta) bits is only called a private release but not an official apache hadoop release (even alpha).
> >> >
> >> >
> >> > Thanks,
> >> >
> >> > Junping
> >> > ________________________________________
> >> > From: Karthik Kambatla <ka...@cloudera.com>
> >> > Sent: Friday, June 10, 2016 7:49 AM
> >> > To: Andrew Wang
> >> > Cc: common-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
> >> > Subject: Re: [DISCUSS] Increased use of feature branches
> >> >
> >> > Thanks for restarting this thread Andrew. I really hope we can get this across to a VOTE so it is clear.
> >> >
> >> > I see a few advantages shipping from trunk:
> >> >
> >> > - The lack of need for one additional backport each time.
> >> > - Feature rot in trunk
> >> >
> >> > Instead of creating branch-3, I recommend creating branch-3.x so we can continue doing 3.x releases off branch-3 even after we move trunk to 4.x (I said it :))
> >> >
> >> > On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <andrew.w...@cloudera.com> wrote:
> >> >
> >> > > Hi all,
> >> > >
> >> > > On a separate thread, a question was raised about 3.x branching and use of feature branches going forward.
> >> > >
> >> > > We discussed this previously on the "Looking to a Hadoop 3 release" thread that has spanned the years, with Vinod making this proposal (building on ideas from others who also commented in the email thread):
> >> > >
> >> > > http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
> >> > >
> >> > > Pasting here for ease:
> >> > >
> >> > > On an unrelated note, offline I was pitching to a bunch of contributors another idea to deal with rotting trunk post 3.x: *Make 3.x releases off of trunk directly*.
> >> > >
> >> > > What this gains us is that
> >> > > - Trunk is always nearly stable or nearly ready for releases
> >> > > - We no longer have some code lying around in some branch (today’s trunk) that is not releasable because it gets mixed with other undesirable and incompatible changes.
> >> > > - This needs to be coupled with more discipline on individual features - medium to large features are always worked upon in branches and get merged into trunk (and a nearing release!) when they are ready
> >> > > - All incompatible changes go into some sort of a trunk-incompat branch and stay there till we accumulate enough of those to warrant another major release.
> >> > >
> >> > > Regarding "trunk-incompat", since we're still in the alpha stage for 3.0.0, there's no need for this branch yet. This aspect of Vinod's proposal was still under a bit of discussion; Chris Douglas thought we should cut a branch-3 for the first 3.0.0 beta, which aligns with my original thinking. This point doesn't necessarily need to be resolved now though, since again we're still doing alphas.
> >> > >
> >> > > What we should get consensus on is the goal of keeping trunk stable, and achieving that by doing more development on feature branches and being judicious about merges. My sense from the Hadoop 3 email thread (and the more recent one on the async API) is that people are generally in favor of this.
> >> > >
> >> > > We're just about ready to do the first 3.0.0 alpha, so would greatly appreciate everyone's timely response in this matter.
> >> > >
> >> > > Thanks,
> >> > > Andrew