I'm +1 for ctrezzo's proposal, and happy to do the revert from branch-2.7 if this is acceptable to Vinod.
There's some additional discussion on the HDFS-8791 JIRA for those who are only following this email thread.

Best,
Andrew

On Tue, Apr 5, 2016 at 2:03 PM, Chris Trezzo <ctre...@gmail.com> wrote:

> In light of the additional conversation on HDFS-8791, I would like to
> re-propose the following:
>
> 1. Revert the new datanode layout (HDFS-8791) from the 2.7 branch. The
> layout change currently does not support downgrades which breaks our
> upgrade/downgrade policies for dot releases.
>
> 2. Cut a 2.8 release off of the 2.7.3 release with the addition of
> HDFS-8791. This would give customers a stable release that they could
> deploy with the new layout. As discussed on the jira, this is still in line
> with user expectation for minor releases as we have done layout changes in
> a number of 2.x minor releases already. The current 2.8 would become 2.9
> and continue its current release schedule.
>
> What does everyone think? If unsupported downgrades between minor releases
> is still not agreeable, then as stated by Vinod, we would need to either
> add support for downgrades with dn layout changes or revert the layout
> change from branch-2. If we are OK with the layout change in a minor
> release, but think that the issue does not affect enough customers to
> warrant a separate release, we could simply leave it in branch-2 and let it
> be released with the current 2.8.
>
>
> On Mon, Apr 4, 2016 at 1:48 PM, Vinod Kumar Vavilapalli <vino...@apache.org>
> wrote:
>
> > I commented on the JIRA way back (see
> > https://issues.apache.org/jira/browse/HDFS-8791?focusedCommentId=15036666&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15036666),
> > saying what I said below. Unfortunately, I haven’t followed the patch
> > along after my initial comment.
> >
> > This isn’t about any specific release - starting 2.6 we declared support
> > for rolling upgrades and downgrades. Any patch that breaks this should
> > not be in branch-2.
> >
> > Two options from where I stand
> > (1) For folks who worked on the patch: Is there a way to make (a) the
> > upgrade-downgrade seamless for people who don’t care about this (b) and
> > have explicit documentation for people who care to switch this behavior
> > on and are willing to risk not having downgrades. If this means a new
> > configuration property, so be it. It’s a necessary evil.
> > (2) Just let specific users backport this into specific 2.x branches
> > they need and leave it only on trunk.
> >
> > Unless this behavior stops breaking rolling upgrades/downgrades, I think
> > we should just revert it from branch-2 and definitely 2.7.3 as it stands
> > today.
> >
> > +Vinod
> >
> >
> > > On Apr 1, 2016, at 2:54 PM, Chris Trezzo <ctre...@gmail.com> wrote:
> > >
> > > A few thoughts:
> > >
> > > 1. To echo Andrew Wang, HDFS-8578 (parallel upgrades) should be a
> > > prerequisite for HDFS-8791. Without that patch, upgrades can be very
> > > slow for data nodes depending on your setup.
> > >
> > > 2. We have already deployed this patch internally so, with my Twitter
> > > hat on, I would be perfectly happy as long as it makes it into trunk and
> > > 2.8. That being said, I would be hesitant to deploy the current 2.7.x or
> > > 2.6.x releases on a large production cluster that has a diverse set of
> > > block ids without this patch, especially if your data nodes have a large
> > > number of disks or you are using federation. To be clear though: this
> > > highly depends on your setup and at a minimum you should verify that
> > > this regression will not affect you. The current block-id based layout
> > > in 2.6.x and 2.7.2 has a performance regression that gets worse over
> > > time. When you see it happening on a live cluster, it is one of the
> > > harder issues to identify a root cause and debug.
> > > I do understand that this is currently only affecting a smaller number
> > > of users, but I also think this number has potential to increase as time
> > > goes on. Maybe we can issue a warning in the release notes for future
> > > 2.7.x and 2.6.x releases?
> > >
> > > 3. One option (this was suggested on HDFS-8791 and I think Sean alluded
> > > to this proposal on this thread) would be to cut a 2.8 release off of
> > > the 2.7.3 release with the new layout. What people currently think of as
> > > 2.8 would then become 2.9. This would give customers a stable release
> > > that they could deploy with the new layout and would not break upgrade
> > > and downgrade expectations.
> > >
> > > On Fri, Apr 1, 2016 at 11:32 AM, Andrew Purtell <apurt...@apache.org>
> > > wrote:
> > >
> > >> As a downstream consumer of Apache Hadoop 2.7.x releases, I expect we
> > >> would patch the release to revert HDFS-8791 before pushing it out to
> > >> production.
> > >> For what it's worth.
> > >>
> > >>
> > >> On Fri, Apr 1, 2016 at 11:23 AM, Andrew Wang <andrew.w...@cloudera.com>
> > >> wrote:
> > >>
> > >>> One other thing I wanted to bring up regarding HDFS-8791, we haven't
> > >>> backported the parallel DN upgrade improvement (HDFS-8578) to
> > >>> branch-2.6. HDFS-8578 is a very important related fix since otherwise
> > >>> upgrade will be very slow.
> > >>>
> > >>> On Thu, Mar 31, 2016 at 10:35 AM, Andrew Wang <andrew.w...@cloudera.com>
> > >>> wrote:
> > >>>
> > >>>> As I expressed on HDFS-8791, I do not want to include this JIRA in a
> > >>>> maintenance release. I've only seen it crop up on a handful of our
> > >>>> customer's clusters, and large users like Twitter and Yahoo that seem
> > >>>> to be more affected are also the most able to patch this change in
> > >>>> themselves.
> > >>>>
> > >>>> Layout upgrades are quite disruptive, and I don't think it's worth
> > >>>> breaking upgrade and downgrade expectations when it doesn't affect
> > >>>> the (in my experience) vast majority of users.
> > >>>>
> > >>>> Vinod seemed to have a similar opinion in his comment on HDFS-8791,
> > >>>> but will let him elaborate.
> > >>>>
> > >>>> Best,
> > >>>> Andrew
> > >>>>
> > >>>> On Thu, Mar 31, 2016 at 9:11 AM, Sean Busbey <bus...@cloudera.com>
> > >>>> wrote:
> > >>>>
> > >>>>> As of 2 days ago, there were already 135 jiras associated with
> > >>>>> 2.7.3. If *any* of them end up introducing a regression, the
> > >>>>> inclusion of HDFS-8791 means that folks will have cluster downtime in
> > >>>>> order to back things out. If that happens to any substantial number
> > >>>>> of downstream folks, or any particularly vocal downstream folks, then
> > >>>>> it is very likely we'll lose the remaining trust of operators for
> > >>>>> rolling out maintenance releases. That's a pretty steep cost.
> > >>>>>
> > >>>>> Please do not include HDFS-8791 in any 2.6.z release. Folks having
> > >>>>> to be aware that an upgrade from e.g. 2.6.5 to 2.7.2 will fail is an
> > >>>>> unreasonable burden.
> > >>>>>
> > >>>>> I agree that this fix is important, I just think we should either
> > >>>>> cut a version of 2.8 that includes it or find a way to do it that
> > >>>>> gives an operational path for rolling downgrade.
> > >>>>>
> > >>>>> On Thu, Mar 31, 2016 at 10:10 AM, Junping Du <j...@hortonworks.com>
> > >>>>> wrote:
> > >>>>>> Thanks for bringing up this topic, Sean.
> > >>>>>> When I released our latest Hadoop release 2.6.4, the patch for
> > >>>>>> HDFS-8791 hadn't been committed yet, so that's why we didn't discuss
> > >>>>>> this earlier.
> > >>>>>> As I recall from the JIRA discussion, we treated this layout change
> > >>>>>> as a Blocker bug fixing a significant performance regression, not as
> > >>>>>> a normal performance improvement. And I believe the HDFS community
> > >>>>>> has already done its best, with care and patience, to deliver the fix
> > >>>>>> and other related patches (like the upgrade fix in HDFS-8578). Taking
> > >>>>>> HDFS-8578 as an example, you can see 30+ rounds of patch review back
> > >>>>>> and forth by senior committers, not to mention the outstanding
> > >>>>>> performance test data in HDFS-8791.
> > >>>>>> I would trust our HDFS committers' judgement to land HDFS-8791 on
> > >>>>>> 2.7.3. However, that needs final confirmation from Vinod, who serves
> > >>>>>> as RM for branch-2.7. In addition, I didn't see any blocker issue
> > >>>>>> with bringing it into 2.6.5 now.
> > >>>>>> Just my 2 cents.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>>
> > >>>>>> Junping
> > >>>>>>
> > >>>>>> ________________________________________
> > >>>>>> From: Sean Busbey <bus...@cloudera.com>
> > >>>>>> Sent: Thursday, March 31, 2016 2:57 PM
> > >>>>>> To: hdfs-dev@hadoop.apache.org
> > >>>>>> Cc: Hadoop Common; yarn-...@hadoop.apache.org;
> > >>>>>> mapreduce-...@hadoop.apache.org
> > >>>>>> Subject: Re: 2.7.3 release plan
> > >>>>>>
> > >>>>>> A layout change in a maintenance release sounds very risky. I saw
> > >>>>>> some discussion on the JIRA about those risks, but the consensus
> > >>>>>> seemed to be "we'll leave it up to the 2.6 and 2.7 release managers."
> > >>>>>> I thought we did RMs per release rather than per branch? No one
> > >>>>>> claiming to be a release manager ever spoke up AFAICT.
> > >>>>>>
> > >>>>>> Should this change be included? Should it go into a special 2.8
> > >>>>>> release as mentioned in the ticket?
> > >>>>>>
> > >>>>>> On Thu, Mar 31, 2016 at 1:45 AM, Akira AJISAKA
> > >>>>>> <ajisa...@oss.nttdata.co.jp> wrote:
> > >>>>>>> Thank you Vinod!
> > >>>>>>>
> > >>>>>>> FYI: 2.7.3 will be a bit special release.
> > >>>>>>>
> > >>>>>>> HDFS-8791 bumped up the datanode layout version,
> > >>>>>>> so rolling downgrade from 2.7.3 to 2.7.[0-2]
> > >>>>>>> is impossible. We can rollback instead.
> > >>>>>>>
> > >>>>>>> https://issues.apache.org/jira/browse/HDFS-8791
> > >>>>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
> > >>>>>>>
> > >>>>>>> Regards,
> > >>>>>>> Akira
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On 3/31/16 08:18, Vinod Kumar Vavilapalli wrote:
> > >>>>>>>>
> > >>>>>>>> Hi all,
> > >>>>>>>>
> > >>>>>>>> Got nudged about 2.7.3. Was previously waiting for 2.6.4 to go out
> > >>>>>>>> (which did go out mid February). Got a little busy since.
> > >>>>>>>>
> > >>>>>>>> Following up the 2.7.2 maintenance release, we should work towards
> > >>>>>>>> a 2.7.3. The focus obviously is to have blocker issues [1],
> > >>>>>>>> bug-fixes and *no* features / improvements.
> > >>>>>>>>
> > >>>>>>>> I hope to cut an RC in a week - giving enough time for outstanding
> > >>>>>>>> blocker / critical issues. Will start moving out any tickets that
> > >>>>>>>> are not blockers and/or won’t fit the timeline - there are 3
> > >>>>>>>> blockers and 15 critical tickets outstanding as of now.
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> +Vinod
> > >>>>>>>>
> > >>>>>>>> [1] 2.7.3 release blockers:
> > >>>>>>>> https://issues.apache.org/jira/issues/?filter=12335343
> > >>>>>>>>
> > >>>>>>
> > >>>>>> --
> > >>>>>> busbey
> > >>>>>
> > >>>>> --
> > >>>>> busbey
> > >>>>>
> > >>
> > >> --
> > >> Best regards,
> > >>
> > >> - Andy
> > >>
> > >> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > >> (via Tom White)