Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Alan Gates Fri, 22 May 2015 10:58:24 -0700

I don't think anyone is advocating for option 2, as that would be disastrous. Option 3 is closest to what I'm proposing, though again dropping support for Hadoop 1 is only a part of it.


Alan.

Alexander Pivovarov <mailto:apivova...@gmail.com>
May 22, 2015 at 10:03
Looks like we are discussing 3 options:

1. Support hadoop 1, 2 and 3 in master branch.
2. Support hadoop 1 in branch-1, hadoop 2 in branch-2, hadoop 3 in branch-3
3. Support hadoop 2 and 3 in master
I do not think option 2 is a good solution because it is much more difficult
to manage 3 active prod branches than one master branch.

I think we should go with option 1 or 3.

+1 on Xuefu's and Edward's opinions

Sergey Shelukhin <mailto:ser...@hortonworks.com>
May 22, 2015 at 9:08
I think branch-2 doesn’t need to be framed as particularly adventurous
(other than due to the general increase in the amount of work done in Hive
by the community).
All the new features that normally go on trunk/master will go to branch-2.
branch-2 is just trunk as it is now; in fact there will be no branch-2,
just master :) The difference is the dropped functionality, not any added
functionality.
So you shouldn’t lose stability if you retain the same process as now by
just staying on versions off master.

Perhaps, as is usually the case in Apache projects, developing features on
older branches would be discouraged. Right now, all features usually go on
trunk/master, and are then back ported as needed and practical; so you
wouldn’t (in Apache) make a feature on Hive 0.14 to be released in 0.14.N,
and not back port to master.


Chris Drome <mailto:cdr...@yahoo-inc.com.INVALID>
May 22, 2015 at 0:49
I understand the motivation and benefits of creating a branch-2 where
more disruptive work can go on without affecting branch-1. While not
necessarily against this approach, from Yahoo's standpoint, I do have
some questions (concerns).

Upgrading to a new version of Hive requires a significant commitment
of time and resources to stabilize and certify a build for deployment
to our clusters. Given the size of our clusters and scale of datasets,
we have to be particularly careful about adopting new functionality.
However, at the same time we are interested in testing and making
available new features and functionality. That said, we would have to
rely on branch-1 for the immediate future.

One concern is that branch-1 would be left to stagnate, at which point
there would be no option but for users to move to branch-2 as branch-1
would be effectively end-of-lifed. I'm not sure how long this would
take, but it would eventually happen as a direct result of the very
reason for creating branch-2.

A related concern is how disruptive the code changes will be in
branch-2. I imagine that changes early in branch-2 will be easy to
backport to branch-1, while this effort will become more difficult, if
not impractical, as time goes on. If the code bases diverge too much then
this could lead to more pressure for users of branch-1 to add features
just to branch-1, which has been mentioned as undesirable. By the same
token, backporting any code in branch-2 will require an increasing
amount of effort, which contributors to branch-2 may not be interested
in committing to.

These questions affect us directly because, while we require a certain
amount of stability, we also like to pull in new functionality that
will be of value to our users. For example, our current 0.13 release
is probably closer to 0.14 at this point. Given the lifespan of a
release, it is often more palatable to backport features and bugfixes
than to jump to a new version.

The good thing about this proposal is the opportunity to evaluate and
clean up a lot of the old code.
Thanks,
chris
On Monday, May 18, 2015 11:48 AM, Sergey Shelukhin <ser...@hortonworks.com> wrote:
Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but some
people are set in their ways or have practical considerations and don’t
care for new shiny stuff.





Sergey Shelukhin <mailto:ser...@hortonworks.com>
May 18, 2015 at 11:47
Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but some
people are set in their ways or have practical considerations and don’t
care for new shiny stuff.


Sergey Shelukhin <mailto:ser...@hortonworks.com>
May 18, 2015 at 11:46
I think we need some path for deprecating old Hadoop versions, the same
way we deprecate old Java version support or old RDBMS version support.
At some point the cost of supporting Hadoop 1 exceeds the benefit. Same
goes for stuff like MR; supporting it, esp. for perf work, becomes a
burden, and it’s outdated with 2 alternatives, one of which has been
around for 2 releases.
The branches are a graceful way to get rid of the legacy burden.

Alternatively, when sweeping changes are made, we can do what HBase did
(which is not pretty imho), where the 0.94 version had ~30 dot releases
because people cannot upgrade to the 0.96 “singularity” release.


I posit that people who run Hadoop 1 and MR in this day and age (and more
so as time passes) are people who don’t care about perf and new
features, only stability; so a stability-focused branch would be perfect to
support them.
