I don't think anyone is advocating for option 2, as that would be
disastrous. Option 3 is closest to what I'm proposing, though again
dropping support for Hadoop 1 is only a part of it.
Alan.
Alexander Pivovarov <mailto:apivova...@gmail.com>
May 22, 2015 at 10:03
Looks like we are discussing 3 options:
1. Support hadoop 1, 2 and 3 in master branch.
2. Support hadoop 1 in branch-1, hadoop 2 in branch-2, hadoop 3 in
branch-3
3. Support hadoop 2 and 3 in master
I do not think option 2 is a good solution because it is much more
difficult to manage 3 active prod branches than one master branch.
I think we should go with options 1 or 3.
+1 on Xuefu's and Edward's opinions
Sergey Shelukhin <mailto:ser...@hortonworks.com>
May 22, 2015 at 9:08
I think branch-2 doesn’t need to be framed as particularly adventurous
(other than due to the general increase in the amount of work done in Hive
by the community).
All the new features that normally go on trunk/master will go to branch-2.
branch-2 is just trunk as it is now, in fact there will be no branch-2,
just master :) The difference is the dropped functionality, not any added
functionality.
So you shouldn’t lose stability if you retain the same process as now by
just staying on versions off master.
Perhaps, as is usually the case in Apache projects, developing features on
older branches would be discouraged. Right now, all features usually go on
trunk/master, and are then back ported as needed and practical; so you
wouldn’t (in Apache) make a feature on Hive 0.14 to be released in 0.14.N,
and not back port to master.
Chris Drome <mailto:cdr...@yahoo-inc.com.INVALID>
May 22, 2015 at 0:49
I understand the motivation and benefits of creating a branch-2 where
more disruptive work can go on without affecting branch-1. While not
necessarily against this approach, from Yahoo's standpoint, I do have
some questions (concerns).
Upgrading to a new version of Hive requires a significant commitment
of time and resources to stabilize and certify a build for deployment
to our clusters. Given the size of our clusters and scale of datasets,
we have to be particularly careful about adopting new functionality.
However, at the same time we are interested in testing and making
available new features and functionality. That said, we would have to
rely on branch-1 for the immediate future.
One concern is that branch-1 would be left to stagnate, at which point
there would be no option but for users to move to branch-2 as branch-1
would be effectively end-of-lifed. I'm not sure how long this would
take, but it would eventually happen as a direct result of the very
reason for creating branch-2.
A related concern is how disruptive the code changes will be in
branch-2. I imagine that changes early in branch-2 will be easy to
backport to branch-1, while this effort will become more difficult, if
not impractical, as time goes on. If the code bases diverge too much then
this could lead to more pressure for users of branch-1 to add features
just to branch-1, which has been mentioned as undesirable. By the same
token, backporting any code in branch-2 will require an increasing
amount of effort, which contributors to branch-2 may not be interested
in committing to.
These questions affect us directly because, while we require a certain
amount of stability, we also like to pull in new functionality that
will be of value to our users. For example, our current 0.13 release
is probably closer to 0.14 at this point. Given the lifespan of a
release, it is often more palatable to backport features and bugfixes
than to jump to a new version.
The good thing about this proposal is the opportunity to evaluate and
clean up a lot of the old code.
Thanks,
chris
Sergey Shelukhin <mailto:ser...@hortonworks.com>
May 18, 2015 at 11:47
Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but some
people are set in their ways or have practical considerations and don’t
care for new shiny stuff.
Sergey Shelukhin <mailto:ser...@hortonworks.com>
May 18, 2015 at 11:46
I think we need some path for deprecating old Hadoop versions, the same
way we deprecate old Java version support or old RDBMS version support.
At some point the cost of supporting Hadoop 1 exceeds the benefit. Same
goes for stuff like MR; supporting it, esp. for perf work, becomes a
burden, and it’s outdated with 2 alternatives, one of which has been
around for 2 releases.
The branches are a graceful way to get rid of the legacy burden.
Alternatively, when sweeping changes are made, we can do what HBase did
(which is not pretty imho), where the 0.94 version had ~30 dot releases
because people cannot upgrade to the 0.96 “singularity” release.
I posit that people who run Hadoop 1 and MR in this day and age (and more
so as time passes) are people who don’t care about perf and new
features, only stability; so a stability-focused branch would be perfect to
support them.