Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Alexander Pivovarov Fri, 22 May 2015 10:04:53 -0700

Looks like we are discussing 3 options:

1. Support Hadoop 1, 2 and 3 in the master branch.

2. Support Hadoop 1 in branch-1, Hadoop 2 in branch-2, Hadoop 3 in branch-3.

3. Support Hadoop 2 and 3 in master.

I do not think option 2 is a good solution because it is much more difficult
to manage 3 active prod branches than one master branch.

I think we should go with option 1 or 3.

+1 on Xuefu's and Edward's opinions.
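
As a rough illustration of Xuefu's MxN point against option 2 (quoted below), here is a minimal Python sketch that enumerates the cross-branch upgrade paths. The function name upgrade_paths and the release lists are assumptions made up for this example (1.5 in particular is hypothetical); nothing here is Hive code.

```python
from itertools import product


def upgrade_paths(branch1_releases, branch2_releases):
    """Return every (from, to) pair a cross-branch upgrade would have to cover."""
    return list(product(branch1_releases, branch2_releases))


# Hypothetical release numbers, chosen only for illustration.
branch_1 = ["1.3", "1.4", "1.5"]
branch_2 = ["2.0", "2.1"]

paths = upgrade_paths(branch_1, branch_2)
print(len(paths))  # M * N = 3 * 2 = 6
for src, dst in paths:
    print(f"{src} -> {dst}")
```

With M releases on branch-1 and N on branch-2 the list has M*N entries, so each extra release on either branch adds a whole row or column of upgrade paths to test and document.
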
On May 22, 2015 9:09 AM, "Sergey Shelukhin" <ser...@hortonworks.com> wrote:

> I think branch-2 doesn’t need to be framed as particularly adventurous
> (other than due to general increase of the amount of work done in Hive by
> community).
> All the new features that normally go on trunk/master will go to branch-2.
> branch-2 is just trunk as it is now, in fact there will be no branch-2,
> just master :) The difference is the dropped functionality, not added one.
> So you shouldn’t lose stability if you retain the same process as now by
> just staying on versions off master.
>
> Perhaps, as is usually the case in Apache projects, developing features on
> older branches would be discouraged. Right now, all features usually go on
> trunk/master, and are then back ported as needed and practical; so you
> wouldn’t (in Apache) make a feature on Hive 0.14 to be released in 0.14.N,
> and not back port to master.
>
> On 15/5/22, 00:49, "Chris Drome" <cdr...@yahoo-inc.com.INVALID> wrote:
>
> >I understand the motivation and benefits of creating a branch-2 where
> >more disruptive work can go on without affecting branch-1. While not
> >necessarily against this approach, from Yahoo's standpoint, I do have
> >some questions (concerns).
> >Upgrading to a new version of Hive requires a significant commitment of
> >time and resources to stabilize and certify a build for deployment to our
> >clusters. Given the size of our clusters and scale of datasets, we have
> >to be particularly careful about adopting new functionality. However, at
> >the same time we are interested in testing and making available new
> >features and functionality. That said, we would have to rely on branch-1
> >for the immediate future.
> >One concern is that branch-1 would be left to stagnate, at which point
> >there would be no option but for users to move to branch-2 as branch-1
> >would be effectively end-of-lifed. I'm not sure how long this would take,
> >but it would eventually happen as a direct result of the very reason for
> >creating branch-2.
> >A related concern is how disruptive the code changes will be in branch-2.
> >I imagine that changes early in branch-2 will be easy to backport to
> >branch-1, while this effort will become more difficult, if not
> >impractical, as time goes on. If the code bases diverge too much then this
> >could lead to more pressure for users of branch-1 to add features just to
> >branch-1, which has been mentioned as undesirable. By the same token,
> >backporting any code in branch-2 will require an increasing amount of
> >effort, which contributors to branch-2 may not be interested in
> >committing to.
> >These questions affect us directly because, while we require a certain
> >amount of stability, we also like to pull in new functionality that will
> >be of value to our users. For example, our current 0.13 release is
> >probably closer to 0.14 at this point. Given the lifespan of a release,
> >it is often more palatable to backport features and bugfixes than to jump
> >to a new version.
> >
> >The good thing about this proposal is the opportunity to evaluate and
> >clean up a lot of the old code.
> >Thanks,
> >chris
> >
> >
> >
> >     On Monday, May 18, 2015 11:48 AM, Sergey Shelukhin
> ><ser...@hortonworks.com> wrote:
> >
> >
> > Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but
> >some people are set in their ways or have practical considerations and
> >don’t care for new shiny stuff.
> >
> >On 15/5/18, 11:46, "Sergey Shelukhin" <ser...@hortonworks.com> wrote:
> >
> >>I think we need some path for deprecating old Hadoop versions, the same
> >>way we deprecate old Java version support or old RDBMS version support.
> >>At some point the cost of supporting Hadoop 1 exceeds the benefit. Same
> >>goes for stuff like MR; supporting it, esp. for perf work, becomes a
> >>burden, and it’s outdated with 2 alternatives, one of which has been
> >>around for 2 releases.
> >>The branches are a graceful way to get rid of the legacy burden.
> >>
> >>Alternatively, when sweeping changes are made, we can do what HBase did
> >>(which is not pretty imho), where the 0.94 version had ~30 dot releases
> >>because people cannot upgrade to the 0.96 “singularity” release.
> >>
> >>
> >>I posit that people who run Hadoop 1 and MR in this day and age (and more
> >>so as time passes) are people who don’t care about perf and new
> >>features, only stability; so a stability-focused branch would be perfect
> >>to support them.
> >>
> >>
> >>On 15/5/18, 10:04, "Edward Capriolo" <edlinuxg...@gmail.com> wrote:
> >>
> >>>Up until recently Hive supported numerous versions of the Hadoop code
> >>>base with a simple shim layer. I would rather we stick to the shim
> >>>layer. I think easily the best part about Hive was that a single release
> >>>worked well regardless of your Hadoop version. It was also a key element
> >>>of Hive's success. I do not want to see us have multiple branches.
> >>>
> >>>On Sat, May 16, 2015 at 1:29 AM, Xuefu Zhang <xzh...@cloudera.com>
> >>>wrote:
> >>>
> >>>> Thanks for the explanation, Alan!
> >>>>
> >>>> While I have understood more of the proposal, I actually see more
> >>>> problems than the confusion of two lines of releases. Essentially, this
> >>>> proposal forces a user to make a hard choice between a stabler,
> >>>> legacy-aware release line and an adventurous, pioneering release line.
> >>>> And once the choice is made, there is no easy way back or forward.
> >>>>
> >>>> Here is my interpretation. Let's say we have two main branches as
> >>>> proposed. I develop a new feature which I think is useful for both
> >>>> branches. So, I commit it to both branches. My feature requires
> >>>> additional schema support, so I provide upgrade scripts for both
> >>>> branches. The scripts are different because the two branches have
> >>>> already diverged in schema.
> >>>>
> >>>> Now the two branches evolve in a diverging fashion like this. This is
> >>>> all good as long as a user stays in his line. The moment the user
> >>>> considers a switch, most likely from branch-1 to branch-2, he is stuck.
> >>>> Why? Because there is no upgrade path from a release in branch-1 to a
> >>>> release in branch-2!
> >>>>
> >>>> If we want to provide an upgrade path, then there will be MxN paths,
> >>>> where M and N are the number of releases in the two branches,
> >>>> respectively. This is going to be next to a nightmare, not only for
> >>>> users, but also for us.
> >>>>
> >>>> Also, the proposal will require two sets of things that Hive provides:
> >>>> double documentation, double feature tracking, double build/test
> >>>> infrastructures, etc.
> >>>>
> >>>> This approach can also potentially cause the problem we saw in hadoop
> >>>> releases, where 0.23 release was greater than 1.0 release.
> >>>>
> >>>> To me, the problem we are trying to solve is deprecating old things
> >>>> such as hadoop-1, Hive CLI, etc. This is a valid problem to be solved.
> >>>> As I see it, however, we approached the problem in less favorable ways.
> >>>>
> >>>> First, it seemed we wanted to deprecate something just for the sake of
> >>>> deprecation, not based on a rationale that supports the desire. A dev
> >>>> might write code that accidentally breaks the hadoop-1 build. However,
> >>>> this is more a build infrastructure problem than a burden of
> >>>> supporting hadoop-1. If our build could catch it in the precommit
> >>>> test, then I would think the accident can be well avoided. Most of the
> >>>> time, fixing the build is trivial. And we have already addressed the
> >>>> build infrastructure problem.
> >>>>
> >>>> Secondly, if we do have a strong reason to deprecate something, we
> >>>> should have a deprecation plan rather than declaring on the spot that
> >>>> the current release is the last one supporting X. I think Microsoft did
> >>>> a better job in terms of product deprecation. For instance, they
> >>>> announced the end of Windows XP support long before the last day. In my
> >>>> opinion, we should have a similar vision, giving users and
> >>>> distributions enough time to adjust rather than shocking them with
> >>>> breaking news.
> >>>>
> >>>> In summary, I do see the need for deprecation in Hive, but I am afraid
> >>>> the way we take, including the proposal here, isn't going to nicely
> >>>> solve the problem. On the contrary, I foresee a spectrum of confusion,
> >>>> frustration, and burden for the user as well as for developers.
> >>>>
> >>>> Thanks,
> >>>> Xuefu
> >>>>
> >>>> On Fri, May 15, 2015 at 8:19 PM, Alan Gates <alanfga...@gmail.com>
> >>>>wrote:
> >>>>
> >>>>>
> >>>>>
> >>>>>  Xuefu Zhang <xzh...@cloudera.com>
> >>>>>  May 15, 2015 at 17:31
> >>>>>
> >>>>> Just make sure that I understand the proposal correctly: we are going
> >>>>> to have two main branches, one for hadoop-1 and one for hadoop-2.
> >>>>>
> >>>>>  We shouldn't tie this to hadoop-1 and 2.  It's about Hive not
> >>>>> Hadoop.  It will be some time before Hive's branch-2 is stable, while
> >>>>> Hadoop-2 is already well established.
> >>>>>
> >>>>>  New features
> >>>>> are only merged to branch-2. That essentially says we stop
> >>>>> development for hadoop-1, right?
> >>>>>
> >>>>>  If developers want to keep contributing patches to branch-1 then
> >>>>> there's no need for it to stop.  We would want to avoid putting new
> >>>>> features only on branch-1, unless they only made sense in that
> >>>>> context.  But I assume we'll see people contributing to branch-1 for
> >>>>> some time.
> >>>>>
> >>>>>  Are we also making two lines of releases: one for branch-1
> >>>>> and one for branch-2? Won't that be confusing and also burdensome if
> >>>>> we release say 1.3, 2.0, 2.1, 1.4...
> >>>>>
> >>>>>  I'm asserting that it will be less confusing than the alternatives.
> >>>>> We need some way to make early releases of many of the new features.  I
> >>>>> believe that this proposal is less confusing than if we start putting
> >>>>> the new features in 1.x branches.  This is particularly true because it
> >>>>> would help us to start being able to drop older functionality like
> >>>>> Hadoop-1 and MapReduce, which is very hard to do in the 1.x line
> >>>>> without stranding users.
> >>>>>
> >>>>>  Please note that we will have hadoop 3 soon. What's the story there?
> >>>>>
> >>>>>  As I said above, I don't see this as tied to Hadoop versions.
> >>>>>
> >>>>> Alan.
> >>>>>
> >>>>>  Thanks,
> >>>>> Xuefu
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Fri, May 15, 2015 at 4:43 PM, Vaibhav Gumashta
> >>>>> <vgumas...@hortonworks.com> wrote:
> >>>>>
> >>>>>  +1 on the new branch. I think it’ll help in faster dev time for
> >>>>> these important changes.
> >>>>>
> >>>>>  —Vaibhav
> >>>>>
> >>>>>  From: Alan Gates <alanfga...@gmail.com>
> >>>>> Reply-To: "dev@hive.apache.org" <dev@hive.apache.org>
> >>>>> Date: Friday, May 15, 2015 at 4:11 PM
> >>>>> To: "dev@hive.apache.org" <dev@hive.apache.org>
> >>>>> Subject: Re: [DISCUSS] Supporting Hadoop-1 and experimental features
> >>>>>
> >>>>>  Anyone else have feedback on this?  If not I'll start a vote next
> >>>>> week.
> >>>>>
> >>>>> Alan.
> >>>>>
> >>>>>    Gopal Vijayaraghavan <gop...@apache.org>
> >>>>> May 14, 2015 at 10:44
> >>>>>  Hi,
> >>>>>
> >>>>> +1 on the idea.
> >>>>>
> >>>>> Having a stable release branch with ongoing fixes where we do not
> >>>>> drop major features would be good all around.
> >>>>>
> >>>>> It lets us accelerate the pace of development, drop major features or
> >>>>> rewrite them entirely without dragging everyone else kicking &
> >>>>> screaming into that release.
> >>>>>
> >>>>> Cheers,
> >>>>> Gopal
> >>>>>
> >>>>>
> >>>>>
> >>>>>    Sergey Shelukhin <ser...@hortonworks.com>
> >>>>> May 11, 2015 at 19:17
> >>>>>  That sounds like a good idea.
> >>>>> Some features could be back ported to branch-1 if viable, but at
> >>>>> least new stuff would not be burdened by Hadoop 1/MR code paths.
> >>>>> Probably also a good place to enable vectorization and other perf
> >>>>> features by default while we make alpha releases.
> >>>>>
> >>>>> +1
> >>>>>
> >>>>>
> >>>>>    Alan Gates <alanfga...@gmail.com>
> >>>>> May 11, 2015 at 15:38
> >>>>>  There is a lot of forward-looking work going on in various branches
> >>>>> of Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It
> >>>>> would be good to have a way to release this code to users so that they
> >>>>> can experiment with it.  Releasing it will also provide feedback to
> >>>>> developers.
> >>>>>
> >>>>> At the same time there are discussions on whether to keep supporting
> >>>>> Hadoop-1.  The burden of supporting older, less used functionality
> >>>>> such as Hadoop-1 is becoming ever harder as many new features are
> >>>>> added.
> >>>>>
> >>>>> I propose that the best way to deal with this would be to make a
> >>>>> branch-1.  We could continue to make new feature releases off of this
> >>>>> branch (1.3, 1.4, etc.).  This branch would not drop old
> >>>>> functionality.  This provides stability and continuity for users and
> >>>>> developers.
> >>>>>
> >>>>> We could then merge these new feature branches (LLAP, HBase
> >>>>> metastore, CLI drop) into the trunk, as well as turn on by default
> >>>>> newer features such as vectorization and ACID.  We could also drop
> >>>>> older, less used features such as support for Hadoop-1 and MapReduce.
> >>>>> It will be a while before we are ready to make stable, production
> >>>>> ready releases of this code.  But we could start making alpha quality
> >>>>> releases soon.  We would call these releases 2.x, to stress the
> >>>>> non-backward compatible changes such as dropping Hadoop-1.  This will
> >>>>> give users a chance to play with the new code and developers a chance
> >>>>> to get feedback.
> >>>>>
> >>>>> Thoughts?
> >>>>>
> >>>>
> >>
> >
> >
> >
> >
>
>
