Looks like we are discussing 3 options:

1. Support Hadoop 1, 2 and 3 in the master branch.
2. Support Hadoop 1 in branch-1, Hadoop 2 in branch-2, Hadoop 3 in branch-3.
3. Support Hadoop 2 and 3 in master.

I do not think option 2 is a good solution, because it is much more
difficult to manage 3 active prod branches than one master branch.
I think we should go with option 1 or 3.

+1 on Xuefu's and Edward's opinions.

On May 22, 2015 9:09 AM, "Sergey Shelukhin" <ser...@hortonworks.com> wrote:

> I think branch-2 doesn’t need to be framed as particularly adventurous
> (other than due to general increase of the amount of work done in Hive by
> community).
> All the new features that normally go on trunk/master will go to branch-2.
> branch-2 is just trunk as it is now, in fact there will be no branch-2,
> just master :) The difference is the dropped functionality, not added one.
> So you shouldn’t lose stability if you retain the same process as now by
> just staying on versions off master.
>
> Perhaps, as is usually the case in Apache projects, developing features on
> older branches would be discouraged. Right now, all features usually go on
> trunk/master, and are then back ported as needed and practical; so you
> wouldn’t (in Apache) make a feature on Hive 0.14 to be released in 0.14.N,
> and not back port to master.
>
> On 15/5/22, 00:49, "Chris Drome" <cdr...@yahoo-inc.com.INVALID> wrote:
>
> >I understand the motivation and benefits of creating a branch-2 where
> >more disruptive work can go on without affecting branch-1. While not
> >necessarily against this approach, from Yahoo's standpoint, I do have
> >some questions (concerns).
> >Upgrading to a new version of Hive requires a significant commitment of
> >time and resources to stabilize and certify a build for deployment to our
> >clusters. Given the size of our clusters and scale of datasets, we have
> >to be particularly careful about adopting new functionality. However, at
> >the same time we are interested in testing and making available new
> >features and functionality. That said, we would have to rely on branch-1
> >for the immediate future.
> >One concern is that branch-1 would be left to stagnate, at which point
> >there would be no option but for users to move to branch-2 as branch-1
> >would be effectively end-of-lifed. I'm not sure how long this would take,
> >but it would eventually happen as a direct result of the very reason for
> >creating branch-2.
> >A related concern is how disruptive the code changes will be in branch-2.
> >I imagine that changes early in branch-2 will be easy to backport to
> >branch-1, while this effort will become more difficult, if not
> >impractical, as time goes on. If the code bases diverge too much then this
> >could lead to more pressure for users of branch-1 to add features just to
> >branch-1, which has been mentioned as undesirable. By the same token,
> >backporting any code in branch-2 will require an increasing amount of
> >effort, which contributors to branch-2 may not be interested in
> >committing to.
> >These questions affect us directly because, while we require a certain
> >amount of stability, we also like to pull in new functionality that will
> >be of value to our users. For example, our current 0.13 release is
> >probably closer to 0.14 at this point. Given the lifespan of a release,
> >it is often more palatable to backport features and bugfixes than to jump
> >to a new version.
> >
> >The good thing about this proposal is the opportunity to evaluate and
> >clean up a lot of the old code.
> >Thanks,
> >chris
> >
> >On Monday, May 18, 2015 11:48 AM, Sergey Shelukhin
> ><ser...@hortonworks.com> wrote:
> >
> >Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but
> >some people are set in their ways or have practical considerations and
> >don’t care for new shiny stuff.
> >
> >On 15/5/18, 11:46, "Sergey Shelukhin" <ser...@hortonworks.com> wrote:
> >
> >>I think we need some path for deprecating old Hadoop versions, the same
> >>way we deprecate old Java version support or old RDBMS version support.
> >>At some point the cost of supporting Hadoop 1 exceeds the benefit. Same
> >>goes for stuff like MR; supporting it, esp. for perf work, becomes a
> >>burden, and it’s outdated with 2 alternatives, one of which has been
> >>around for 2 releases.
> >>The branches are a graceful way to get rid of the legacy burden.
> >>
> >>Alternatively, when sweeping changes are made, we can do what HBase did
> >>(which is not pretty imho), where 0.94 version had ~30 dot releases
> >>because people cannot upgrade to 0.96 “singularity” release.
> >>
> >>I posit that people who run Hadoop 1 and MR in this day and age (and more
> >>so as time passes) are people who don’t care about perf and new
> >>features, only stability; so, a stability-focused branch would be perfect
> >>to support them.
> >>
> >>On 15/5/18, 10:04, "Edward Capriolo" <edlinuxg...@gmail.com> wrote:
> >>
> >>>Up until recently Hive supported numerous versions of Hadoop code base
> >>>with a simple shim layer. I would rather we stick to the shim layer. I
> >>>think this was easily the best part about hive: a single release worked
> >>>well regardless of your hadoop version. It was also a key element to
> >>>hive's success. I do not want to see us have multiple branches.
> >>>
> >>>On Sat, May 16, 2015 at 1:29 AM, Xuefu Zhang <xzh...@cloudera.com>
> >>>wrote:
> >>>
> >>>> Thanks for the explanation, Alan!
> >>>>
> >>>> While I have understood more on the proposal, I actually see more
> >>>> problems than the confusion of two lines of releases. Essentially,
> >>>> this proposal forces a user to make a hard choice between a stabler,
> >>>> legacy-aware release line and an adventurous, pioneering release
> >>>> line. And once the choice is made, there is no easy way back or
> >>>> forward.
> >>>>
> >>>> Here is my interpretation. Let's say we have two main branches as
> >>>> proposed. I develop a new feature which I think useful for both
> >>>> branches. So, I commit it to both branches. My feature requires
> >>>> additional schema support, so I provide upgrade scripts for both
> >>>> branches. The scripts are different because the two branches have
> >>>> already diverged in schema.
> >>>>
> >>>> Now the two branches evolve in a diverging fashion like this. This is
> >>>> all good as long as a user stays in his line. The moment the user
> >>>> considers a switch, most likely, from branch-1 to branch-2, he is
> >>>> stuck. Why? Because there is no upgrade path from a release in
> >>>> branch-1 to a release in branch-2!
> >>>>
> >>>> If we want to provide an upgrade path, then there will be MxN paths,
> >>>> where M and N are the number of releases in the two branches,
> >>>> respectively. This is going to be next to a nightmare, not only for
> >>>> users, but also for us.
> >>>>
> >>>> Also, the proposal will require two sets of things that Hive provides:
> >>>> double documentation, double feature tracking, double build/test
> >>>> infrastructures, etc.
> >>>>
> >>>> This approach can also potentially cause the problem we saw in hadoop
> >>>> releases, where 0.23 release was greater than 1.0 release.
> >>>>
> >>>> To me, the problem we are trying to solve is deprecating old things
> >>>> such as hadoop-1, Hive CLI, etc. This is a valid problem to be solved.
> >>>> As I see, however, we approached the problem in less favorable ways.
> >>>>
> >>>> First, it seemed we wanted to deprecate something just for the sake of
> >>>> deprecation, and it's not based on the rationale that supports the
> >>>> desire. A dev might write code that accidentally breaks the hadoop-1
> >>>> build. However, this is more a build infrastructure problem rather
> >>>> than the burden of supporting hadoop-1. If our build could catch it at
> >>>> precommit test, then I would think the accident can be well avoided.
> >>>> Most of the time, fixing the build is trivial. And we have already
> >>>> addressed the build infrastructure problem.
> >>>>
> >>>> Secondly, if we do have a strong reason to deprecate something, we
> >>>> should have a deprecation plan rather than declaring on the spot that
> >>>> the current release is the last one supporting X. I think Microsoft
> >>>> did a better job in terms of production deprecation. For instance,
> >>>> they announced long before the last day desupporting Windows XP. In my
> >>>> opinion, we should have a similar vision, giving users and
> >>>> distributions enough time to adjust rather than shocking them with
> >>>> breaking news.
> >>>>
> >>>> In summary, I do see the need of deprecation in Hive, but I am afraid
> >>>> the way we take, including the proposal here, isn't going to nicely
> >>>> solve the problem. On the contrary, I foresee a spectrum of confusion,
> >>>> frustration, and burden for the user as well as for developers.
> >>>>
> >>>> Thanks,
> >>>> Xuefu
> >>>>
> >>>> On Fri, May 15, 2015 at 8:19 PM, Alan Gates <alanfga...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Xuefu Zhang <xzh...@cloudera.com>
> >>>>> May 15, 2015 at 17:31
> >>>>>
> >>>>> Just make sure that I understand the proposal correctly: we are going
> >>>>> to have two main branches, one for hadoop-1 and one for hadoop-2.
> >>>>>
> >>>>> We shouldn't tie this to hadoop-1 and 2. It's about Hive not Hadoop.
> >>>>> It will be some time before Hive's branch-2 is stable, while Hadoop-2
> >>>>> is already well established.
> >>>>>
> >>>>> New features are only merged to branch-2. That essentially says we
> >>>>> stop development for hadoop-1, right?
> >>>>>
> >>>>> If developers want to keep contributing patches to branch-1 then
> >>>>> there's no need for it to stop. We would want to avoid putting new
> >>>>> features only on branch-1, unless they only made sense in that
> >>>>> context. But I assume we'll see people contributing to branch-1 for
> >>>>> some time.
> >>>>>
> >>>>> Are we also making two lines of releases: one for branch-1
> >>>>> and one for branch-2? Won't that be confusing and also burdensome if
> >>>>> we release say 1.3, 2.0, 2.1, 1.4...
> >>>>>
> >>>>> I'm asserting that it will be less confusing than the alternatives.
> >>>>> We need some way to make early releases of many of the new features.
> >>>>> I believe that this proposal is less confusing than if we start
> >>>>> putting the new features in 1.x branches. This is particularly true
> >>>>> because it would help us to start being able to drop older
> >>>>> functionality like Hadoop-1 and MapReduce, which is very hard to do
> >>>>> in the 1.x line without stranding users.
> >>>>>
> >>>>> Please note that we will have hadoop 3 soon. What's the story there?
> >>>>>
> >>>>> As I said above, I don't see this as tied to Hadoop versions.
> >>>>>
> >>>>> Alan.
> >>>>>
> >>>>> Thanks,
> >>>>> Xuefu
> >>>>>
> >>>>> On Fri, May 15, 2015 at 4:43 PM, Vaibhav Gumashta
> >>>>> <vgumas...@hortonworks.com> wrote:
> >>>>>
> >>>>> +1 on the new branch. I think it’ll help in faster dev time for these
> >>>>> important changes.
> >>>>>
> >>>>> —Vaibhav
> >>>>>
> >>>>> From: Alan Gates <alanfga...@gmail.com>
> >>>>> Reply-To: "dev@hive.apache.org" <dev@hive.apache.org>
> >>>>> Date: Friday, May 15, 2015 at 4:11 PM
> >>>>> To: "dev@hive.apache.org" <dev@hive.apache.org>
> >>>>> Subject: Re: [DISCUSS] Supporting Hadoop-1 and experimental features
> >>>>>
> >>>>> Anyone else have feedback on this? If not I'll start a vote next
> >>>>> week.
> >>>>>
> >>>>> Alan.
> >>>>>
> >>>>> Gopal Vijayaraghavan <gop...@apache.org>
> >>>>> May 14, 2015 at 10:44
> >>>>> Hi,
> >>>>>
> >>>>> +1 on the idea.
> >>>>>
> >>>>> Having a stable release branch with ongoing fixes where we do not
> >>>>> drop major features would be good all around.
> >>>>>
> >>>>> It lets us accelerate the pace of development, drop major features or
> >>>>> rewrite them entirely without dragging everyone else kicking &
> >>>>> screaming into that release.
> >>>>>
> >>>>> Cheers,
> >>>>> Gopal
> >>>>>
> >>>>> Sergey Shelukhin <ser...@hortonworks.com>
> >>>>> May 11, 2015 at 19:17
> >>>>> That sounds like a good idea.
> >>>>> Some features could be back ported to branch-1 if viable, but at
> >>>>> least new stuff would not be burdened by Hadoop 1/MR code paths.
> >>>>> Probably also a good place to enable vectorization and other perf
> >>>>> features by default while we make alpha releases.
> >>>>>
> >>>>> +1
> >>>>>
> >>>>> Alan Gates <alanfga...@gmail.com>
> >>>>> May 11, 2015 at 15:38
> >>>>> There is a lot of forward-looking work going on in various branches
> >>>>> of Hive: LLAP, the HBase metastore, and the work to drop the CLI. It
> >>>>> would be good to have a way to release this code to users so that
> >>>>> they can experiment with it. Releasing it will also provide feedback
> >>>>> to developers.
> >>>>>
> >>>>> At the same time there are discussions on whether to keep supporting
> >>>>> Hadoop-1. The burden of supporting older, less used functionality
> >>>>> such as Hadoop-1 is becoming ever harder as many new features are
> >>>>> added.
> >>>>>
> >>>>> I propose that the best way to deal with this would be to make a
> >>>>> branch-1. We could continue to make new feature releases off of this
> >>>>> branch (1.3, 1.4, etc.). This branch would not drop old
> >>>>> functionality. This provides stability and continuity for users and
> >>>>> developers.
> >>>>>
> >>>>> We could then merge these new feature branches (LLAP, HBase
> >>>>> metastore, CLI drop) into the trunk, as well as turn on by default
> >>>>> newer features such as vectorization and ACID. We could also drop
> >>>>> older, less used features such as support for Hadoop-1 and MapReduce.
> >>>>> It will be a while before we are ready to make stable, production
> >>>>> ready releases of this code. But we could start making alpha quality
> >>>>> releases soon. We would call these releases 2.x, to stress the
> >>>>> non-backward compatible changes such as dropping Hadoop-1. This will
> >>>>> give users a chance to play with the new code and developers a chance
> >>>>> to get feedback.
> >>>>>
> >>>>> Thoughts?
> >>>>>
> >>>>> Vaibhav Gumashta <vgumas...@hortonworks.com>
> >>>>> May 15, 2015 at 16:43
> >>>>> +1 on the new branch. I think it’ll help in faster dev time for these
> >>>>> important changes.