Up until recently, Hive supported numerous versions of the Hadoop code base with a simple shim layer. I would rather we stick to the shim layer. I think easily the best part about Hive was that a single release worked well regardless of your Hadoop version. It was also a key element of Hive's success. I do not want to see us have multiple branches.
On Sat, May 16, 2015 at 1:29 AM, Xuefu Zhang <xzh...@cloudera.com> wrote:
> Thanks for the explanation, Alan!
>
> While I have understood more on the proposal, I actually see more problems
> than the confusion of two lines of releases. Essentially, this proposal
> forces a user to make a hard choice between a stabler, legacy-aware release
> line and an adventurous, pioneering release line. And once the choice is
> made, there is no easy way back or forward.
>
> Here is my interpretation. Let's say we have two main branches as
> proposed. I develop a new feature which I think is useful for both branches.
> So, I commit it to both branches. My feature requires additional schema
> support, so I provide upgrade scripts for both branches. The scripts are
> different because the two branches have already diverged in schema.
>
> Now the two branches evolve in a diverging fashion like this. This is all
> good as long as a user stays in his line. The moment the user considers a
> switch, most likely from branch-1 to branch-2, he is stuck. Why? Because
> there is no upgrade path from a release in branch-1 to a release in
> branch-2!
>
> If we want to provide an upgrade path, then there will be MxN paths, where
> M and N are the number of releases in the two branches, respectively. This
> is going to be next to a nightmare, not only for users, but also for us.
>
> Also, the proposal will require two sets of things that Hive provides:
> double documentation, double feature tracking, double build/test
> infrastructures, etc.
>
> This approach can also potentially cause the problem we saw in hadoop
> releases, where the 0.23 release was greater than the 1.0 release.
>
> To me, the problem we are trying to solve is deprecating old things such
> as hadoop-1, Hive CLI, etc. This is a valid problem to be solved. As I see it,
> however, we approached the problem in less favorable ways.
>
> First, it seemed we wanted to deprecate something just for the sake of
> deprecation, and not based on a rationale that supports the desire.
> A developer might write code that accidentally breaks the hadoop-1 build.
> However, this is more a build infrastructure problem than the burden of
> supporting hadoop-1. If our build could catch it at precommit test, then I
> would think the accident can be well avoided. Most of the time, fixing the
> build is trivial. And we have already addressed the build infrastructure
> problem.
>
> Secondly, if we do have a strong reason to deprecate something, we should
> have a deprecation plan rather than declaring on the spot that the current
> release is the last one supporting X. I think Microsoft did a better job in
> terms of product deprecation. For instance, they announced the end of
> support for Windows XP long before the last day. In my opinion, we should
> have a similar vision, giving users and distributions enough time to adjust
> rather than shocking them with breaking news.
>
> In summary, I do see the need for deprecation in Hive, but I am afraid the
> way we take, including the proposal here, isn't going to nicely solve the
> problem. On the contrary, I foresee a spectrum of confusion, frustration,
> and burden for the user as well as for developers.
>
> Thanks,
> Xuefu
>
> On Fri, May 15, 2015 at 8:19 PM, Alan Gates <alanfga...@gmail.com> wrote:
>
>>
>> Xuefu Zhang <xzh...@cloudera.com>
>> May 15, 2015 at 17:31
>>
>> Just make sure that I understand the proposal correctly: we are going to
>> have two main branches, one for hadoop-1 and one for hadoop-2.
>>
>> We shouldn't tie this to hadoop-1 and 2. It's about Hive not Hadoop.
>> It will be some time before Hive's branch-2 is stable, while Hadoop-2 is
>> already well established.
>>
>> New features
>> are only merged to branch-2. That essentially says we stop development for
>> hadoop-1, right?
>>
>> If developers want to keep contributing patches to branch-1 then
>> there's no need for it to stop. We would want to avoid putting new
>> features only on branch-1, unless they only made sense in that context.
>> But I assume we'll see people contributing to branch-1 for some time.
>>
>> Are we also making two lines of releases: one for branch-1
>> and one for branch-2? Won't that be confusing and also burdensome if we
>> release say 1.3, 2.0, 2.1, 1.4...
>>
>> I'm asserting that it will be less confusing than the alternatives. We
>> need some way to make early releases of many of the new features. I
>> believe that this proposal is less confusing than if we start putting the
>> new features in 1.x branches. This is particularly true because it would
>> help us to start being able to drop older functionality like Hadoop-1 and
>> MapReduce, which is very hard to do in the 1.x line without stranding users.
>>
>> Please note that we will have hadoop 3 soon. What's the story there?
>>
>> As I said above, I don't see this as tied to Hadoop versions.
>>
>> Alan.
>>
>> Thanks,
>> Xuefu
>>
>> On Fri, May 15, 2015 at 4:43 PM, Vaibhav Gumashta <vgumas...@hortonworks.com>
>> wrote:
>>
>> +1 on the new branch. I think it’ll help in faster dev time for these
>> important changes.
>>
>> --Vaibhav
>>
>> From: Alan Gates <alanfga...@gmail.com>
>> Reply-To: "dev@hive.apache.org" <dev@hive.apache.org>
>> Date: Friday, May 15, 2015 at 4:11 PM
>> To: "dev@hive.apache.org" <dev@hive.apache.org>
>> Subject: Re: [DISCUSS] Supporting Hadoop-1 and experimental features
>>
>> Anyone else have feedback on this? If not I'll start a vote next week.
>>
>> Alan.
>>
>> Gopal Vijayaraghavan <gop...@apache.org>
>> May 14, 2015 at 10:44
>> Hi,
>>
>> +1 on the idea.
>>
>> Having a stable release branch with ongoing fixes where we do not drop
>> major features would be good all around.
>>
>> It lets us accelerate the pace of development, drop major features or
>> rewrite them entirely without dragging everyone else kicking & screaming
>> into that release.
>>
>> Cheers,
>> Gopal
>>
>> Sergey Shelukhin <ser...@hortonworks.com>
>> May 11, 2015 at 19:17
>> That sounds like a good idea.
>> Some features could be back ported to branch-1 if viable, but at least new
>> stuff would not be burdened by Hadoop 1/MR code paths.
>> Probably also a good place to enable vectorization and other perf features
>> by default while we make alpha releases.
>>
>> +1
>>
>> Alan Gates <alanfga...@gmail.com>
>> May 11, 2015 at 15:38
>> There is a lot of forward-looking work going on in various branches of
>> Hive: LLAP, the HBase metastore, and the work to drop the CLI. It would
>> be good to have a way to release this code to users so that they can
>> experiment with it. Releasing it will also provide feedback to developers.
>>
>> At the same time there are discussions on whether to keep supporting
>> Hadoop-1. The burden of supporting older, less used functionality such as
>> Hadoop-1 is becoming ever heavier as many new features are added.
>>
>> I propose that the best way to deal with this would be to make a
>> branch-1. We could continue to make new feature releases off of this
>> branch (1.3, 1.4, etc.). This branch would not drop old functionality.
>> This provides stability and continuity for users and developers.
>>
>> We could then merge these new feature branches (LLAP, HBase metastore,
>> CLI drop) into the trunk, as well as turn on by default newer features such
>> as vectorization and ACID. We could also drop older, less used
>> features such as support for Hadoop-1 and MapReduce.
>> It will be a while
>> before we are ready to make stable, production ready releases of this
>> code. But we could start making alpha quality releases soon. We would
>> call these releases 2.x, to stress the non-backward compatible changes such
>> as dropping Hadoop-1. This will give users a chance to play with the new
>> code and developers a chance to get feedback.
>>
>> Thoughts?