Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Edward Capriolo Mon, 18 May 2015 10:05:57 -0700

Up until recently Hive supported numerous versions of the Hadoop code base with
a simple shim layer. I would rather we stick to the shim layer. I think
easily the best part about Hive was that a single release worked
well regardless of your Hadoop version. It was also a key element of Hive's
success. I do not want to see us have multiple branches.
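
To illustrate what I mean by a shim layer, here is a minimal sketch in Python
(not Hive's actual Java code). The names HadoopShim, Hadoop1Shim, Hadoop2Shim
and load_shim are hypothetical, made up for this example. The idea is that the
rest of the code talks to one interface, and a small loader picks the
implementation that matches the Hadoop version found at runtime.

```python
from abc import ABC, abstractmethod


class HadoopShim(ABC):
    """Hypothetical interface: the only Hadoop-facing surface callers see."""

    @abstractmethod
    def scheduler_name(self) -> str:
        """Name of the service that schedules jobs on this Hadoop line."""


class Hadoop1Shim(HadoopShim):
    def scheduler_name(self) -> str:
        return "JobTracker"


class Hadoop2Shim(HadoopShim):
    def scheduler_name(self) -> str:
        return "ResourceManager"


# Hypothetical registry keyed by Hadoop major version.
_SHIMS = {"1": Hadoop1Shim, "2": Hadoop2Shim}


def load_shim(hadoop_version: str) -> HadoopShim:
    """Pick the shim whose major version matches the detected Hadoop version."""
    major = hadoop_version.split(".")[0]
    try:
        return _SHIMS[major]()
    except KeyError:
        raise ValueError(f"no shim for Hadoop version {hadoop_version!r}") from None


if __name__ == "__main__":
    # The calling code is identical for both Hadoop lines.
    for version in ("1.2.1", "2.6.0"):
        print(version, "->", load_shim(version).scheduler_name())
```

With this shape, supporting another Hadoop line means adding one class and one
registry entry, and a single release keeps working across versions.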


On Sat, May 16, 2015 at 1:29 AM, Xuefu Zhang <xzh...@cloudera.com> wrote:

> Thanks for the explanation, Alan!
>
> While I now understand the proposal better, I actually see more problems
> than just the confusion of two lines of releases. Essentially, this proposal
> forces a user to make a hard choice between a more stable, legacy-aware release
> line and an adventurous, pioneering release line. And once the choice is
> made, there is no easy way back or forward.
>
> Here is my interpretation. Let's say we have two main branches as
> proposed. I develop a new feature which I think useful for both branches.
> So, I commit it to both branches. My feature requires additional schema
> support, so I provide upgrade scripts for both branches. The scripts are
> different because the two branches have already diverged in schema.
>
> Now the two branches evolve in a diverging fashion like this. This is all
> good as long as a user stays in his line. The moment the user considers a
> switch, most likely from branch-1 to branch-2, he is stuck. Why? Because
> there is no upgrade path from a release in branch-1 to a release in
> branch-2!
>
> If we want to provide an upgrade path, then there will be MxN paths, where
> M and N are the number of releases in the two branches, respectively. This
> is going to be next to a nightmare, not only for users, but also for us.
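
To make the MxN count above concrete, here is a minimal sketch in Python. The
release lists are illustrative only (1.3, 1.4, 2.0 and 2.1 are the numbers
mentioned in this thread, 1.5 is hypothetical), and the single-line count
assumes upgrade scripts are chained one release at a time.

```python
from itertools import product


def cross_branch_upgrade_paths(branch1_releases, branch2_releases):
    """Every (source, target) pair that would need its own cross-branch script."""
    return list(product(branch1_releases, branch2_releases))


def single_line_upgrade_steps(releases):
    """Assumed model for one release line: one script per consecutive pair."""
    return max(len(releases) - 1, 0)


branch1 = ["1.3", "1.4", "1.5"]  # M = 3
branch2 = ["2.0", "2.1"]         # N = 2

paths = cross_branch_upgrade_paths(branch1, branch2)
print(len(paths))                                    # M * N cross-branch paths
print(single_line_upgrade_steps(branch1 + branch2))  # steps if it were one line
```

Every new release on either branch adds a whole row or column of pairs, while
a single line adds only one more step.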
>
> Also, the proposal will require two sets of things that Hive provides:
> double documentation, double feature tracking, double build/test
> infrastructures, etc.
>
> This approach can also potentially cause the problem we saw in Hadoop
> releases, where the 0.23 release was greater than the 1.0 release.
>
> To me, the problem we are trying to solve is deprecating old things such
> as hadoop-1, Hive CLI, etc. This is a valid problem to be solved. As I see it,
> however, we approached the problem in less favorable ways.
>
> First, it seemed we wanted to deprecate something just for the sake of
> deprecation, not based on a rationale that supports the desire.
> A developer might write code that accidentally breaks the hadoop-1 build.
> However, this is more a build infrastructure problem than a burden of
> supporting hadoop-1. If our build could catch it at the precommit test, then
> I would think the accident can be well avoided. Most of the time, fixing the
> build is trivial. And we have already addressed the build infrastructure
> problem.
>
> Secondly, if we do have a strong reason to deprecate something, we should
> have a deprecation plan rather than declaring on the spot that the current
> release is the last one supporting X. I think Microsoft did a better job in
> terms of product deprecation. For instance, they announced the end of
> Windows XP support long before the last day. In my opinion, we should have a
> similar vision, giving users and distributions enough time to adjust rather
> than shocking them with breaking news.
>
> In summary, I do see the need for deprecation in Hive, but I am afraid the
> approach we are taking, including the proposal here, isn't going to nicely
> solve the problem. On the contrary, I foresee a spectrum of confusion,
> frustration, and burden for users as well as for developers.
>
> Thanks,
> Xuefu
>
> On Fri, May 15, 2015 at 8:19 PM, Alan Gates <alanfga...@gmail.com> wrote:
>
>>
>>
>>   Xuefu Zhang <xzh...@cloudera.com>
>>  May 15, 2015 at 17:31
>>
>> Just to make sure that I understand the proposal correctly: we are going to
>> have two main branches, one for hadoop-1 and one for hadoop-2.
>>
>>  We shouldn't tie this to hadoop-1 and 2.  It's about Hive not Hadoop.
>> It will be some time before Hive's branch-2 is stable, while Hadoop-2 is
>> already well established.
>>
>>  New features are only merged to branch-2. That essentially says we stop
>> development for hadoop-1, right?
>>
>>  If developers want to keep contributing patches to branch-1 then
>> there's no need for it to stop.  We would want to avoid putting new
>> features only on branch-1, unless they only made sense in that context.
>> But I assume we'll see people contributing to branch-1 for some time.
>>
>>  Are we also making two lines of releases: one for branch-1
>> and one for branch-2? Won't that be confusing and also burdensome if we
>> release say 1.3, 2.0, 2.1, 1.4...
>>
>>  I'm asserting that it will be less confusing than the alternatives.  We
>> need some way to make early releases of many of the new features.  I
>> believe that this proposal is less confusing than if we start putting the
>> new features in 1.x branches.  This is particularly true because it would
>> help us to start being able to drop older functionality like Hadoop-1 and
>> MapReduce, which is very hard to do in the 1.x line without stranding users.
>>
>>  Please note that we will have hadoop 3 soon. What's the story there?
>>
>>  As I said above, I don't see this as tied to Hadoop versions.
>>
>> Alan.
>>
>>  Thanks,
>> Xuefu
>>
>>
>>
>> On Fri, May 15, 2015 at 4:43 PM, Vaibhav Gumashta <vgumas...@hortonworks.com> wrote:
>>
>>  +1 on the new branch. I think it’ll help in faster dev time for these
>> important changes.
>>
>>  —Vaibhav
>>
>>   From: Alan Gates <alanfga...@gmail.com>
>> Reply-To: "dev@hive.apache.org" <dev@hive.apache.org>
>> Date: Friday, May 15, 2015 at 4:11 PM
>> To: "dev@hive.apache.org" <dev@hive.apache.org>
>> Subject: Re: [DISCUSS] Supporting Hadoop-1 and experimental features
>>
>>  Anyone else have feedback on this?  If not I'll start a vote next week.
>>
>> Alan.
>>
>>    Gopal Vijayaraghavan <gop...@apache.org>
>> May 14, 2015 at 10:44
>>   Hi,
>>
>> +1 on the idea.
>>
>> Having a stable release branch with ongoing fixes where we do not drop
>> major features would be good all around.
>>
>> It lets us accelerate the pace of development, drop major features or
>> rewrite them entirely without dragging everyone else kicking & screaming
>> into that release.
>>
>> Cheers,
>> Gopal
>>
>>
>>
>>    Sergey Shelukhin <ser...@hortonworks.com>
>> May 11, 2015 at 19:17
>>   That sounds like a good idea.
>> Some features could be back ported to branch-1 if viable, but at least new
>> stuff would not be burdened by Hadoop 1/MR code paths.
>> Probably also a good place to enable vectorization and other perf features
>> by default while we make alpha releases.
>>
>> +1
>>
>>
>>    Alan Gates <alanfga...@gmail.com>
>> May 11, 2015 at 15:38
>>   There is a lot of forward-looking work going on in various branches of
>> Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It would
>> be good to have a way to release this code to users so that they can
>> experiment with it.  Releasing it will also provide feedback to developers.
>>
>> At the same time there are discussions on whether to keep supporting
>> Hadoop-1.  The burden of supporting older, less used functionality such as
>> Hadoop-1 is becoming ever heavier as many new features are added.
>>
>> I propose that the best way to deal with this would be to make a
>> branch-1.  We could continue to make new feature releases off of this
>> branch (1.3, 1.4, etc.).  This branch would not drop old functionality.
>> This provides stability and continuity for users and developers.
>>
>> We could then merge these new feature branches (LLAP, HBase metastore,
>> CLI drop) into the trunk, as well as turn on by default newer features such
>> as vectorization and ACID.  We could also drop older, less used
>> features such as support for Hadoop-1 and MapReduce.  It will be a while
>> before we are ready to make stable, production ready releases of this
>> code.  But we could start making alpha quality releases soon.  We would
>> call these releases 2.x, to stress the non-backward compatible changes such
>> as dropping Hadoop-1.  This will give users a chance to play with the new
>> code and developers a chance to get feedback.
>>
>> Thoughts?
>>
>>
>>
>
