Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Xuefu Zhang Fri, 15 May 2015 22:30:48 -0700

Thanks for the explanation, Alan!

While I now understand the proposal better, I actually see more problems
than just the confusion of two lines of releases. Essentially, this proposal
forces a user to make a hard choice between a more stable, legacy-aware
release line and an adventurous, pioneering release line. And once the
choice is made, there is no easy way back or forward.


Here is my interpretation. Let's say we have two main branches as proposed.
I develop a new feature which I think is useful for both branches. So, I
commit it to both branches. My feature requires additional schema support,
so I provide upgrade scripts for both branches. The scripts are different
because the two branches have already diverged in schema.

Now the two branches keep diverging like this. This is all good as long as
a user stays on one line. The moment the user considers a switch, most
likely from branch-1 to branch-2, he is stuck. Why? Because there is no
upgrade path from a release in branch-1 to a release in branch-2!

If we want to provide an upgrade path, then there will be MxN paths, where
M and N are the number of releases in the two branches, respectively. This
is going to be next to a nightmare, not only for users, but also for us.
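
To make the MxN count concrete, here is a minimal sketch. The release
numbers are hypothetical and chosen only for illustration; the function
names are likewise made up for this example. It compares the number of
cross-branch upgrade paths with the number of upgrade steps needed when
users stay on one line.

```python
from itertools import product

# Hypothetical release lists; these version numbers are illustrative only.
branch1_releases = ["1.3", "1.4", "1.5"]         # M = 3
branch2_releases = ["2.0", "2.1", "2.2", "2.3"]  # N = 4


def cross_branch_upgrade_paths(src_releases, dst_releases):
    """Every (source, target) pair that would need its own upgrade script."""
    return list(product(src_releases, dst_releases))


def in_line_upgrade_steps(releases):
    """Consecutive (older, newer) pairs within a single release line."""
    return list(zip(releases, releases[1:]))


cross_paths = cross_branch_upgrade_paths(branch1_releases, branch2_releases)
in_line = (in_line_upgrade_steps(branch1_releases)
           + in_line_upgrade_steps(branch2_releases))

print(len(cross_paths))  # M * N = 12 cross-branch paths
print(len(in_line))      # (M - 1) + (N - 1) = 5 in-line steps
```

Every release added to either line adds a whole row or column of
cross-branch paths, whereas an in-line release adds only one step.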

Also, the proposal will require two sets of everything that Hive provides:
double documentation, double feature tracking, double build/test
infrastructure, etc.

This approach can also potentially cause the problem we saw in Hadoop
releases, where the 0.23 release was ahead of the 1.0 release despite its
lower version number.
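
As a small illustration of that ordering problem, the sketch below uses
the interleaved release sequence quoted further down in this thread (1.3,
2.0, 2.1, 1.4). The helper name is made up for this example. It shows that
"latest by version number" and "latest by release order" stop agreeing
once two lines release in parallel.

```python
# Release order taken from the example sequence quoted in this thread.
release_order = ["1.3", "2.0", "2.1", "1.4"]


def version_key(version):
    """Turn '2.1' into (2, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


by_version = sorted(release_order, key=version_key)

latest_by_version = by_version[-1]       # highest version number
latest_by_release = release_order[-1]    # most recently released

print(by_version)
print(latest_by_version, latest_by_release)
```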

To me, the problem we are trying to solve is deprecating old things such
as hadoop-1, Hive CLI, etc. This is a valid problem to be solved. As I see
it, however, we have approached the problem in less favorable ways.

First, it seemed we wanted to deprecate something just for the sake of
deprecation, without a rationale that supports the desire. A developer
might write code that accidentally breaks the hadoop-1 build. However, this
is more a build infrastructure problem than a burden of supporting
hadoop-1. If our build could catch it in the precommit test, then I would
think the accident could be easily avoided. Most of the time, fixing the
build is trivial. And we have already addressed the build infrastructure
problem.

Secondly, if we do have a strong reason to deprecate something, we should
have a deprecation plan rather than declaring on the spot that the current
release is the last one supporting X. I think Microsoft did a better job in
terms of product deprecation. For instance, they announced the end of
support for Windows XP long before the last day. In my opinion, we should
have a similar vision, giving users and distributions enough time to adjust
rather than shocking them with breaking news.

In summary, I do see the need for deprecation in Hive, but I am afraid the
path we are taking, including the proposal here, isn't going to solve the
problem nicely. On the contrary, I foresee a spectrum of confusion,
frustration, and burden for users as well as for developers.

Thanks,
Xuefu

On Fri, May 15, 2015 at 8:19 PM, Alan Gates <alanfga...@gmail.com> wrote:

>
>
>   Xuefu Zhang <xzh...@cloudera.com>
>  May 15, 2015 at 17:31
>
> Just to make sure that I understand the proposal correctly: we are going to
> have two main branches, one for hadoop-1 and one for hadoop-2.
>
>  We shouldn't tie this to hadoop-1 and 2.  It's about Hive not Hadoop.  It
> will be some time before Hive's branch-2 is stable, while Hadoop-2 is
> already well established.
>
>  New features
> are only merged to branch-2. That essentially says we stop development for
> hadoop-1, right?
>
>  If developers want to keep contributing patches to branch-1 then there's
> no need for it to stop.  We would want to avoid putting new features only
> on branch-1, unless they only made sense in that context.  But I assume
> we'll see people contributing to branch-1 for some time.
>
>  Are we also making two lines of releases: one for branch-1
> and one for branch-2? Won't that be confusing and also burdensome if we
> release say 1.3, 2.0, 2.1, 1.4...
>
>  I'm asserting that it will be less confusing than the alternatives.  We
> need some way to make early releases of many of the new features.  I
> believe that this proposal is less confusing than if we start putting the
> new features in 1.x branches.  This is particularly true because it would
> help us to start being able to drop older functionality like Hadoop-1 and
> MapReduce, which is very hard to do in the 1.x line without stranding users.
>
>  Please note that we will have hadoop 3 soon. What's the story there?
>
>  As I said above, I don't see this as tied to Hadoop versions.
>
> Alan.
>
>
> Thanks,
> Xuefu
>
>
>
> On Fri, May 15, 2015 at 4:43 PM, Vaibhav Gumashta <vgumas...@hortonworks.com> wrote:
>
>  +1 on the new branch. I think it’ll help in faster dev time for these
> important changes.
>
>  —Vaibhav
>
>   From: Alan Gates <alanfga...@gmail.com>
> Reply-To: "dev@hive.apache.org" <dev@hive.apache.org>
> Date: Friday, May 15, 2015 at 4:11 PM
> To: "dev@hive.apache.org" <dev@hive.apache.org>
> Subject: Re: [DISCUSS] Supporting Hadoop-1 and experimental features
>
>  Anyone else have feedback on this?  If not I'll start a vote next week.
>
> Alan.
>
>    Gopal Vijayaraghavan <gop...@apache.org>
> May 14, 2015 at 10:44
>   Hi,
>
> +1 on the idea.
>
> Having a stable release branch with ongoing fixes where we do not drop
> major features would be good all around.
>
> It lets us accelerate the pace of development, drop major features or
> rewrite them entirely without dragging everyone else kicking & screaming
> into that release.
>
> Cheers,
> Gopal
>
>
>
>    Sergey Shelukhin <ser...@hortonworks.com>
> May 11, 2015 at 19:17
>   That sounds like a good idea.
> Some features could be back ported to branch-1 if viable, but at least new
> stuff would not be burdened by Hadoop 1/MR code paths.
> Probably also a good place to enable vectorization and other perf features
> by default while we make alpha releases.
>
> +1
>
>
>    Alan Gates <alanfga...@gmail.com>
> May 11, 2015 at 15:38
>   There is a lot of forward-looking work going on in various branches of
> Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It would
> be good to have a way to release this code to users so that they can
> experiment with it.  Releasing it will also provide feedback to developers.
>
> At the same time there are discussions on whether to keep supporting
> Hadoop-1.  The burden of supporting older, less used functionality such as
> Hadoop-1 is becoming ever harder as many new features are added.
>
> I propose that the best way to deal with this would be to make a
> branch-1.  We could continue to make new feature releases off of this
> branch (1.3, 1.4, etc.).  This branch would not drop old functionality.
> This provides stability and continuity for users and developers.
>
> We could then merge these new feature branches (LLAP, HBase metastore,
> CLI drop) into the trunk, as well as turn on by default newer features such
> as vectorization and ACID.  We could also drop older, less used
> features such as support for Hadoop-1 and MapReduce.  It will be a while
> before we are ready to make stable, production ready releases of this
> code.  But we could start making alpha quality releases soon.  We would
> call these releases 2.x, to stress the non-backward compatible changes such
> as dropping Hadoop-1.  This will give users a chance to play with the new
> code and developers a chance to get feedback.
>
> Thoughts?
