Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Andrew Wang Mon, 09 Mar 2015 15:57:59 -0700

I find this proposal very surprising. We've intentionally deferred
incompatible changes to trunk, because they are incompatible and do not
belong in a minor release. Now we are supposed to blur our eyes and release
these changes anyway? I don't see this ending well.


One higher-level goal we should be working towards is tightening our
compatibility guarantees, not loosening them. This is why I've been
highlighting classpath isolation as a 3.0 feature, since this is one of the
biggest issues faced by our users and downstreams. I think a 3.0 with an
improved compatibility story will make operators and downstreams much
happier than releasing trunk as 2.8.

Best,
Andrew

On Mon, Mar 9, 2015 at 3:05 PM, Colin P. McCabe <cmcc...@apache.org> wrote:

> Java 7 will be end-of-lifed in April 2015.  I think it would be unwise
> to plan a new Hadoop release against a version of Java that is almost
> obsolete and (soon) no longer receiving security updates.  I think
> people will be willing to roll out a new version of Java for Hadoop
> 3.x.
>
> Similarly, the whole point of bumping the major version number is the
> ability to make incompatible changes.  There are already a bunch of
> incompatible changes in the trunk branch.  Are you proposing to revert
> those?  Or push them into newly created feature branches?  This
> doesn't seem like a good idea to me.
>
> I would be in favor of backporting targeted incompatible changes from
> trunk to branch-2.  For example, we could consider pulling in Allen's
> shell script rewrite.  But pulling in all of trunk seems like a bad
> idea at this point, if we want a 2.x release.
>
> best,
> Colin
>
> On Mon, Mar 9, 2015 at 2:15 PM, Steve Loughran <ste...@hortonworks.com>
> wrote:
> >
> > If 3.x is going to be Java 8 & not backwards compatible, I don't expect
> anyone to want to use this in production until some time deep into 2016.
> >
> > Issue: JDK 8 vs 7
> >
> > It will require Hadoop clusters to move up to Java 8. While there's dev
> pull for this, there's ops pull against this: people are still in the
> moving-off Java 6 phase due to that "it's working, don't update it"
> philosophy. Java 8 is compelling to us coders, but that doesn't mean ops
> want it.
> >
> > You can run JDK-8 code in a YARN cluster running on Hadoop 2.7 *today*,
> the main thing is setting up JAVA_HOME. That's something we could make
> easier somehow (maybe some min Java version field in resource requests that
> will let apps say java 8, java 9, ...). YARN could not only set up JVM
> paths, it could fail-fast if a Java version wasn't available.
> >
> > What we can't do in Hadoop core today is set javac.version=1.8 & use
> java 8 code. Downstream code can do that (Hive, etc); they just need to
> accept that they don't get to play on JDK7 clusters if they embrace
> lambda expressions.
> >
> > So...we need to stay on java 7 for some time due to ops pull; downstream
> apps get to choose what they want. We can/could enhance YARN to make JVM
> choice more declarative.
> >
> > Issue: Incompatible changes
> >
> > Without knowing what is proposed for "an incompatible classpath change",
> I can't say whether this is something that could be made optional. If it
> isn't, then it is a python-3 class option, "rewrite your code" event, which
> is going to be particularly traumatic to things like Hive that already do
> complex CP games. I'm currently against any mandatory change here, though
> would love to see an optional one. And if optional, it ceases to become an
> incompatible change...
> >
> > Issue: Getting trunk out the door
> >
> > The main diff between branch-2 and trunk is currently the bash script
> changes. These don't break client apps. May or may not break bigtop & other
> downstream hadoop stacks, but developers don't need to worry about this:
> no recompilation necessary.
> >
> > Proposed: ship trunk as a 2.x release, compatible with JDK7 & Java code.
> >
> > It seems to me that I could go
> >
> >     git checkout trunk
> >     mvn versions:set -DnewVersion=2.8.0-SNAPSHOT
> >
> > We'd then have a version of Hadoop-trunk we could ship later this year,
> compatible at the JDK and API level with the existing java code & JDK7+
> clusters.
> >
> > A classpath fix that is optional/compatible can then go out on the 2.x
> line, saving the 3.x tag for something that really breaks things, forces
> all downstream apps to set up new hadoop profiles, have separate modules &
> generally hate the hadoop dev team.
> >
> > This lets us tick off the "recent trunk release" and "fixed shell
> scripts" items, pushing out those benefits to people sooner rather than
> later, and puts off the "Hello, we've just broken your code" event for
> another 12+ months.
> >
> > Comments?
> >
> > -Steve
> >
> >
> >
>
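
Steve's idea above of a minimum Java version field in resource requests, with a fail-fast when no suitable JVM is installed, could look roughly like the sketch below. Everything in it is hypothetical: `JvmSelector`, its methods, and the JDK paths are illustrative names, not part of YARN or any Hadoop API, and the thread itself does not specify a design.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/**
 * Hypothetical sketch only: maps a requested minimum Java major version
 * to a JAVA_HOME on the node, or fails fast. Not an existing YARN class.
 */
final class JvmSelector {
    // Assumed node-local inventory: Java major version -> JAVA_HOME path.
    private final TreeMap<Integer, String> installed = new TreeMap<>();

    JvmSelector(Map<Integer, String> installedJdks) {
        installed.putAll(installedJdks);
    }

    /** Returns the JAVA_HOME of the lowest installed JDK with version >= minVersion, if any. */
    Optional<String> javaHomeFor(int minVersion) {
        Map.Entry<Integer, String> entry = installed.ceilingEntry(minVersion);
        return entry == null ? Optional.empty() : Optional.of(entry.getValue());
    }

    /** Fail-fast variant: throws instead of launching a container on an unsuitable JVM. */
    String requireJavaHome(int minVersion) {
        return javaHomeFor(minVersion).orElseThrow(() ->
            new IllegalStateException("No JDK >= " + minVersion
                + " on this node; installed versions: " + installed.keySet()));
    }
}
```

Picking the lowest version that satisfies the request is one possible policy, chosen here so that an app asking for Java 7 keeps running on Java 7 where it is available; a real implementation might prefer the newest JDK or an exact match instead.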
