Sorry, Outlook dequoted Alejandro's comments.

Let me try again, with his comments in italics and some proofreading of mine.

On 05/03/2015 13:59, "Steve Loughran" <ste...@hortonworks.com> wrote:



On 05/03/2015 13:05, "Alejandro Abdelnur" <tuc...@gmail.com> wrote:

IMO, if part of the community wants to take on the responsibility and work
it takes to do a new major release, we should not discourage them from
doing that.

Having multiple major branches active is a standard practice.

Looking @ 2.x, the major work (HDFS HA, YARN) meant that it took a long time 
to get out, and during that time 0.21 and 0.22 got released and ignored, 
while 0.23 was picked up and used in production.

The 2.0.4-alpha release was more of a trouble spot, as it got picked up widely 
enough to be used in products, and changes made between that alpha and 2.2 
itself raised compatibility issues.

For 3.x I'd propose:


  1.  Reduce the longevity of 3.x alpha/beta artifacts.
  2.  Make clear there are no guarantees of compatibility from alpha/beta 
releases to shipping: best effort, but not to the extent that it gets in the 
way. More succinctly: we will care more about seamless migration from 2.2+ to 
3.x than from a 3.0-alpha to 3.3 production.
  3.  Require anybody who ships code based on 3.x alpha/beta releases to 
recognise and accept policy (2): Hadoop's "instability guarantee" for the 3.x 
alpha/beta phase.

As well as backwards compatibility, we need to think about forwards 
compatibility, with the goal being:

Any app written/shipped with the 3.x release binaries (JAR and native) will 
work in and against a 3.y Hadoop cluster, for all x, y in the natural numbers 
where y >= x and is-release(x) and is-release(y).

That's important, as it means all server-side changes in 3.x which are expected 
to mandate client-side updates (protocols, HDFS erasure decoding, security 
features) must be considered complete and stable before we can say 
is-release(x). In an ideal world, we'll even get the semantics right, with 
tests to show this.
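
To make that concrete, here's a minimal sketch of what the guarantee means in 
practice (my illustration only; the class name and path are made up): a client 
like this, compiled and bundled against the 3.x client JARs, should run 
unmodified against any 3.y cluster, y >= x:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ForwardCompatCheck {
    public static void main(String[] args) throws Exception {
      // Built against Hadoop 3.x client JARs; fs.defaultFS in the config
      // points at a 3.y cluster, which must still speak to this client.
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      for (FileStatus status : fs.listStatus(new Path("/"))) {
        System.out.println(status.getPath());
      }
    }
  }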

Fixing classpath hell downstream is certainly one feature I am +1 on. But it's 
only one of the features, and given there isn't any design doc on that JIRA, 
it's way too immature to set a release schedule on. An alpha schedule with 
no guarantees and a regular alpha roll could be viable, as new features go in 
and can then be used to experimentally try this stuff in branches of HBase 
(well volunteered, Stack!), etc. Of course, instability guarantees will be 
transitive downstream.


This time around we are not replacing the guts as we did from Hadoop 1 to
Hadoop 2, but doing superficial surgery to address issues that were not
considered (or would have been too much on top of the guts transplant).

For the split-brain concern: we did a great job of maintaining Hadoop 1 and
Hadoop 2 until Hadoop 1 faded away.

And we had a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS 
compatibility.


Based on that experience I would say that the coexistence of Hadoop 2 and
Hadoop 3 will be much less demanding/traumatic.

The re-layout of all the source trees was a major change there; assuming 
there's no refactoring or switch of build tools, cherry-picking things back 
will be tractable.
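
That is, assuming the trees stay aligned between trunk and branch-2, a 
backport remains a plain cherry-pick (illustrative commands, not a process 
doc):

  # pick a fix from trunk straight back onto the 2.x line
  git checkout branch-2
  git cherry-pick <trunk-commit-hash>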


Also, to facilitate the coexistence we should limit Java language features
to Java 7 (even if the runtime is Java 8); once Java 7 is no longer in use
we can remove this limitation.

+1; setting javac.version will fix this
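
e.g. something like this in hadoop-project/pom.xml, where maven-compiler-plugin 
already picks up the property (values illustrative, from memory of the build):

  <properties>
    <!-- compile with Java 7 source/target even on a Java 8 JDK -->
    <javac.version>1.7</javac.version>
  </properties>

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
    <configuration>
      <source>${javac.version}</source>
      <target>${javac.version}</target>
    </configuration>
  </plugin>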

What is nice about having Java 8 as the base JVM is that it means you can be 
confident that all Hadoop 3 servers will be on JDK 8+, so downstream apps and 
libs can use all the Java 8 features they want.
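
For example (purely illustrative, not code from our tree), downstream code 
could use lambdas and streams without guarding against older JVMs:

  import java.util.Arrays;
  import java.util.List;
  import java.util.stream.Collectors;

  public class Java8Downstream {
    public static void main(String[] args) {
      List<String> hosts =
          Arrays.asList("nn1.example.com", "dn1.example.com", "dn2.example.com");
      // Lambdas and the streams API: safe once every Hadoop 3 node is JDK 8+.
      List<String> datanodes = hosts.stream()
          .filter(h -> h.startsWith("dn"))
          .collect(Collectors.toList());
      datanodes.forEach(System.out::println);
    }
  }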

There's one policy change to consider there, which is possibly, just possibly, 
that we could allow new modules in hadoop-tools to adopt Java 8 language 
features early, provided everyone recognised that "backport to branch-2" isn't 
going to happen.

-Steve
