I think it'll be useful to have a discussion about what else people would like to see in Hadoop 3.x - especially if the change is potentially incompatible. Also, what we expect the release schedule for major releases to be, and what triggers them - a JVM version, major features, the need for incompatible changes? Assuming major versions will not be released every 6 months to a year (adoption time; fairly disruptive for downstream projects and users), considering additional features/incompatible changes for 3.x would be useful.
Some features that come to mind immediately:

1) Enhancements to the RPC mechanics - specifically support for async RPC / two-way communication. There are a lot of places where we reuse heartbeats to carry more information than they would need to if the RPC layer supported these features. Some of this can be done in a manner compatible with the existing RPC sub-system; others, like two-way communication, probably cannot. After this, having HDFS/YARN actually make use of these changes. The other consideration is adoption of an alternate system like gRPC, which would be incompatible.

2) Simplification of configs - potentially separating client-side configs from those used by daemons. This is another source of perpetual confusion for users.

Thanks
- Sid

On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <ste...@hortonworks.com> wrote:

> Sorry, Outlook dequoted Alejandro's comments.
>
> Let me try again with his comments in italic and proofreading of mine.
>
> On 05/03/2015 13:59, "Steve Loughran" <ste...@hortonworks.com> wrote:
>
> On 05/03/2015 13:05, "Alejandro Abdelnur" <tuc...@gmail.com> wrote:
>
> IMO, if part of the community wants to take on the responsibility and work it takes to do a new major release, we should not discourage them from doing that.
>
> Having multiple major branches active is a standard practice.
>
> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a long time to get out, and during that time 0.21 and 0.22 got released and ignored; 0.23 was picked up and used in production.
>
> The 2.0.4-alpha release was more of a trouble spot, as it got picked up widely enough to be used in products, and changes were made between that alpha and 2.2 itself which raised compatibility issues.
>
> For 3.x I'd propose:
>
> 1. Less longevity of 3.x alpha/beta artifacts.
> 2. Make clear there are no guarantees of compatibility from alpha/beta releases to shipping. Best effort, but not to the extent that it gets in the way. More succinctly: we will care more about seamless migration from 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
> 3. Anybody who ships code based on 3.x alpha/beta must recognise and accept policy (2) - Hadoop's "instability guarantee" for the 3.x alpha/beta phase.
>
> As well as backwards compatibility, we need to think about forwards compatibility, with the goal being:
>
> Any app written/shipped with the 3.x release binaries (JAR and native) will work in and against a 3.y Hadoop cluster, for all x, y in Natural where y >= x and is-release(x) and is-release(y).
>
> That's important, as it means all server-side changes in 3.x which are expected to mandate client-side updates - protocols, HDFS erasure coding, security features - must be considered complete and stable before we can say is-release(x). In an ideal world, we'll even get the semantics right, with tests to show this.
>
> Fixing classpath hell downstream is certainly one feature I am +1 on. But it's only one of the features, and given there's not any design doc on that JIRA, it's way too immature to set a release schedule on. An alpha schedule with no guarantees and a regular alpha roll could be viable, as new features go in and can then be used to experimentally try this stuff in branches of HBase (well volunteered, Stack!), etc. Of course, instability guarantees will be transitive downstream.
>
> This time around we are not replacing the guts as we did from Hadoop 1 to Hadoop 2, but doing superficial surgery to address issues that were not considered (or were too much to take on top of the guts transplant).
>
> For the split-brain concern, we did a great job of maintaining Hadoop 1 and Hadoop 2 until Hadoop 1 faded away.
>
> And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS compatibility.
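Sid's first point - commands piggybacked on heartbeat replies because a request/response-only RPC layer gives the server no channel of its own to push to workers - can be sketched roughly as below. All class and command names here are illustrative, not actual Hadoop or YARN APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: the server queues commands and drains them into the
// next heartbeat reply, since it cannot push to the worker directly.
class HeartbeatResponse {
    final List<String> commands;
    HeartbeatResponse(List<String> commands) { this.commands = commands; }
}

class Master {
    private final List<String> pending = new ArrayList<>();

    // With async/two-way RPC this could be pushed immediately; with
    // piggybacking, delivery waits for the worker's next heartbeat,
    // adding up to one heartbeat interval of latency per command.
    void enqueue(String command) { pending.add(command); }

    HeartbeatResponse heartbeat(String workerId) {
        List<String> toSend = new ArrayList<>(pending);
        pending.clear();
        return new HeartbeatResponse(toSend);
    }
}

public class HeartbeatPiggyback {
    public static void main(String[] args) {
        Master master = new Master();
        master.enqueue("LAUNCH_CONTAINER");
        master.enqueue("KILL_CONTAINER");
        // Commands are only delivered when the worker happens to check in.
        System.out.println(master.heartbeat("worker-1").commands.size()); // 2
        System.out.println(master.heartbeat("worker-1").commands.size()); // 0
    }
}
```

This is the pattern that inflates heartbeat payloads today; a true two-way or streaming RPC layer would replace the queue-and-drain step with a direct server-to-worker call, which is the part that probably can't be retrofitted compatibly.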
> Based on that experience I would say that the coexistence of Hadoop 2 and Hadoop 3 will be much less demanding/traumatic.
>
> The re-layout of all the source trees was a major change there; assuming there's no refactoring or switch of build tools, picking things back will be tractable.
>
> Also, to facilitate the coexistence we should limit Java language features to Java 7 (even if the runtime is Java 8); once Java 7 is no longer in use we can remove this limitation.
>
> +1; setting javac.version will fix this.
>
> What is nice about having Java 8 as the base JVM is that you can be confident that all Hadoop 3 servers will be JDK 8+, so downstream apps and libs can use all the Java 8 features they want.
>
> There's one policy change to consider there, which is possibly, just possibly, we could allow new modules in hadoop-tools to adopt Java 8 language features early, provided everyone recognised that "backport to branch-2" isn't going to happen.
>
> -Steve
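The javac.version approach mentioned above would look roughly like the fragment below in a root pom. This is a sketch only - the property name comes from the discussion, but the exact plugin wiring in Hadoop's build may differ.

```xml
<properties>
  <!-- Keep the language level at Java 7 even when building on a JDK 8 -->
  <javac.version>1.7</javac.version>
</properties>
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <configuration>
        <source>${javac.version}</source>
        <target>${javac.version}</target>
      </configuration>
    </plugin>
  </plugins>
</build>
```

Once Java 7 support is dropped, raising the one property lifts the restriction everywhere; a hadoop-tools module that is allowed to go Java 8 early would simply override the property in its own pom.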