I think it'll be useful to have a discussion about what else people would like to see in Hadoop 3.x - especially if the change is potentially incompatible. Also, what we expect the release schedule for major releases to be, and what triggers them - a JVM version, major features, the need for incompatible changes? Assuming major versions will not be released every 6 months to a year (adoption time; fairly disruptive for downstream projects and users), considering additional features/incompatible changes for 3.x would be useful.
Some features that come to mind immediately:

1) Enhancements to the RPC mechanics - specifically support for async RPC / two-way communication. There are a lot of places where we reuse heartbeats to carry more information than they would need to if the RPC layer supported these features. Some of this can be done in a manner compatible with the existing RPC sub-system; others, like two-way communication, probably cannot. After this, having HDFS/YARN actually make use of these changes. The other consideration is adoption of an alternate system like gRPC, which would be incompatible.

2) Simplification of configs - potentially separating client-side configs from those used by daemons. This is another source of perpetual confusion for users.

Thanks
- Sid

On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <ste...@hortonworks.com> wrote:

> Sorry, Outlook dequoted Alejandro's comments.
>
> Let me try again with his comments in italic and proofreading of mine.
>
> On 05/03/2015 13:59, "Steve Loughran" <ste...@hortonworks.com> wrote:
>
> On 05/03/2015 13:05, "Alejandro Abdelnur" <tuc...@gmail.com> wrote:
>
> IMO, if part of the community wants to take on the responsibility and work it takes to do a new major release, we should not discourage them from doing that.
>
> Having multiple major branches active is a standard practice.
>
> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a long time to get out, and during that time 0.21 and 0.22 got released and ignored; 0.23 was picked up and used in production.
>
> The 2.0.4-alpha release was more of a trouble spot, as it got picked up widely enough to be used in products, and changes were made between that alpha and 2.2 itself which raised compatibility issues.
>
> For 3.x I'd propose:
>
> 1. Less longevity of 3.x alpha/beta artifacts.
> 2. Make clear there are no guarantees of compatibility from alpha/beta releases to shipping. Best effort, but not to the extent that it gets in the way. More succinctly: we will care more about seamless migration from 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
> 3. Anybody who ships code based on 3.x alpha/beta must recognise and accept policy (2) - Hadoop's "instability guarantee" for the 3.x alpha/beta phase.
>
> As well as backwards compatibility, we need to think about forwards compatibility, with the goal being:
>
> Any app written/shipped with the 3.x release binaries (JAR and native) will work in and against a 3.y Hadoop cluster, for all x, y in Natural where y >= x and is-release(x) and is-release(y).
>
> That's important, as it means all server-side changes in 3.x which are expected to mandate client-side updates - protocols, HDFS erasure coding, security features - must be considered complete and stable before we can say is-release(x). In an ideal world, we'll even get the semantics right, with tests to show this.
>
> Fixing classpath hell downstream is certainly one feature I am +1 on. But it's only one of the features, and given there's not any design doc on that JIRA, it's way too immature to set a release schedule on. An alpha schedule with no guarantees and a regular alpha roll could be viable, as new features go in and can then be used to experimentally try this stuff in branches of HBase (well volunteered, Stack!), etc. Of course, instability guarantees will be transitive downstream.
>
> This time around we are not replacing the guts as we did from Hadoop 1 to Hadoop 2, but doing superficial surgery to address issues that were not considered (or were too much to take on top of the guts transplant).
>
> For the split-brain concern, we did a great job of maintaining Hadoop 1 and Hadoop 2 until Hadoop 1 faded away.
>
> And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS compatibility.
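Sid's first point - commands piggybacked on heartbeat replies because a request/response-only RPC layer gives the server no channel of its own to push to workers - can be sketched roughly as below. All class and command names here are illustrative, not actual Hadoop or YARN APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: the server queues commands and drains them into the
// next heartbeat reply, since it cannot push to the worker directly.
class HeartbeatResponse {
    final List<String> commands;
    HeartbeatResponse(List<String> commands) { this.commands = commands; }
}

class Master {
    private final List<String> pending = new ArrayList<>();

    // With async/two-way RPC this could be pushed immediately; with
    // piggybacking, delivery waits for the worker's next heartbeat,
    // adding up to one heartbeat interval of latency per command.
    void enqueue(String command) { pending.add(command); }

    HeartbeatResponse heartbeat(String workerId) {
        List<String> toSend = new ArrayList<>(pending);
        pending.clear();
        return new HeartbeatResponse(toSend);
    }
}

public class HeartbeatPiggyback {
    public static void main(String[] args) {
        Master master = new Master();
        master.enqueue("LAUNCH_CONTAINER");
        master.enqueue("KILL_CONTAINER");
        // Commands are only delivered when the worker happens to check in.
        System.out.println(master.heartbeat("worker-1").commands.size()); // 2
        System.out.println(master.heartbeat("worker-1").commands.size()); // 0
    }
}
```

This is the pattern that inflates heartbeat payloads today; a true two-way or streaming RPC layer would replace the queue-and-drain step with a direct server-to-worker call, which is the part that probably can't be retrofitted compatibly.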
> Based on that experience I would say that the coexistence of Hadoop 2 and Hadoop 3 will be much less demanding/traumatic.
>
> The re-layout of all the source trees was a major change there; assuming there's no refactoring or switch of build tools, picking things back will be tractable.
>
> Also, to facilitate the coexistence we should limit Java language features to Java 7 (even if the runtime is Java 8); once Java 7 is no longer in use we can remove this limitation.
>
> +1; setting javac.version will fix this.
>
> What is nice about having Java 8 as the base JVM is that you can be confident that all Hadoop 3 servers will be JDK 8+, so downstream apps and libs can use all the Java 8 features they want.
>
> There's one policy change to consider there, which is possibly, just possibly, we could allow new modules in hadoop-tools to adopt Java 8 language features early, provided everyone recognised that "backport to branch-2" isn't going to happen.
>
> -Steve
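The javac.version approach mentioned above would look roughly like the fragment below in a root pom. This is a sketch only - the property name comes from the discussion, but the exact plugin wiring in Hadoop's build may differ.

```xml
<properties>
  <!-- Keep the language level at Java 7 even when building on a JDK 8 -->
  <javac.version>1.7</javac.version>
</properties>
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <configuration>
        <source>${javac.version}</source>
        <target>${javac.version}</target>
      </configuration>
    </plugin>
  </plugins>
</build>
```

Once Java 7 support is dropped, raising the one property lifts the restriction everywhere; a hadoop-tools module that is allowed to go Java 8 early would simply override the property in its own pom.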