Re: Looking to a Hadoop 3 release

Andrew Wang Fri, 06 Mar 2015 10:59:05 -0800

Since these dependency bumps are very disruptive to downstreams, I want to
predicate upgrading our deps on having classpath isolation on. I think
that's what Tucu was getting at.


Best,
Andrew

On Fri, Mar 6, 2015 at 8:01 AM, Allen Wittenauer <a...@altiscale.com> wrote:

>
> Right, but that doesn't really answer the question….
>
> On Mar 5, 2015, at 10:23 PM, Alejandro Abdelnur <tuc...@gmail.com> wrote:
>
> > If classloader isolation is in place, then dependency versions can freely
> > be upgraded, as they won't pollute the apps' space (things get trickier if
> > there is an ON/OFF switch).
> >
> > On Thu, Mar 5, 2015 at 9:21 PM, Allen Wittenauer <a...@altiscale.com>
> wrote:
> >
> >>
> >> Is there going to be a general upgrade of dependencies?  I'm thinking of
> >> jetty & jackson in particular.
> >>
> >> On Mar 5, 2015, at 5:24 PM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
> >>
> >>> I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
> >>> page. In addition to the two things I've been pushing, I also looked
> >>> through Allen's list (thanks Allen for making this) and picked out the
> >>> shell script rewrite and the removal of HFTP as big changes. This would
> >> be
> >>> the place to propose features for inclusion in 3.x, I'd particularly
> >>> appreciate help on the YARN/MR side.
> >>>
> >>> Based on what I'm hearing, let me modulate my proposal to the
> following:
> >>>
> >>> - We avoid cutting branch-3, and release off of trunk. The trunk-only
> >>> changes don't look that scary, so I think this is fine. This does mean
> we
> >>> need to be more rigorous before merging branches to trunk. I think
> >>> Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches
> >> would
> >>> be very helpful in this regard.
> >>> - We do not include anything to break wire compatibility unless (as
> Jason
> >>> says) it's an unbelievably awesome feature.
> >>> - No harm in rolling alphas from trunk, as it doesn't lock us to
> anything
> >>> compatibility wise. Downstreams like releases.
> >>>
> >>> I'll take Steve's advice about not locking GA to a given date, but I
> also
> >>> share his belief that we can alpha/beta/GA faster than it took for
> Hadoop
> >>> 2. Let's roll some intermediate releases, work on the roadmap items,
> and
> >>> see how we're feeling in a few months.
> >>>
> >>> Best,
> >>> Andrew
> >>>
> >>> On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth <ss...@apache.org>
> wrote:
> >>>
> >>>> I think it'll be useful to have a discussion about what else people
> >> would
> >>>> like to see in Hadoop 3.x - especially if the change is potentially
> >>>> incompatible. Also, what we expect the release schedule to be for
> major
> >>>> releases and what triggers them - JVM version, major features, the
> need
> >> for
> >>>> incompatible changes ? Assuming major versions will not be released
> >> every 6
> >>>> months/1 year (adoption time, fairly disruptive for downstream
> projects,
> >>>> and users) -  considering additional features/incompatible changes for
> >> 3.x
> >>>> would be useful.
> >>>>
> >>>> Some features that come to mind immediately would be
> >>>> 1) enhancements to the RPC mechanics - specifically support for
> >>>> async RPC / two-way communication. There are a lot of places where we
> >>>> re-use heartbeats to send more information than we would if the RPC
> >>>> layer supported these features. Some of this can be done in a manner
> >>>> compatible with the existing RPC sub-system. Others, like two-way
> >>>> communication, probably cannot. After this, HDFS/YARN would need to
> >>>> actually make use of these changes. The other consideration is
> >>>> adoption of an alternate system like gRPC, which would be incompatible.
> >>>> 2) Simplification of configs - potentially separating client side
> >> configs
> >>>> and those used by daemons. This is another source of perpetual
> confusion
> >>>> for users.
> >>>>
> >>>> Thanks
> >>>> - Sid
> >>>>
> >>>>
> >>>> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <
> ste...@hortonworks.com>
> >>>> wrote:
> >>>>
> >>>>> Sorry, Outlook dequoted Alejandro's comments.
> >>>>>
> >>>>> Let me try again with his comments in italic and proofreading of mine
> >>>>>
> >>>>> On 05/03/2015 13:59, "Steve Loughran" <ste...@hortonworks.com> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 05/03/2015 13:05, "Alejandro Abdelnur" <tuc...@gmail.com> wrote:
> >>>>>
> >>>>> IMO, if part of the community wants to take on the responsibility and
> >>>> work
> >>>>> that takes to do a new major release, we should not discourage them
> >> from
> >>>>> doing that.
> >>>>>
> >>>>> Having multiple major branches active is a standard practice.
> >>>>>
> >>>>> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take
> a
> >>>>> long time to get out, and during that time 0.21, 0.22, got released
> and
> >>>>> ignored; 0.23 picked up and used in production.
> >>>>>
> >>>>> The 2.0.4-alpha release was more of a troublespot, as it got picked up
> >>>>> widely enough to be used in products, and changes were made between
> >>>>> that alpha & 2.2 itself which raised compatibility issues.
> >>>>>
> >>>>> For 3.x I'd propose
> >>>>>
> >>>>>
> >>>>> 1.  Have less longevity of 3.x alpha/beta artifacts
> >>>>> 2.  Make clear there are no guarantees of compatibility from
> >> alpha/beta
> >>>>> releases to shipping. Best effort, but not to the extent that it gets
> >> in
> >>>>> the way. More succinctly: we will care more about seamless migration
> >> from
> >>>>> 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
> >>>>> 3.  Anybody who ships code based on 3.x alpha/beta to recognise and
> >>>>> accept policy (2). Hadoop's "instability guarantee" for the 3.x
> >>>> alpha/beta
> >>>>> phase
> >>>>>
> >>>>> As well as backwards compatibility, we need to think about Forwards
> >>>>> compatibility, with the goal being:
> >>>>>
> >>>>> Any app written/shipped with the 3.x release binaries (JAR and
> native)
> >>>>> will work in and against a 3.y Hadoop cluster, for all x, y in
> Natural
> >>>>> where y>=x  and is-release(x) and is-release(y)
> >>>>>
> >>>>> That's important, as it means all server-side changes in 3.x which
> >>>>> are expected to mandate client-side updates (protocols, HDFS erasure
> >>>>> decoding, security features) must be considered complete and stable
> >>>>> before we can say is-release(x). In an ideal world, we'll even get the
> >>>>> semantics right, with tests to show this.
> >>>>>
> >>>>> Fixing classpath hell downstream is certainly one feature I am +1 on.
> >>>>> But it's only one of the features and, given there's no design doc on
> >>>>> that JIRA, way too immature to set a release schedule on. An alpha
> >>>>> schedule with no guarantees and a regular alpha roll could be viable,
> >>>>> as new features go in and can then be used to experimentally try this
> >>>>> stuff in branches of HBase (well volunteered, Stack!), etc. Of course
> >>>>> instability guarantees will be transitive downstream.
> >>>>>
> >>>>>
> >>>>> This time around we are not replacing the guts as we did from Hadoop 1
> >>>>> to Hadoop 2, but doing superficial surgery to address issues that were
> >>>>> not considered (or were too much to take on, on top of the guts
> >>>>> transplant).
> >>>>>
> >>>>> For the split brain concern, we did a great job of maintaining Hadoop 1
> >>>>> and Hadoop 2 until Hadoop 1 faded away.
> >>>>>
> >>>>> And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS
> >>>>> compatibility.
> >>>>>
> >>>>>
> >>>>> Based on that experience I would say that the coexistence of Hadoop 2
> >> and
> >>>>> Hadoop 3 will be much less demanding/traumatic.
> >>>>>
> >>>>> The re-layout of all the source trees was a major change there;
> >>>>> assuming there's no refactoring or switch of build tools, picking
> >>>>> things back will be tractable.
> >>>>>
> >>>>>
> >>>>> Also, to facilitate the coexistence we should limit Java language
> >>>> features
> >>>>> to Java 7 (even if the runtime is Java 8), once Java 7 is not used
> >>>> anymore
> >>>>> we can remove this limitation.
> >>>>>
> >>>>> +1; setting javac.version will fix this
> >>>>>
> >>>>> What is nice about having java 8 as the base JVM is that it means you
> >> can
> >>>>> be confident that all Hadoop 3 servers will be JDK8+, so downstream
> >> apps
> >>>>> and libs can use all Java 8 features they want to.
> >>>>>
> >>>>> There's one policy change to consider there: possibly, just possibly,
> >>>>> we could allow new modules in hadoop-tools to adopt Java 8 language
> >>>>> features early, provided everyone recognised that "backport to
> >>>>> branch-2" isn't going to happen.
> >>>>>
> >>>>> -Steve
> >>>>>
> >>>>>
> >>>>
> >>
> >>
>
>
