The discussions in HADOOP-9151 were related to wire-compatibility. I think we 
all agree that breaking API compatibility is not allowed without first 
deprecating the APIs in a prior major release - this is something we have 
followed since hadoop-0.1.
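
To make that deprecation cycle concrete, here is a minimal sketch - the class 
and method names are made up for illustration, but the annotations are the 
real ones from org.apache.hadoop.classification:

  import java.io.IOException;
  import java.io.InputStream;

  import org.apache.hadoop.classification.InterfaceAudience;
  import org.apache.hadoop.classification.InterfaceStability;
  import org.apache.hadoop.fs.Path;

  @InterfaceAudience.Public
  @InterfaceStability.Stable
  public abstract class ExampleStore {  // hypothetical class
    /**
     * @deprecated Deprecated in release N, with the replacement shipping in
     * the same release; removal is only allowed in a later major release.
     */
    @Deprecated
    public abstract InputStream open(Path p) throws IOException;

    // Replacement introduced alongside the deprecation.
    public abstract InputStream openFile(Path p) throws IOException;
  }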

I agree we need to spell out what changes we can and cannot make *after* we 
go GA, e.g.:
# Clearly, incompatible *API* changes are *not* allowed in hadoop-2 post-GA.
# Do we allow incompatible changes to client-server protocols? I would say 
*no*. (See the sketches after this list.)
# Do we allow incompatible changes to internal server-server protocols (e.g. 
NN-DN, NN-NN in an HA setup, or RM-NM in YARN)? Disallowing them is what lets 
us support rolling upgrades, so I would like to not allow this, but I do not 
know how feasible that is. An option is to allow such changes only between 
minor releases, i.e. between hadoop-2.10 and hadoop-2.11.
# Do we allow changes which force an HDFS metadata upgrade across a minor 
upgrade, i.e. from hadoop-2.20 to hadoop-2.21?
# Clearly, *no* incompatible changes (API, client-server or server-server) 
are allowed in a patch release, i.e. hadoop-2.20.0 and hadoop-2.20.1 have to 
be compatible in all respects.
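
To make #2 concrete: our RPC protocols are protobuf-based in hadoop-2, and 
protobuf only stays wire-compatible if we follow its rules - adding an 
'optional' field with a fresh tag number is fine, while renumbering or 
retyping an existing field breaks every deployed client at once. A minimal 
sketch of the client-side discipline (FooResponseProto and the quota field 
are hypothetical, not an actual Hadoop proto):

  // An 'optional' field added in a later release is simply absent when
  // talking to an older server, so clients check hasX() and fall back.
  static long readQuota(FooResponseProto resp) {
    return resp.hasQuota() ? resp.getQuota() : -1L;  // -1 => older server
  }

And for #4: HDFS records a layout version in its on-disk metadata and will 
not start against a mismatched version without an explicit upgrade. Roughly 
(a simplified sketch, not the actual HDFS storage code):

  // The real check also handles the upgrade, rollback and finalize paths.
  static void checkLayoutVersion(int onDisk, int software) throws IOException {
    if (onDisk != software) {
      throw new IOException("Layout version mismatch: disk=" + onDisk
          + ", software=" + software + "; run with -upgrade to proceed");
    }
  }

This is why forcing a metadata upgrade inside a minor (let alone patch) 
release would be so disruptive to operators.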

What else am I missing?

I'll make sure we update our Roadmap wiki and other docs once this discussion 
concludes.

thanks,
Arun



On Jan 30, 2013, at 4:21 PM, Eli Collins wrote:

> Thanks for bringing this up Arun.  One of the issues is that we
> haven't been clear about which types of compatibility breakage are
> allowed and which are not.  For example, renaming FileSystem#open is
> incompatible, and not OK, regardless of the alpha/beta tag.  Breaking
> server/server APIs is OK pre-GA but probably not post-GA, at least
> not in a point release, unless required for a security fix, etc.
> Configuration, data format, and environment variable changes can all
> be similarly incompatible. The issue we had in HADOOP-9151 was that
> someone claimed it was not an incompatible change because it doesn't
> break API compatibility, even though it breaks wire compatibility. So
> let's be clear about the types of incompatibility we are or are not
> permitting.
> For example, will it be OK to merge a change before 2.2.0-beta that
> requires an HDFS metadata upgrade? Or one that breaks client-server
> wire compatibility?  I've been assuming that changing an API annotated
> Public/Stable still requires multiple major releases (one to deprecate
> and one to remove); does the alpha label change that? To some people
> the "alpha", "beta" label implies instability in terms of
> quality/features, while to others it means unstable APIs (and to some
> both), so it would be good to spell that out. In short, I agree that
> we really need to figure out what changes are permitted in which
> releases, and we should update the docs accordingly (there's a start
> here: http://wiki.apache.org/hadoop/Roadmap).
> 
> Note that the 2.0.0 alpha release vote thread was clear that we
> thought we were all in agreement that we'd like to keep client/server
> compatibility post-2.0 - and there was no push back. We pulled a number
> of jiras into the 2.0 release explicitly so that we could preserve
> client/server compatibility going forward.  Here's the relevant part
> of the thread as a refresher: http://s.apache.org/gQ
> 
> "2) HADOOP-8285 and HADOOP-8366 changed the wire format for the RPC
> envelope in branch-2, but didn't make it into this rc. So, that would
> mean that future alphas would not be protocol-compatible with this
> alpha. Per a discussion a few weeks ago, I think we all were in
> agreement that, if possible, we'd like all 2.x to be compatible for
> client-server communication, at least (even if we don't support
> cross-version for the intra-cluster protocols)"
> 
> Thanks,
> Eli
> 
> On Tue, Jan 29, 2013 at 12:56 PM, Arun C Murthy <a...@hortonworks.com> wrote:
>> Folks,
>> 
>> There have been some discussions about incompatible changes in the 
>> hadoop-2.x.x-alpha releases on HADOOP-9070, HADOOP-9151, HADOOP-9192 and a 
>> few other jiras. Frankly, I'm surprised by some of them, since the 'alpha' 
>> moniker was there precisely so we could harden APIs by changing them if 
>> necessary - borne out by the fact that every single release in the hadoop-2 
>> chain has had incompatible changes. This happened because we were releasing 
>> early, moving fast and breaking things. Furthermore, we'll have more in the 
>> future as we move towards stabilizing hadoop-2, similar to HDFS-4362, 
>> HDFS-4364, et al. in HDFS and YARN-142 (API changes) in YARN.
>> 
>> So, rather than debate more, I had a brief chat with Suresh and Todd. Todd 
>> suggested calling the next release hadoop-2.1.0-alpha to indicate the 
>> incompatibility a little better. This makes sense to me, as long as we are 
>> clear that we won't make any further *feature* releases in the hadoop-2.0.x 
>> series (obviously we might be forced to do a security/bug-fix release).
>> 
>> Going forward, I'd like to start locking down APIs/protocols for a 'beta' 
>> release. This way we'll have one *final* opportunity post hadoop-2.1.0-alpha 
>> to make incompatible changes if necessary, and we can call it 
>> hadoop-2.2.0-beta.
>> 
>> Post hadoop-2.2.0-beta we *should* lock down and not allow incompatible 
>> changes. This will allow us to go on to hadoop-2.3.0 as a GA release, and it 
>> forces us to make a real effort to ensure we lock things down for 
>> hadoop-2.2.0-beta.
>> 
>> In summary:
>> # I plan to now release hadoop-2.1.0-alpha (this week).
>> # We make a real effort to lock down APIs/protocols and release 
>> hadoop-2.2.0-beta, say in March.
>> # Post 'beta', release hadoop-2.3.0 as 'stable' sometime in May.
>> 
>> I'll start a separate thread on 'locking protocols' w.r.t. client protocols 
>> v/s internal protocols (to facilitate rolling upgrades etc.); let's discuss 
>> that one separately.
>> 
>> Makes sense? Thoughts?
>> 
>> thanks,
>> Arun
>> 
>> PS:  Between hadoop-2.2.0-beta and hadoop-2.3.0 we *might* be forced to make 
>> some incompatible changes due to *unforeseen circumstances*, but no more 
>> gratuitous changes are allowed.
>> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

