The discussions in HADOOP-9151 were related to wire compatibility. I think we all agree that breaking API compatibility is not allowed without first deprecating the APIs in a prior major release - this is something we have followed since hadoop-0.1.
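As a concrete illustration of that cycle (the class and method below are hypothetical, though the annotations are the real ones from hadoop-common), a Public/Stable API has to survive a full major release in deprecated form before it can go away:

    import java.io.IOException;
    import java.io.InputStream;
    import org.apache.hadoop.classification.InterfaceAudience;
    import org.apache.hadoop.classification.InterfaceStability;
    import org.apache.hadoop.fs.Path;

    // Release N: the replacement is added and the old method is deprecated.
    @InterfaceAudience.Public
    @InterfaceStability.Stable
    public class HypotheticalFileUtil {

      /** @deprecated As of release N, use {@link #openStream(Path)} instead. */
      @Deprecated
      public InputStream open(Path p) throws IOException {
        return openStream(p); // forward, so existing callers keep working
      }

      public InputStream openStream(Path p) throws IOException {
        throw new IOException("implementation elided for brevity");
      }
    }
    // Release N+1, the next *major* release, is the earliest point at
    // which open(Path) may actually be deleted.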
I agree we need to spell out what changes we can and cannot do *after* we go GA, e.g.:

# Clearly incompatible *API* changes are *not* allowed in hadoop-2 post-GA.
# Do we allow incompatible changes on client-server protocols? I would say *no*; see the sketch after this list.
# Do we allow incompatible changes on internal server-server protocols (e.g. NN-DN, or NN-NN in an HA setup, or RM-NM in YARN), given that we want to support rolling upgrades? I would like to disallow this, but I do not know how feasible that is. An option is to allow such changes between minor releases, i.e. between hadoop-2.10 and hadoop-2.11.
# Do we allow changes which force an HDFS metadata upgrade between minor releases, i.e. from hadoop-2.20 to hadoop-2.21?
# Clearly *no* incompatible changes (API, client-server, or server-server) are allowed in a patch release, i.e. hadoop-2.20.0 and hadoop-2.20.1 have to be compatible in all respects.
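To make the client-server point concrete: the trap in HADOOP-9151 is that a change can be perfectly API-compatible and still wire-incompatible. A minimal sketch (the class is hypothetical; WritableUtils is the real helper from org.apache.hadoop.io):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.WritableUtils;

    // The public Java surface of this hypothetical RPC response never
    // changes, so the "refactor" below breaks no source or binary
    // compatibility whatsoever.
    public class HypotheticalBlockInfo {
      private long blockSize;

      public long getBlockSize() { return blockSize; }

      // Old servers: fixed-width 8-byte encoding.
      public void write(DataOutput out) throws IOException {
        out.writeLong(blockSize);
      }

      // New servers: variable-length encoding. No signature moved, yet
      // an old client paired with readFields() below now misparses the
      // stream; API-compatible, wire-incompatible.
      public void writeNew(DataOutput out) throws IOException {
        WritableUtils.writeVLong(out, blockSize);
      }

      // Old client decoder, written against write(); it cannot read
      // the bytes produced by writeNew().
      public void readFields(DataInput in) throws IOException {
        blockSize = in.readLong();
      }
    }

That is exactly the class of change we need to call incompatible post-GA, even though no public signature or javadoc moves.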
What else am I missing? I'll make sure we update our Roadmap wiki and
other docs post this discussion.

thanks,
Arun

On Jan 30, 2013, at 4:21 PM, Eli Collins wrote:

> Thanks for bringing this up, Arun. One of the issues is that we
> haven't been clear about which types of compatibility breakage are
> allowed and which are not. For example, renaming FileSystem#open is
> incompatible, and not OK, regardless of the alpha/beta tag. Breaking
> a server/server API is OK pre-GA but probably not post-GA, at least
> not in a point release, unless required for a security fix, etc.
> Configuration, data format, and environment variable changes can all
> be similarly incompatible. The issue we had in HADOOP-9151 was that
> someone claimed a change was not incompatible because it doesn't
> break API compatibility, even though it breaks wire compatibility. So
> let's be clear about the types of incompatibility we are or are not
> permitting. For example, will it be OK to merge a change before
> 2.2.0-beta that requires an HDFS metadata upgrade? Or one that breaks
> client-server wire compatibility? I've been assuming that changing an
> API annotated Public/Stable still requires multiple major releases
> (one to deprecate and one to remove); does the alpha label change
> that? To some people the "alpha" and "beta" labels imply instability
> in terms of quality/features, while to others they mean unstable APIs
> (and to some, both), so it would be good to spell that out. In short,
> I agree that we really need to figure out what changes are permitted
> in which releases, and we should update the docs accordingly (there's
> a start here: http://wiki.apache.org/hadoop/Roadmap).
>
> Note that the 2.0.0-alpha release vote thread was clear that we
> thought we were all in agreement that we'd like to keep client/server
> compatibility post-2.0, and there was no push back. We pulled a
> number of jiras into the 2.0 release explicitly so that we could
> preserve client/server compatibility going forward. Here's the
> relevant part of the thread as a refresher: http://s.apache.org/gQ
>
> "2) HADOOP-8285 and HADOOP-8366 changed the wire format for the RPC
> envelope in branch-2, but didn't make it into this rc. So, that would
> mean that future alphas would not be protocol-compatible with this
> alpha. Per a discussion a few weeks ago, I think we were all in
> agreement that, if possible, we'd like all 2.x releases to be
> compatible for client-server communication, at least (even if we
> don't support cross-version compatibility for the intra-cluster
> protocols)"
>
> Thanks,
> Eli
>
> On Tue, Jan 29, 2013 at 12:56 PM, Arun C Murthy <a...@hortonworks.com> wrote:
>> Folks,
>>
>> There has been some discussion about incompatible changes in the
>> hadoop-2.x.x-alpha releases on HADOOP-9070, HADOOP-9151, HADOOP-9192
>> and a few other jiras. Frankly, I'm surprised by some of them, since
>> the 'alpha' moniker was chosen precisely so we could harden APIs by
>> changing them if necessary, borne out by the fact that every single
>> release in the hadoop-2 chain has had incompatible changes. This
>> happened because we were releasing early, moving fast and breaking
>> things. Furthermore, we'll have more in the future as we move towards
>> stabilizing hadoop-2, similar to HDFS-4362 and HDFS-4364 in HDFS and
>> YARN-142 (API changes) in YARN.
>>
>> So, rather than debate more, I had a brief chat with Suresh and
>> Todd. Todd suggested calling the next release hadoop-2.1.0-alpha to
>> indicate the incompatibility a little better. This makes sense to me,
>> as long as we are clear that we won't make any further *feature*
>> releases in the hadoop-2.0.x series (obviously we might be forced to
>> do a security/bug-fix release).
>>
>> Going forward, I'd like to start locking down APIs/protocols for a
>> 'beta' release. This way we'll have one *final* opportunity post
>> hadoop-2.1.0-alpha to make incompatible changes if necessary, and we
>> can call that release hadoop-2.2.0-beta.
>>
>> Post hadoop-2.2.0-beta we *should* lock down and not allow
>> incompatible changes. This will let us move on to hadoop-2.3.0 as a
>> GA release, and it forces us to make a real effort to lock things
>> down for hadoop-2.2.0-beta.
>>
>> In summary:
>> # I plan to release hadoop-2.1.0-alpha now (this week).
>> # We make a real effort to lock down APIs/protocols and release
>> hadoop-2.2.0-beta, say in March.
>> # Post 'beta', we release hadoop-2.3.0 as 'stable' sometime in May.
>>
>> I'll start a separate thread on 'locking protocols' w.r.t. client
>> protocols v/s internal protocols (to facilitate rolling upgrades
>> etc.); let's discuss that one separately.
>>
>> Makes sense? Thoughts?
>>
>> thanks,
>> Arun
>>
>> PS: Between hadoop-2.2.0-beta and hadoop-2.3.0 we *might* be forced
>> to make some incompatible changes due to *unforeseen circumstances*,
>> but no more gratuitous changes are allowed.
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/