+1 for incrementing the major number for library compatibility changes and decoupling the release versions across languages.
I agree with the points that Sean made. This is confusing for users, and needlessly so because there is only one binary format and I don't think there are major pushes to introduce a breaking change.

On Mon, Mar 2, 2020 at 7:48 AM Sean Busbey <bus...@apache.org> wrote:

> > So, as I understand it. The whole 1.x version should be binary compatible.
>
> the term "binary compatible" is overloaded, thanks to our participation in the Java ecosystem. Every data file written in Avro 1.y is intended to be readable by every other Avro 1.y release, regardless of language. That is true even if we know that there are cases where errors in various language libraries have prevented success.
>
> In Java ecosystem parlance the term "binary compatible" also refers to the ability to use a Java library in place of another without needing to recompile any code that refers to said library. It is definitely not the case that every Avro 1.y Java version has been binary compatible in this sense (in fact just the opposite).
>
> > For 1.9 we broke some of the APIs because, at the time, the decision was that removing Jackson from the public API was required to move from Codehaus jackson (1.x) to the Fasterxml one (2.x). The public API shouldn't have exposed these methods in the first place.
>
> I think this is confusing the two issues of "compatibility of serialized bytes" and "compatibility of language APIs". This is part of why I think we should stop relying on the first version number to indicate "compatibility of serialized bytes".
>
> > I wouldn't be in favor of switching to 10.x (dropping the 1. in front of it). What's the added value in this? I'm just afraid of changing this, would confuse a lot of downstream users.
>
> The big advantage is that literally everywhere else in the software ecosystems I've seen the first number in a version string is either "major version" or "marketing version", usually the former. Folks expect that if that version number hasn't changed then they should be able to "easily" upgrade to use the newer library. In Avro that plainly isn't true. I can think of multiple cases where other ASF projects have gotten surprised that going from e.g. Java libraries for Avro 1.7 to Avro 1.8 was a major version bump.
>
> I agree that going from 1.x.y to 10.y.z might be confusing due to the large number jump. I think going to "2.y.z" would clearly indicate folks needed to pay attention to a version difference because the first number changed. When we have their attention we can explain that we're using it as major version from now on.
>
> > Also, a similar discussion was on the Spark devlist, I think Michael has some valid points here:
> > https://mail-archives.apache.org/mod_mbox/spark-dev/202002.mbox/browser
>
> This is a month of email from dev@spark. Could you link to a specific thread on lists.apache.org or provide a subject line?
>
> On Mon, Mar 2, 2020 at 3:00 AM Driesprong, Fokko <fo...@driesprong.frl> wrote:
> >
> > So, as I understand it. The whole 1.x version should be binary compatible. So anything written with Java 1.x should be readable with Python 1.x. We've been working on extending the integration tests as well.
> >
> > Not all languages support all the features, for example, many languages lack support for logical types. In this case, a datetime would be just read as an integer, so there is a fallback scenario.
> >
> > For 1.9 we broke some of the APIs because, at the time, the decision was that removing Jackson from the public API was required to move from Codehaus jackson (1.x) to the Fasterxml one (2.x). The public API shouldn't have exposed these methods in the first place.
> >
> > I wouldn't be in favor of switching to 10.x (dropping the 1. in front of it). What's the added value in this? I'm just afraid of changing this, would confuse a lot of downstream users.
> >
> > Also, a similar discussion was on the Spark devlist, I think Michael has some valid points here:
> > https://mail-archives.apache.org/mod_mbox/spark-dev/202002.mbox/browser
> >
> > Maybe it is good to formalize our policy, and put it on the website.
> >
> > Cheers, Fokko Driesprong
> >
> > On Fri, Feb 28, 2020 at 5:53 PM Sean Busbey <bus...@apache.org> wrote:
> >
> > > Counterpoint on independently versioning the various languages. Do we know if Python Avro X works with Java Avro Y as it is? It seems like we already get surprised pretty often when they don't.
> > >
> > > If we stop including the "data compatibility version" or whatever we're calling the first number, we'll need to get more formal on versioning the specification and having libraries plainly label which specification(s) they comport to.
> > >
> > > At the very least it seems like we'd make the _easy_ path easier for the languages that are well maintained. Sure it'll be a burden on those languages that aren't well maintained, but it seems like those are already in that position?
> > >
> > > On Thu, Feb 27, 2020 at 9:13 AM Ismaël Mejía <ieme...@gmail.com> wrote:
> > > >
> > > > Bringing my comment from the JIRA ticket here for discussion:
> > > >
> > > > > "One argument against semantic versioning is the fact that Avro supports 9 language APIs, so if let's say C++ breaks its backwards compatibility should we move the version number up for every single language? Sounds like a burden and in particular a not easy task to track since we do not have proper validation of breaking changes in place for every language at this point.
> > > > > ... (even if we separate release numbers per language) that seems like a lot of work for probably a similar output because then users will doubt, wait is Python Avro 3.1.0 compatible with Java Avro 5.2.0? and they will probably be for the binary format."
> > > >
> > > > Also there is the case of interop tests, how will those act in this case. We will need a compatibility matrix, again I am not sure if it is the best approach, looks like lots of work for not much in return.
> > > >
> > > > On Thu, Feb 27, 2020 at 12:21 PM Ryan Skraba <r...@skraba.com> wrote:
> > > >
> > > > > Hello! Resurrecting -- I think this was the last thread bringing up this issue!
> > > > >
> > > > > Since we've talked about releasing 1.10.x in May, and it's a nice round number... what do you think about
> > > > >
> > > > > 1) finally dropping the prefix for the "specification version" and calling it Avro 10.x
> > > > >
> > > > > 2) committing to semantic versioning for future releases
> > > > >
> > > > > I can see this being a hugely positive move for aligning with the expectations of developers and projects... but it leads to a lot of questions about releasing all the artifacts together.
> > > > >
> > > > > There's already a JIRA: https://issues.apache.org/jira/browse/AVRO-2687
> > > > >
> > > > > Ryan
> > > > >
> > > > > On Fri, Sep 13, 2019 at 12:00 PM Driesprong, Fokko <fo...@driesprong.frl> wrote:
> > > > > >
> > > > > > Thanks Sean for bringing this up.
> > > > > >
> > > > > > For the 1.9 branch there were some incompatible changes in the API with respect to 1.8.2. We've removed Jackson <https://github.com/apache/avro/pull/135> and Netty from the public API. This is actually breaking some of the builds <https://github.com/apache/incubator-iceberg/pull/297>, so, unfortunately, it isn't compatible, and therefore the major version bump.
> > > > > >
> > > > > > The 1.9.x branch still has support for the Joda time library, but defaults to jsr310, but is still compatible (I believe). For 1.10 the plan is to completely remove Joda from the codebase since it is officially deprecated in favor of Java8 time (jsr310). A lot of this stuff is just changes to the Java API of Avro, which mostly involves changes to the LogicalTypes, so the actual format is still compatible (as it should be).
> > > > > >
> > > > > > I agree with you Sean, that a lot of the changes that are targeted for 1.10 could be cherry-picked back to the 1.9 branch. If someone is willing to do this, I would be grateful. However, maintaining a lot of different branches is quite time-consuming in terms of release management of the different versions. For Apache Avro 1.9.0 we actually had some regression bugs which were blocking, therefore the 1.9.1 release.
> > > > > >
> > > > > > Personally I don't have a big objection to bumping the major version if there are breaking changes to one of the APIs. But a big +1 on having a standardized approach on the versioning, this also includes a more clear approach on documenting the upgrade process and a better changelog. I've added summaries of the releases at the GitHub releases: https://github.com/apache/avro/releases but I think having this on the Avro website might be more appropriate.
> > > > > >
> > > > > > Cheers, Fokko Driesprong
> > > > > >
> > > > > > On Wed, Sep 11, 2019 at 6:17 PM Ryan Blue <rb...@netflix.com.invalid> wrote:
> > > > > >
> > > > > > > > What would it look like if we *did* have to make an incompatible data format change after adopting "conventional" library version strings?
> > > > > > >
> > > > > > > Let's call these format v1 and v2. The library must produce v1 by default, so it's a matter of having support for writing v2. When the default changes to v2, then that behavior change would require a major version increase to signal changes to compatibility. I think we would also want clear documentation for each version that shows what versions of the format it can read, write, and what it will use by default. A table on the site would work.
> > > > > > >
> > > > > > > On Tue, Sep 10, 2019 at 2:51 PM Sean Busbey <bus...@apache.org> wrote:
> > > > > > >
> > > > > > > > What would it look like if we *did* have to make an incompatible data format change after adopting "conventional" library version strings?
> > > > > > > >
> > > > > > > > What if we version the specification independent from the libraries and then have the docs for the libraries claim spec version compatibility?
> > > > > > > >
> > > > > > > > On Tue, Sep 10, 2019 at 3:55 PM Ryan Blue <rb...@netflix.com.invalid> wrote:
> > > > > > > > >
> > > > > > > > > +1 for changing the version strings to follow a more standard convention. We don't have any breaking format changes, so I think it is expected that the format compatibility version won't change.
> > > > > > > > >
> > > > > > > > > On Tue, Sep 10, 2019 at 7:28 AM Sean Busbey <bus...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > Hi folks!
> > > > > > > > > >
> > > > > > > > > > historically, Avro version numbers have had the form:
> > > > > > > > > >
> > > > > > > > > > <data compatibility> . <major library version> . <minor library version>
> > > > > > > > > >
> > > > > > > > > > That is, the first number says whether or not we expect data serialization to be compatible, and the second to say whether we expect some library will be backwards incompatible however that's defined for the library's language. For example, in the Java library when we make changes to public method signatures such that folks can't just swap out jar files of our implementation.
> > > > > > > > > >
> > > > > > > > > > While getting myself up to speed on the state of our release lines, I noticed we already have the 1.9 release line in a branch, with master set up for the next major library version. JIRA shows ~46 issues that are in 1.10 but not in a 1.9 release[1].
> > > > > > > > > >
> > > > > > > > > > I haven't looked at all of them yet, but the few I sampled don't seem to require a major version increment.
> > > > > > > > > >
> > > > > > > > > > I looked around our site and I also can't find anywhere that we've documented our version strings. I know I've been in discussions in other communities where our version strings have been surprising. e.g. folks had assumed they can do a low-effort upgrade from 1.7 to 1.8 only to find that there were documented incompatibilities and behavior changes.
> > > > > > > > > >
> > > > > > > > > > Are we actively planning on rolling out 1.10? (like, do we have a goal date?)
> > > > > > > > > >
> > > > > > > > > > I know that when 1.9 went out we EOLed 1.7 and 1.8 in part due to the overhead of trying to maintain multiple release lines (especially once that had so much baggage) while we're trying to reestablish good habits on release cadence. How many major versions are we planning to keep going once 1.10 is ready?
> > > > > > > > > >
> > > > > > > > > > What do folks think about starting a CONTRIBUTING.md with some of these expectations? Is there a better place to track it?
> > > > > > > > > >
> > > > > > > > > > [1] : https://s.apache.org/71yqv
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Ryan Blue
> > > > > > > > > Software Engineer
> > > > > > > > > Netflix
> > > > > > >
> > > > > > > --
> > > > > > > Ryan Blue
> > > > > > > Software Engineer
> > > > > > > Netflix

--
Ryan Blue
Software Engineer
Netflix