*tl;dr*

I would like to propose renaming “minor release” to “feature release” in
Apache Spark.


*details*

Apache Spark’s official versioning policy roughly follows semantic
versioning. Each Spark release is versioned as
[major].[minor].[maintenance]. That is to say, 1.0.0 and 2.0.0 are both
major releases, whereas 1.1.0 and 1.3.0 are minor releases.
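
For concreteness, here is a minimal, purely illustrative sketch in Scala
(not Spark’s actual release tooling) of how a version string breaks down
into those three components:

  object VersionSketch {
    // Illustrative only: split a Spark version string into its components.
    final case class SparkVersion(major: Int, minor: Int, maintenance: Int)

    def parse(version: String): SparkVersion = version.split("\\.") match {
      case Array(major, minor, maintenance) =>
        SparkVersion(major.toInt, minor.toInt, maintenance.toInt)
      case _ =>
        throw new IllegalArgumentException(s"Unexpected version string: $version")
    }

    def main(args: Array[String]): Unit = {
      // 1.0.0 -> 2.0.0 bumps the major component; 1.1.0 -> 1.3.0 bumps only
      // the minor component, i.e. what this proposal calls a feature release.
      println(parse("1.3.0"))  // SparkVersion(1,3,0)
      println(parse("2.0.0"))  // SparkVersion(2,0,0)
    }
  }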

I have gotten a lot of feedback from users that the word “minor” is
confusing and does not accurately describe those releases. When users hear
the word “minor”, they think it is a small update that introduces a couple
of minor features and some bug fixes. But if you look at the history of
Spark 1.x, here is just a subset of the large features added:

Spark 1.1: sort-based shuffle, JDBC/ODBC server, new stats library, 2-5X
perf improvement for machine learning.

Spark 1.2: HA for streaming, new network module, Python API for streaming,
ML pipelines, data source API.

Spark 1.3: DataFrame API, Spark SQL graduating out of alpha, tons of new
algorithms in machine learning.

Spark 1.4: SparkR, Python 3 support, DAG viz, robust joins in SQL, math
functions, window functions, SQL analytic functions, Python API for
pipelines.

Spark 1.5: code generation, Project Tungsten.

Spark 1.6: automatic memory management, Dataset API, ML pipeline persistence.


So while “minor” is an accurate depiction of the releases from an API
compatibility point of view, we are miscommunicating and doing Spark a
disservice by calling these releases “minor”. I would actually call these
releases “major”, but then it would be a larger deviation from semantic
versioning. I think calling these “feature releases” would be a smaller
change and a more accurate depiction of what they are.

That said, I’m not attached to the name “feature” and am open to
suggestions, as long as they don’t convey the notion of “minor”.
