I want to bring up the issue of Scala 2.10 support again, to see how people feel about it. Key opinions from the previous responses, I think:
Cody: only drop 2.10 support when 2.12 support is added
Koert: we need all dependencies to support 2.12; Scala updates are pretty transparent to IT/ops
Ofir: make sure to deprecate 2.10 in Spark 2.1
Reynold: let's maybe remove support for Scala 2.10 and Java 7 in Spark 2.2
Matei: let's not remove things unless they're burdensome for the project; some people are still on old environments that their IT can't easily update

Scala 2.10 support was deprecated in 2.1, and we did remove Java 7 support for 2.2. https://issues.apache.org/jira/browse/SPARK-14220 tracks the work to support 2.12, and there is progress, especially in dependencies supporting 2.12. It looks like 2.12 support may even entail a breaking change, as documented in https://issues.apache.org/jira/browse/SPARK-14643 (sketched below), and will mean dropping Kafka 0.8, for example. In any event, it's going to take some surgery and a few hacks to make one code base work across 2.11 and 2.12 (also sketched below). I don't see this happening for Spark 2.2.0, because there are just a few weeks left. Supporting three versions at once is probably infeasible, so dropping 2.10 should precede 2.12 support.

Right now, I would like to make progress on changes that 2.12 will require but that 2.11/2.10 can still support. For example, we have to update scalatest, breeze, chill, etc., and can do that before 2.12 is enabled (see the dependency note below). However, I'm finding those changes tricky, and in one case maybe impossible, while 2.10 is still supported.

For 2.2.0, I'm wondering if it makes sense to go ahead and drop 2.10 support, and even get additional prep work for 2.12 into the 2.2.0 release. The move to support 2.12 in 2.3.0 would then be a smaller change. It isn't strictly necessary; we could delay all of that until after 2.2.0 and get it all done between 2.2.0 and 2.3.0. But I wonder whether 2.10 is legacy enough at this stage to drop for Spark 2.2.0.

I don't feel strongly about it, but there are some reasonable arguments for dropping it:
- 2.10 doesn't technically support Java 8, though we do still have it working, even after requiring Java 8
- Safe to say virtually all common _2.10 libraries have a _2.11 counterpart at this point?
- 2.10.x was EOL'd in September 2015 with the final 2.10.6 release
- From a vendor viewpoint: CDH only supports Scala 2.11 with Spark 2.x
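To make the SPARK-14643 breaking change concrete, here is a minimal sketch of the overload problem, with hypothetical names and signatures rather than Spark's actual API. In 2.12, lambdas satisfy any single-abstract-method type, so overloads that were unambiguous on 2.10/2.11 can become ambiguous:

    // A Java-style functional interface, like those Spark exposes for its Java API:
    trait MapFunction[T, U] { def call(value: T): U }

    // Hypothetical stand-in for a class with Scala- and Java-friendly overloads:
    class Dataset[T](data: Seq[T]) {
      def map[U](f: T => U): Dataset[U] = new Dataset(data.map(f))
      def map[U](f: MapFunction[T, U]): Dataset[U] = new Dataset(data.map(x => f.call(x)))
    }

    // On 2.10/2.11 this can only mean the T => U overload. On 2.12 the lambda
    // also converts to the SAM type MapFunction, so, depending on the exact
    // overload-resolution rules, the call can become ambiguous; fixing that may
    // mean changing the API itself, hence the possible breaking change:
    new Dataset(Seq(1, 2, 3)).map(x => x + 1)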
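As for the surgery to support 2.11 and 2.12 from one code base, the usual trick is version-specific source directories for the bits that can't be written compatibly for both. A sketch in sbt terms (Spark's real build is Maven, which would need the equivalent):

    // Compile an extra source directory per Scala binary version, e.g.
    // src/main/scala-2.11 and src/main/scala-2.12, alongside shared sources;
    // incompatible code then lives in two parallel files with the same name:
    unmanagedSourceDirectories in Compile +=
      (sourceDirectory in Compile).value / ("scala-" + scalaBinaryVersion.value)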
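On the dependency note: Scala libraries are published per binary version, which is why scalatest, breeze, chill, etc. each need a _2.12 artifact before we can switch, and why the _2.10 vs. _2.11 availability question matters. In sbt notation (the version number here is illustrative, not the one we'd require):

    // %% appends the Scala binary version to the artifact name, so the same
    // line resolves chill_2.11 today and chill_2.12 once that artifact exists:
    libraryDependencies += "com.twitter" %% "chill" % "0.8.4"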
Before I open a JIRA, I'm just soliciting opinions.

On Tue, Oct 25, 2016 at 4:36 PM Sean Owen <so...@cloudera.com> wrote:

> I'd like to gauge where people stand on the issue of dropping support for
> a few things that were considered for 2.0.
>
> First: Scala 2.10. We've seen a number of build breakages this week
> because the PR builder only tests 2.11. No big deal at this stage, but it
> did cause me to wonder whether it's time to plan to drop 2.10 support,
> especially with 2.12 coming soon.
>
> Next, Java 7. It's reasonably old and out of public updates at this stage.
> It's not that painful to keep supporting, to be honest, but dropping it
> would simplify some bits of code, some scripts, and some testing.
>
> Hadoop versions: I think the general argument is that most anyone would be
> using at least 2.6, and it would simplify some code that has to use
> reflection to call not-even-that-new APIs. It would also remove some
> moderate complexity from the build.
>
> "When" is a tricky question. Although it's a little aggressive for minor
> releases, I think these will all happen before 3.x regardless. 2.1.0 is
> not out of the question, though it's coming soon. What about ... 2.2.0?
>
> Although I tend to favor dropping support, I'm mostly asking for current
> opinions.