Spark 2.x has to be the time for Java 8. I'd rather bump the minimum JVM major version on a Spark major release than on a minor release, and I'd rather Spark make that upgrade for the 2.x series than for 3.x (roughly two years from now, based on the lifetime of Spark 1.x). If we wait until the next opportunity for a breaking change to Spark (3.x), we might be upgrading to Java 9 at that point rather than Java 8.
If Spark users need Java 7 they are free to continue using the 1.x series, the same way that folks who need Java 6 are free to continue using 1.4.

On Thu, Mar 24, 2016 at 11:46 AM, Stephen Boesch <java...@gmail.com> wrote:

> +1 for java8 only
>
> +1 for 2.11+ only. At this point scala libraries supporting only 2.10 are typically less active and/or poorly maintained. That trend will only continue when considering the lifespan of spark 2.x.
>
> 2016-03-24 11:32 GMT-07:00 Steve Loughran <ste...@hortonworks.com>:
>
>> On 24 Mar 2016, at 15:27, Koert Kuipers <ko...@tresata.com> wrote:
>>
>> i think the arguments are convincing, but it also makes me wonder if i live in some kind of alternate universe... we deploy on customers' clusters, where the OS, python version, java version and hadoop distro are not chosen by us. so think centos 6, cdh5 or hdp 2.3, java 7 and python 2.6. we simply have access to a single proxy machine and launch through yarn. asking them to upgrade java is pretty much out of the question, or a 6+ month ordeal. of the 10 client clusters i can think of off the top of my head, all of them are on java 7, none are on java 8. so by doing this you would make spark 2 basically unusable for us (unless most of them have plans of upgrading to java 8 in the near term; i will ask around and report back...).
>>
>> It's not actually mandatory for the process executing in the YARN cluster to run with the same JVM as the rest of the Hadoop stack; all that is needed is for the environment variables to set up JAVA_HOME and PATH. Switching JVMs is not something which YARN makes easy to do, but it may be possible, especially if Spark itself provides some hooks, so you don't have to manually play with setting things up. That may be something which could significantly ease adoption of Spark 2 in YARN clusters. Same for Python.
>>
>> This is something I could probably help others to address.
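To make Steve's suggestion concrete, here is a minimal sketch of pointing a Spark application at a different JVM on a YARN cluster whose daemons stay on Java 7. It relies on Spark's per-environment-variable properties (spark.yarn.appMasterEnv.* and spark.executorEnv.*); the JDK path below is an assumption and would have to match wherever a Java 8 install actually lives on (or is shipped to) the worker nodes:

    import org.apache.spark.SparkConf

    // Hypothetical Java 8 location on the YARN nodes; adjust to wherever the
    // cluster admins (or a distributed-cache archive) actually put the JDK.
    val java8Home = "/usr/lib/jvm/java-1.8.0"

    val conf = new SparkConf()
      .setAppName("spark2-on-a-java7-cluster")
      // Launch the YARN ApplicationMaster with the alternate JVM instead of
      // the cluster-default one used by the Hadoop daemons.
      .set("spark.yarn.appMasterEnv.JAVA_HOME", java8Home)
      // Do the same for the executor containers.
      .set("spark.executorEnv.JAVA_HOME", java8Home)

The same two settings can also be passed to spark-submit via --conf, and the analogous trick for Python is to point PYSPARK_PYTHON at a newer interpreter available on the nodes.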