Maybe the question should be: how far back should Spark be compatible?
There is nothing stopping people from running Spark 1.6.x with JDK 7, Scala 2.10, or Hadoop < 2.6. But if they want Spark 2.x, they should consider a migration to JDK 8 and Scala 2.11. Or am I getting it all wrong?

Raymond Honderdors
Team Lead Analytics BI
Business Intelligence Developer
raymond.honderd...@sizmek.com<mailto:raymond.honderd...@sizmek.com>
T +972.7325.3569
Herzliya

From: Tom Graves [mailto:tgraves...@yahoo.com.INVALID]
Sent: Wednesday, March 30, 2016 4:46 PM
To: Steve Loughran <ste...@hortonworks.com>
Cc: Reynold Xin <r...@databricks.com>; Koert Kuipers <ko...@tresata.com>; Kostas Sakellis <kos...@cloudera.com>; Marcelo Vanzin <van...@cloudera.com>; dev@spark.apache.org
Subject: Re: [discuss] ending support for Java 7 in Spark 2.0

Steve, those are good points; I had forgotten Hadoop had those issues. We run with JDK 8, our Hadoop is built for JDK 7 compatibility, we are running Hadoop 2.7 on our clusters, and by the time Spark 2.0 is out I would expect a mix of Hadoop 2.7 and 2.8. We also don't use SPNEGO.

I didn't quite follow what you were saying about the Hadoop services being on JDK 7. Are you saying building Spark with, say, Hadoop 2.8 libraries while your Hadoop cluster is running Hadoop 2.6 or less? If so, I would agree that isn't a good idea.

Personally, and from Yahoo's point of view, I'm still fine with going to JDK 8, but I could see where other people on older versions of Hadoop might have a problem.

Tom

On Wednesday, March 30, 2016 5:42 AM, Steve Loughran <ste...@hortonworks.com<mailto:ste...@hortonworks.com>> wrote:

Can I note that if Spark 2.0 is going to be Java 8+ only, that means Hadoop 2.6.x should be the minimum Hadoop version.
https://issues.apache.org/jira/browse/HADOOP-11090

Where things get complicated is the situation of: Hadoop services on Java 7, Spark on Java 8 in its own JVM.

I'm not sure that you could get away with having a newer version of the Hadoop classes in the Spark assembly/lib dir without coming up against incompatibilities with the Hadoop JNI libraries. These are currently backwards compatible, but trying to link Hadoop 2.7 classes against a Hadoop 2.6 native hadoop lib will generate an UnsatisfiedLinkError. Meaning: the whole cluster's Hadoop libs have to be in sync, or at least the main cluster release must be on a version of Hadoop 2.x >= the Spark-bundled edition.

Ignoring that detail: Hadoop 2.6.1+ needs Guava >= 15? 17? I think the outcome of Hadoop < 2.6 and JDK >= 8 is "undefined"; all bug reports will be met with a "please upgrade, re-open if the problem is still there".

Kerberos is a particular trouble spot here: you need Hadoop 2.6.1+ for Kerberos to work on Java 8 and recent versions of Java 7 (HADOOP-10786). Note also that HADOOP-11628 (SPNEGO + CNAMEs) is in 2.8 only. I'll see about pulling that into 2.7.x, though I'm reluctant to go near 2.6, just to keep that extra stable.

Thomas: you've got the big clusters; what versions of Hadoop will they be on by the time you look at Spark 2.0?

-Steve
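[Editor's note: to make the Java 8 / Hadoop version constraint above concrete, here is a minimal sketch of a fail-fast check. `org.apache.hadoop.util.VersionInfo` and the `java.version` system property are real APIs; the object name, the version-string parsing, and the warning message are illustrative assumptions, not code from the thread.]

```scala
import org.apache.hadoop.util.VersionInfo

// Illustrative sketch: warn when running Java 8 against a Hadoop build
// older than 2.6.1, where Kerberos is known to break (HADOOP-10786).
object HadoopJavaCompatCheck {
  def main(args: Array[String]): Unit = {
    val javaVersion   = sys.props("java.version")  // e.g. "1.8.0_77"
    val hadoopVersion = VersionInfo.getVersion     // e.g. "2.6.0"

    // Take the leading numeric components; vendor suffixes such as
    // "-cdh5.4.0" are discarded by the split. Assumes at least a
    // three-part version string (major.minor.patch).
    val Array(major, minor, patch) =
      hadoopVersion.split("[.-]").take(3).map(_.toInt)

    val onJava8 = javaVersion.startsWith("1.8")
    val hadoopHasJava8Kerberos =
      major > 2 || (major == 2 && (minor > 6 || (minor == 6 && patch >= 1)))

    if (onJava8 && !hadoopHasJava8Kerberos) {
      System.err.println(
        s"WARNING: Hadoop $hadoopVersion on Java $javaVersion -- Kerberos " +
         "may fail (HADOOP-10786); consider upgrading to Hadoop 2.6.1+.")
    }
  }
}
```

Running such a check once at startup is cheaper than debugging an authentication failure deep inside a job; it does not, however, catch the JNI mismatch Steve describes, since `VersionInfo` reports the classpath version, not the version of the native hadoop lib installed on the cluster nodes.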