RE: [discuss] ending support for Java 7 in Spark 2.0

Raymond Honderdors Thu, 24 Mar 2016 00:42:47 -0700

Very good points

Moving to Java 8 looks like a good direction.
2.0 would be a good release to start with that.


Raymond Honderdors
Team Lead Analytics BI
Business Intelligence Developer
raymond.honderd...@sizmek.com
T +972.7325.3569
Herzliya

From: Reynold Xin [mailto:r...@databricks.com]
Sent: Thursday, March 24, 2016 9:37 AM
To: dev@spark.apache.org
Subject: Re: [discuss] ending support for Java 7 in Spark 2.0

One other benefit that I didn't mention is that we'd be able to use Java 8's 
Optional class to replace our built-in Optional.
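
For readers unfamiliar with it, here is a minimal sketch of how Java 8's java.util.Optional is typically used. It does not show Spark's own Optional or any Spark API; the class name OptionalSketch and the findUser lookup are hypothetical and exist only for illustration.

```java
import java.util.Optional;

public class OptionalSketch {
    // Hypothetical lookup, for illustration only (not a Spark API):
    // returns a value for id 1 and an empty Optional otherwise.
    public static Optional<String> findUser(int id) {
        return id == 1 ? Optional.of("alice") : Optional.empty();
    }

    // Transform the value if present, otherwise fall back to a default,
    // without any explicit null check.
    public static String greeting(int id) {
        return findUser(id)
                .map(name -> "hello, " + name)
                .orElse("hello, stranger");
    }

    public static void main(String[] args) {
        System.out.println(greeting(1));
        System.out.println(greeting(2));
    }
}
```

Because map and orElse take lambdas and defaults directly, the caller never handles null itself, which is the convenience a standard Optional would bring to a Java-facing API.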


On Thu, Mar 24, 2016 at 12:27 AM, Reynold Xin 
<r...@databricks.com> wrote:
About a year ago we decided to drop Java 6 support in Spark 1.5. I am wondering 
if we should also just drop Java 7 support in Spark 2.0 (i.e. Spark 2.0 would 
require Java 8 to run).

Oracle ended public updates for JDK 7 about a year ago (Apr 2015), and removed 
public downloads for JDK 7 in July 2015. In the past I've actually been against 
dropping Java 7, but today I ran into an issue with the new Dataset API not 
working well with Java 8 lambdas, and that changed my opinion on this.

I've been thinking more about this issue today and also talked with a lot of 
people offline to gather feedback, and I actually think the pros outweigh the 
cons, for the following reasons (in some rough order of importance):

1. It is complicated to test how well Spark APIs work for Java lambdas if we 
support Java 7. Jenkins machines need to have both Java 7 and Java 8 installed 
and we must run through a set of test suites in 7, and then the lambda tests in 
Java 8. This complicates build environments/scripts, and makes them less 
robust. Without good testing infrastructure, I have no confidence in building 
good APIs for Java 8.

2. Dataset/DataFrame performance will be between 1x and 10x slower on Java 7. 
The primary APIs we want users to use in Spark 2.x are Dataset/DataFrame, and 
this impacts pretty much everything from machine learning to structured 
streaming. We have made great progress in their performance through extensive 
use of code generation. (In many dimensions Spark 2.0 with DataFrames/Datasets 
looks more like a compiler than a MapReduce or query engine.) These 
optimizations don't work well in Java 7 due to broken code cache flushing. This 
problem has been fixed by Oracle in Java 8. In addition, Java 8 comes with 
better support for Unsafe and SIMD.

3. Scala 2.12 will come out soon, and we will want to add support for that. 
Scala 2.12 only works on Java 8. If we do support Java 7, we'd have a fairly 
complicated compatibility matrix and testing infrastructure.

4. There are libraries that I've looked into in the past that support only Java 
8. This is more common in high performance libraries such as Aeron (a messaging 
library). Having to support Java 7 means we are not able to use these. It is 
not that big of a deal right now, but will become increasingly difficult 
as we optimize performance.
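
To make point 1 concrete, the sketch below shows why a Java-facing API needs to be exercised in two styles when both Java 7 and Java 8 are supported. It assumes a hypothetical single-method interface, MapFunction, and a hypothetical helper, mapAll, defined here as stand-ins; neither is taken from Spark. The same function is passed once as an anonymous inner class (the only option on Java 7) and once as a Java 8 lambda.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LambdaSketch {
    // Hypothetical single-method interface, defined here for illustration only.
    @FunctionalInterface
    public interface MapFunction<T, R> {
        R call(T value);
    }

    // Hypothetical helper: applies f to every element of input.
    public static <T, R> List<R> mapAll(List<T> input, MapFunction<T, R> f) {
        List<R> out = new ArrayList<>();
        for (T item : input) {
            out.add(f.call(item));
        }
        return out;
    }

    // Java 7 style: the function is an anonymous inner class.
    public static List<Integer> lengthsJava7Style(List<String> words) {
        return mapAll(words, new MapFunction<String, Integer>() {
            @Override
            public Integer call(String s) {
                return s.length();
            }
        });
    }

    // Java 8 style: the same function as a lambda. The compiler must infer
    // T and R from context, which is where an API can behave differently.
    public static List<Integer> lengthsJava8Style(List<String> words) {
        return mapAll(words, s -> s.length());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("spark", "v2");
        System.out.println(lengthsJava7Style(words));
        System.out.println(lengthsJava8Style(words));
    }
}
```

The two call sites look equivalent, but only the second depends on lambda type inference, so it can only be compiled and tested on Java 8. That is why the lambda tests need a separate Java 8 run.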


The downside of not supporting Java 7 is also obvious. Some organizations are 
stuck with Java 7, and they wouldn't be able to use Spark 2.0 without upgrading 
Java.
